A novel framework to harmonise satellite data series for climate applications

: Fundamental and thematic climate data records derived from satellite observations provide unique information for climate monitoring and research. Since any satellite only operates over a relatively short period of time, creating a climate data record also requires the combination of space-borne measurements from a series of several (often similar) satellite sensors. Simply combining calibrated measurements from several sensors can, however, produce an inconsistent climate data record. This is particularly true of older, historic sensors whose behaviour in space was often different from their behaviour during pre-launch calibration and more scientiﬁc value can be derived from considering the series of historical and present satellites as a whole. Here, we consider harmonisation as a process that obtains new calibration coefﬁcients for revised sensor calibration models by comparing calibrated measurements over appropriate satellite-to-satellite matchups, such as simultaneous nadir overpasses and which reconciles the calibration of different sensors given their estimated spectral response function differences. We present the concept of a framework that establishes calibration coefﬁcients and their uncertainty and error covariance for an arbitrary number of sensors in a metrologically-rigorous manner. We describe harmonisation and its mathematical formulation as an inverse problem that is extremely challenging when some hundreds of millions of matchups are involved and the errors of fundamental sensor measurements are correlated. We solve the harmonisation problem as marginalised errors in variables regression. The algorithm involves computation of ﬁrst and second-order partial derivatives using Algorithmic Differentiation. Finally, we present re-calibrated radiances from a series of nine Advanced Very High Resolution Radiometer sensors showing that the new time series has smaller matchup differences compared to the unharmonised case while being consistent with uncertainty statistics.


Introduction
Stable and accurate long-term Fundamental Climate Data Records (FCDRs) are indispensable for many aspects of climate research and services.Metrological assessment of historical climate trends and climate variability are feasible, if such FCDRs are harmonised and include quantified uncertainties per datum [1][2][3][4].Such climate data records are also important for assimilation in a reanalysis, as well as for the calculation of statistics that are needed to assess the state of the Earth's climate and to analyse climate extremes.Unfortunately, disruption of operational routines (such as change of equipment or operator personnel, change of receiving station or environment, and change of observing practices or procedures including changes in the pre-launch analysis) may lead to misleading artifacts.Such artefacts may affect the results of climate data analyses considerably, in particular the results of climate trend analyses.In-flight or post-flight sensor calibration must take particular care to eliminate such artefacts as far as is feasible.
FCDRs usually comprise basic physical quantities such as top of atmosphere (TOA) radiance marked with geographic location and time.Note that radiance may be expressed as, e.g., radiance, brightness temperature, or reflectance.It needs also to be acknowledged that disciplines such as, e.g., altimetry, gravimetry or magnetometry measure signals of a different nature.For convenience, all such signals are in the following referred to as "radiance" without loss of generality.FCDRs constitute the basis for deriving Climate Data Records (CDRs) that comprise higher-level geophysical quantities obtained from combining and transforming FCDR data, with or without an eventual projection onto some form of global grid [3,4].
This study focuses on the problem of creating a multi-decadal FCDR of Earth radiance measurements from a series of similar satellite sensors, each of which operates over a relatively short period of time.Creating a multi-decadal FCDR therefore requires the combination of measurements from several such sensors.For example, Figure 1 illustrates the timeline of the more recent NOAA and MetOp satellites, which have all been equipped with a similar suite of sensors, in particular the Advanced Very High Resolution Radiometer (AVHRR).The thermal channels of AVHRR are comparable to the thermal channels of the Advanced Along Track Scanning Radiometer (AATSR) onboard the Envisat satellite, which in its domain was state of the art from 2002 to 2012.In this example, there is a lot of overlap in the lifetimes of individual satellites, and at any instant, since 1992, several satellites have been active, such that there are no gaps over time.Earth radiance data from individually-calibrated sensors may differ more than expected; however, even if they measure the same spectral interval and view the same Earth target area at the same time and under similar viewing angles.Simply combining individually-calibrated radiance measurements from different sensors (even if free from disruptive artefacts) may yield inconsistent and unstable FCDRs that exhibit jumps and changes in trends, in particular when past satellite instruments are involved.Small radiance differences are, however, expected because even similar sensors have different spectral response functions.To produce a harmonised FCDR from a series of similar satellite sensors, the sensor measurement models must be recalibrated simultaneously while respecting mutual spectral response differences [1,2].Some facets of harmonisation resemble standard sensor bias correction and inter-calibration methodologies [5], but harmonisation implements a higher level of metrological rigour and traceability.
Direct Earth observation measurements are telemetry data consisting of digital count numbers related to the incoming radiance (from Earth, space, and internal calibration targets) and information on the sensor state like the temperature of an internal calibration target or the temperature of the instrument and environment.Measurement equations relate these direct observables (usually all on one side of the equation) to the incoming Earth radiance, which constitutes the measurand (usually on the other side of the equation).In addition to direct observables and the measurand itself, the measurement equation includes sensor calibration parameters, which are representative of the measurement principle and of (qualitatively but quantitatively not well known) systematic instrumental effects.Sensor calibration parameters are determined before launch, but the prelaunch calibration is usually not adequate for the full period from launch into space to the end of the mission.In-flight or post-flight recalibration is often achieved by adjusting sensor calibration parameters in such a way that the measured Earth radiance agrees with some (internal or external) reference.A harmonised FCDR then usually refers to a single designated "best" sensor or a series of harmonised "best" sensors, which provides the reference radiance.Alternatively, by including additional constraints, the harmonisation can be based on equally-treated sensors.
An important prerequisite for harmonising fundamental climate data is the assessment of uncertainties associated with the measurement of direct observables and reference radiance [1,2].Ideally, the assessment associates each datum with an uncertainty.In practice, Earth observation data are evaluated on the basis of pairwise matchup datasets, each of which consists of measurements conducted by two satellites viewing the "same" Earth target at the "same" time and under similar viewing geometries.Harmonisation requires that the uncertainty associated with the matching is assessed.Ideally, the assessment associates each matchup of radiance (either provided by the reference or computed from the sensor measurement equation) with an uncertainty.Similarly, harmonisation requires information about the expected difference in radiances measured by the two sensors due, e.g., to spectral response differences and their associated uncertainty, as well as due to any mismatch between the viewing conditions of the two sensors.Finally, though we may assume that the errors in measurements of the reference radiance are independent, the assumption that the errors in measurements of the direct observables are independent is not always justified.For instance, direct observables, like digital count numbers taken for space and internal calibration targets, are often averaged over time before being put into the measurement equation.Though such averaging reduces the apparent noise, characteristic error correlation structures are created, which harmonisation must take into account.
The metric of harmonisation is defined by a cost function, which quantifies the total mutual misfit of calibrated radiance computed for pairwise matchups, under prescription of expected matchup radiance differences.Finding the best calibration is mathematically equivalent to minimising the cost function.Harmonisation considers errors and error correlation structures in all quantities involved with the harmonisation process.Generic optimisation techniques that implement this paradigm are referred to as errors-in-variables (EIV) techniques [6].EIV may be interpreted as a generalisation of orthogonal distance regression (ODR) techniques that takes account of error correlation structures, which standard ODR implementations [7] do not.
The harmonisation of the AVHRR series of sensors considers several hundred million measurements of direct observables and reference radiance.The dimension of error covariance matrices associated with these measurements reaches some 10 million, which is beyond the capacity of standard EIV methods.Hunt et al. [8] and Harris et al. [9] have recently developed an EIV implementation to solve these kind of large and highly covariant optimisation problems.Their EIV implementation considers each measurement as a variable parameter, which is subject to optimisation.The dimension of the search space is equal to the number of measurements, which for AVHRR is several 100 million, plus the number of calibration parameters, which is several 10 s.To reduce the dimension of the search space, we have developed a marginalised EIV (MEIV) technique that is based on and generalises the principles of marginalised linear least squares [10].Like EIV, the MEIV technique takes account of errors and error correlation structures in all quantities involved with the harmonisation process, but only optimises the sensor calibration parameters.MEIV has been applied to the test cases developed by Harris et al. [9] with similar and consistent results.
This paper puts up the mathematical framework to formulate and solve the harmonisation problem and presents results from our initial applications of the MEIV technique to the harmonisation of the AVHRR series of sensors as shown in Figure 1.Section 2 establishes the MEIV cost function, which is minimised to yield a joint maximum posterior probability estimate of all sensor calibration parameters.This estimate is expressed in terms of posterior expected calibration parameter values and the calibration error covariance matrix.Section 3 presents results obtained from applying the MEIV method to solve the problem of harmonising the AVHRR series of sensors (11 µm channel only) with respect to the nominal calibrated AATSR TOA radiance as reference.The results are compared with the nominal AVHRR calibration.Section 4 discusses and concludes this paper.
The nomenclature and notation of quantities like time, radiance, uncertainty and error covariance follows the example of Merchant et al. [3] unless noted otherwise.

Review of the Single-Sensor Calibration Problem
Let x(t) = (x 1 (t), . . ., x n (t)) denote some measurements of direct observables acquired at time t, where each x l is one of the n direct observables that appear in the sensor measurement equation, such as Earth target counts, calibration target counts and measurements of sensor state.Let further α denote the vector of sensor calibration parameters.Then, the generic measurement equation to calculate calibrated radiance L(t) can be written as where the measurement function f may (or may not) explicitly depend on time.Now, let L = (L(t 1 ), . . ., L(t m )) denote m reference measurements of radiance L conducted at chronologically ordered instants t = (t 1 , . . ., t m ), and let the matrix represent the set of chronologically ordered pairwise spatio-temporal matchups between reference measurements of radiance L and measurements of sensor state x.Any metrologically sound calibration scheme takes account of the uncertainty of each measurement included with M and, if present, any correlation of measurement errors.Let the measurements in each column of the matrix M be independent from all measurements in all other columns, which usually is a valid approximation, but let the errors of measurements within the same column be possibly correlated.To express the uncertainty in measurement, the column of reference measurements L(t) is associated with an error covariance matrix S(L).Likewise, each column X l of M is associated with an error covariance matrix S(X l ).Subsuming these matrices under the symbol S(M), the problem of single-sensor calibration is mathematically equivalent to the problem of minimising a cost function that expresses the misfit between reference measurements L(t 1 ), . . ., L(t m ) and corresponding values f (t 1 , x(t 1 ), α), . . ., f (t m , x(t m ), α) of the measurement equation.The foremost prerequisite for solving the optimisation problem is an adequate representation of the error covariance associated with the individual columns of the matchup data matrix M.

Representation of Uncertainty in Matchup Datasets
The previous section has reviewed the mathematical formalism to calibrate a single sensor and has introduced the matchup data matrix M, which is our way to represent a reference-to-sensor matchup dataset.This section generalises this concept, in order to represent arbitrary sensor-to-sensor matchup datasets.
Let x i (t) = (x i 1 (t), . . ., x i n i (t)) denote some measurements acquired at time t, where each x i l is one of the n i direct observables that appear in the measurement function f i (t, x i (t), α i ) of a sensor i with calibration parameters α i .In analogy to Equation (2), let the matrix represent the set of chronologically ordered pairwise spatio-temporal matchups of telemetry data acquired by the sensor i with telemetry data acquired by another sensor j. ) may differ within a tolerance prescribed by given matchup selection criteria.Note that the reference-to-sensor matchup data matrix defined in Equation ( 2) is a specialisation of the general sensor-to-sensor matchup data matrix of Equation ( 4) where the single telemetry datum of sensor i is radiance and the measurement function f i is the identity function.Now again, let the measurements in each column of the matchup data matrix M ij be independent from all measurements in all other columns, but let the errors of measurements within the same column be possibly correlated.Thus, to describe the uncertainty associated with the matchup dataset represented by M ij , we may consider any generic column of M ij and our generic description will apply to all other columns as well.Therefore, let denote any column of M ij .The description of the error covariance matrix S(q) ∈ R m×m is straightforward, but there are several cases to discern.
1. Independent random errors-arise from uncorrelated, i.e., independent, random error effects.Each measurement included with q exhibits an error that is random and independent from the errors of all other measurements.The error covariance matrix S i (q) is diagonal with elements that represent the variance of each error.Let define the diagonal uncertainty matrix; thus, the error covariance matrix S i (q), which is diagonal and of full rank, is given by 2. Common random errors-arise from a fully correlated, i.e., common, random error effect.All measurements in q exhibit an error that is due to the same random effect e. Let u(e) denote the uncertainty associated with this effect and let C e (q) ∈ R m be the absolute sensitivity of q with respect to e.Then, the law of propagation of uncertainty [11] states that the uncertainty of the measurements q which results from each such effect e is and that the error covariance matrix S e (q), which is dense and of rank one, is given by S e (q) = u e (q) (u e (q)) T .
3. Structured random errors-arise when the measurements in q are not original measurements but rather result from an operation on an underlying (and usually larger) set of m original measurements of the same quantity q measured at instants q(t 2 ) . . .
and these m measurements have independent random errors (case 1) associated with them.
A common example operation would be some form of averaging [1,2], but any linear mapping of measurements from q to q is permissible.Let q = W q , where W ∈ R m×m is the matrix that defines the linear operation.Thus, the error covariance matrix S s (q), which is typically sparse and of rank min(m, m ), is given by 4. Structured common random errors-arise when the measurements in q are not original measurements, but result from an operation on an underlying set of original measurements q that have both independent random errors (case 1) and systematic errors (case 2) associated with them.Let q = W q (12) as in the third case.Here, the error covariance matrix is dense and given by S(q) = W (U(q )) 2 + u e (q ) (u e (q )) T W T , (13) which, because u e (q) = W u e (q ), is equivalent to S(q) = W (U(q )) 2 W T + u e (q) (u e (q)) T .( 14) Equation ( 14) demonstrates that the fourth case is a simple combination of the second and the third case, and only the first three cases are genuinely different.Indeed, the error covariance matrix S(q) may always be expressed by any of the basic Equations ( 7), ( 9) and (11) or any combination thereof as, e.g., in Equation (14).Note that knowledge of the sparsity and rank deficiency of the basic matrices is the key to implement storage and computationally efficient matrix representation schemes on a computer.
The error covariance matrix may also be expressed in radiance-equivalent form by propagating the uncertainty associated with each column of sensors i and j, respectively.Let designate the sensitivity (i.e., the Jacobian matrix) of the calibrated radiance L ij of sensor i with respect to the column X ij l , and let correspondingly designate the sensitivity of the calibrated radiance L ji of sensor j with respect to the column X ji l .Then, the law of propagation of uncertainty [11] states that the radiance error covariance matrices S(L ij ) and S(L ji ) are given by and To express the error covariance matrix of the differences of calibrated radiance L ij − L ji , we must take account of the fact that any two sensors i and j do not usually observe the same Earth target at the same time and under the same conditions.Let K ij m denote the differences between the calibrated radiances we expect because of these types of matchup differences.In the absence of better knowledge, we may expect K ij m = 0, but, even in this case, we must associate an error covariance matrix with this expectation.Let the errors in our expectation K ij m be random and independent, thus In general, even similar sensors i and j will have different spectral response functions and we cannot expect that the differences L ij − L ji − K ij m will vanish.Let K ij s denote the differences of calibrated radiance we expect because of mutual differences in the two sensors' spectral response functions.Thus, we may write where we again have assumed that the errors in our expectation K ij s are all independent and random.Finally, we write (22) where are the differences of calibrated radiance we measure, s are the differences of calibrated radiance we expect because of matchup and spectral response differences in total, and In the remainder of this paper, we refer to the double differences d ij (α i , α j ) − K ij as K-residuals.For a perfect harmonisation, the mean K residual will be zero.

Formulation of the Harmonisation Problem
Harmonisation describes the joint calibration of a series of sensors that are all manufactured to make the same measurement, e.g., Earth radiance in a particular spectral band.To formulate the harmonisation problem in a generic way, let the sensors in our series be enumerated by i and j such that i = 1 designates the reference sensor and the matchup data matrix M ij with i ≥ 1 and j > i represents the pairwise spatio-temporal matchups between two sensors i and j as in Equation ( 4).Let all sensor calibration parameters be subsumed under the symbol α and let all reference radiance and sensor telemetry data be subsumed under the symbol X.Furthermore, let α denote any a priori expected values of α and let S(α) denote the a priori associated error covariance matrix.

EIV Formulation
The EIV technique considers all measurements X as variable parameters, in order to take all measurement errors and error correlation structures into account, such as the sensor calibration parameters α.To distinguish (in this subsection only) between variable parameters and invariant measurements thereof, let X denote the measurements and let X denote the variables.In a Bayesian context and in analogy to α and S(α), the measurements X designate the a priori expected values of the parameters X while S( X) embodies the a priori associated error covariance matrix.Hence, the harmonisation problem is equivalent to the problem of minimising the cost function sensor telemetry terms where the calibrated radiance differences d ij (α i , α j ) depend on X ij l and X ji l in the same outward form as defined in Equations ( 23) and (15).The prior calibration term is optional to represent prior knowledge of the calibration parameters, if there is any.Applied to the AVHRR series of sensors, the EIV formulation includes several hundreds of millions measurements X and an equal number of parameters X, which are all subject to optimisation.Harris et al. [9] have recently demonstrated how such large and structured problems can be solved.

Marginalised EIV (MEIV) Formulation
The MEIV technique is concerned with optimising sensor calibration parameters only.Measurement errors and error correlation structures are taken into account by expressing the error covariance of sensor telemetry data in radiance-equivalent form as defined in Equation (22).The marginalised harmonisation problem is then equivalent to the problem of minimising the cost function where the prior calibration term is optional to represent prior knowledge of the calibration parameters, if there is any, or to provide the resolving constraint when the harmonisation is conducted without a designated reference sensor.Note that the sensor telemetry terms of Equation ( 25) have disappeared because the error covariance associated with these terms is represented here by the error covariance matrix S as defined in Equation ( 22).The K-residual term is a sum of quadratic forms, each of which is like the generic expression with a vector v ∈ R m and a symmetric and positive definite matrix S ∈ R m×m .We may rewrite this expression as v T w, (28) where w ∈ R m is the solution of the linear equation The error covariance matrix S of the K-residual term in Equation ( 26) is the sum of positive definite and positive semi definite matrices and thus itself positive definite.Matrices may be large, with millions of rows and the same number of columns.Therefore, we represent all matrices that appear in the equations by matrix-free computer source code that evaluates matrix-vector products without needing an explicit in-memory instance of each matrix element.In particular, we evaluate matrix-vector products involving the Jacobian matrices of Equations ( 16) and ( 17) by computer source code elements that have been generated by algorithmic differentiation (AD) [12] of the sensor measurement functions in forward mode (code generated in forward mode is also known as tangent linear code).Similarly, we evaluate matrix-vector products involving transposed Jacobian matrices by computer source code generated by AD in reverse mode (code generated in reverse mode is also known as adjoint code) by way of generic identities such as S w = (AB + CD) w = (AB) w + (CD) w = A(B w) + C(D w).(30) We decompose the product of any vector w with any composite matrix S, namely the error covariance matrix in Equation (22), into more elementary matrix-vector products.A very costly computational evaluation of matrix-matrix products is avoided.Rather than evaluate Equation (27), which requires a computationally expensive inversion of the matrix S, we solve (29) by means of an efficient preconditioned conjugate gradient algorithm [13] and then evaluate Equation (28).Typically, about ten conjugate gradient iterations are needed to solve Equation ( 29) with sufficient accuracy.
The MEIV cost function is minimised using a limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [14][15][16].Efficient quasi-Newton minimisation algorithms like BFGS require an accurate evaluation of the cost function gradient.We have generated the computer source code to evaluate the gradient from the source code of the cost function by means of AD in reverse mode [17,18].The adjoint code generated by such source-to-source transformation is most accurate and most efficient in terms of required computational time and memory resources [19].For a single given instance of calibration parameters α, the computational time to evaluate the cost function and its gradient is about five times the computational time required to evaluate the cost function alone, independent of the number of parameters.Memory resources increase by a factor of about two.
The harmonised calibration parameters α which minimise the cost function constitute a maximum posterior probability estimate.The posterior error covariance matrix associated with this estimate is given by the inverse of the Hessian matrix of the cost function at its minimum The Hessian matrix H J (α) is evaluated in vector forward over scalar reverse mode of AD [20], i.e., the gradient source code generated before in scalar reverse mode of AD is differentiated in vector forward mode, which yields the source code to evaluate the second order partial derivatives.The vector forward mode of AD propagates multiple perturbations simultaneously, which speeds up the evaluation by exploiting the temporal and spatial locality of data in computer memory [21].In contrast to the EIV formulation of the harmonisation problem, the number of parameters in the MEIV formulation is small, making it feasible to invert the Hessian matrix directly.

Measurement Equation
We have used the same measurement equation for all AVHRR sensors.Let CS and CICT denote the sensor digital count numbers taken for the space view and the internal calibration target (ICT) that have resulted from a moving average of the respective direct measurements over several scan lines.Furthermore, let C E denote the sensor digital count number directly taken for an Earth target, let L ICT denote the radiance calculated (by means of Planck's law and convolution with the sensor spectral response) from direct measurements of the ICT's temperature, and let T denote the temperature of the instrument.Then, for each sensor i with telemetry data where is a constant that denotes the nominal emissivity of the internal calibration target.Note that we have neglected the radiance from space (negligible in comparison with radiance from the ICT) and that the instrument-temperature dependent term α i 4 T i implicitly depends on the time of observation as the instrument temperature evolves over time with changes in the thermal environment.Table 1 summarises the nomenclature of AVHRR telemetry data and associated error correlation structures that we have assessed.Though in principle the measurements of ICT radiance exhibit all types of error correlation represented by Equations ( 7), ( 9) and ( 11), we have taken account of independent random errors only.It is important to note that Equation ( 32) is invariant under any common random error in the measurements of instrument temperature T i because any correction to the instrument temperature term is already represented by the calibration parameter α i 1 .Therefore, Equation (32) must not include additional calibration parameters to take account of (and estimate) these common random errors, and harmonisation should not consider them at all.

Satellite Data
We have considered the series of nine AVHRR sensors on-board the NOAA 11-19 and MetOp-A satellites to be harmonised with respect to the AATSR on-board Envisat as designated reference.The time lines of these satellite missions are illustrated in Figure 1.We have considered the period from 1991 to 2016 throughout which there were always two or more sensors in operation.AVHRR measures Earth TOA radiance in several wave bands.Here, we have examined the 11 µm channel only for the AVHRR Global Area Coverage (GAC) data.The spectral response function for this instrument ranges from 10.3 µm to 11.3 µm and the resolution is 5 km × 3 km around the sub-satellite point on ground.For examples of the AVHRR characteristics, see the NOAA KLM Guide [22].
Fundamental AVHRR telemetry data have been taken from the NOAA Comprehensive Large Array-data Stewardship System's AVHRR Level 1B data and the AATSR reference TOA radiance data have been taken from the ESA AATSR Level 1 data.The reference uncertainty at the pixel level from the AATSR data has been taken to be of order 0.05 K [23].The associated spatial resolutions have been assumed to be 1 km × 1 km at the sub-satellite point, and approximately 5 km × 3 km for the AVHRR GAC data.Note that, because one of the matchup criteria is to enforce a level of spatial uniformity in the observed field, any error introduced by possible mismatches between the AATSR and AVHRR footprints will also be included in the matchup uncertainty, which is primarily based on the local spatial variability.
Sensor-to-sensor matchups have been found automatically [24] under prescription of certain matchup selection criteria which are based on the criteria from the GSICS project [25].First, the centre of each sensor's field of view has to be concurrent so the two points are spatially consistent within the associated pixel footprint.Then, to minimise time variable differences between the matchups, we have used a time window of 300 s.We have also ensured parity in the viewing path length between the sensors and have used the difference in the path lengths of less than 1% as the selection criterion.Finally, we have enforced field uniformity to minimise errors due to geolocation errors and/or spatial inhomogeneity issues and have in this case used a threshold of 0.5 K for the spatial standard deviation in the AATSR pixels found in and around the AVHRR footprint.
A single matchup dataset has been created for each relevant pair of sensors.Table 2 lists the number of matchups selected for each pair, adding up to about 40 million matchups in total.Error covariance information has been added to each matchup dataset in a way that evaluating Equations ( 7), ( 9) and ( 11) is straightforward.Weight matrices W have been represented in compressed sparse row format [26].
Table 2. Number of sensor-to-sensor matchups selected for different pairs of satellites.Mnemonic names (see Figure 1) in the leading column refer to the first sensor i of a matchup dataset M ij while mnemonic names in the heading row refer to the second sensor j.

Number of Matchups (Million
For each pair of sensors, the expected radiance differences K ij and associated uncertainties u(K ij m ) and u(K ij s ) have been derived using a database of over a million Infrared Atmospheric Sounding Interferometer (IASI) spectra to represent variations in the observed TOA spectral radiance.For each instrument, the appropriate spectral response function has been used to integrate the IASI spectra to derive estimates of the respective infrared (IR) channel radiances.The expected difference in radiance for the 11 µm channel between any two sensor pairs can then be directly estimated for that particular top of atmosphere spectrum.These differences are incorporated into a lookup table which is a function combining all three IR channels for night time matches (where solar scattering is not an issue) and using the two long wave IR channels during the day.Note that doing it this way ensures estimates of the channel-to-channel difference under both clear sky as well as cloudy conditions.The lookup table elements themselves contain the average 11 µm channel difference together with an estimate of the standard deviation of the mean using the distribution of all difference estimates contained within that particular lookup table element.Those lookup table elements where there are insufficient difference estimates to provide a standard deviation are discarded.

Results
This section presents results on the harmonisation of a series of nine AVHRR sensors on-board the MetOp-A and NOAA 11-19 satellites with respect to the AATSR on-board Envisat as the designated reference.The MEIV technique has been applied to harmonise AVHRR 11 µm TOA radiance as computed from sensor telemetry data by means of (32).The cost function includes about 40 million matchups, which requires less than 30 GB of memory on a single compute node to run the optimisation and to calculate the calibration parameter error covariance matrix.Exploiting eight concurrent threads to compute cost function within each iteration, the optimisation takes less than 40 h on a single workstation with Intel (Santa Clara, CA, United States) Core i7-6700K CPU at a clock rate of 4.0 GHz.Calculating the parameter error covariance matrix takes about one hour.Results comprise harmonised calibration parameters, diagnostic diagrams, and a tentative stability analysis.

Harmonised Calibration Parameters
The harmonisation procedure optimised 36 calibration parameters α, four calibration parameters for each AVHHR instrument considered, and calculated the calibration parameter error covariance matrix.Table 3 lists the harmonised calibration parameter values α.Uncertainties have been obtained from the square root of the diagonal elements of the error covariance matrix.Note that the constraints on the calibration parameter α 4 of the AVHRRs on board MetOp-A and NOAA-19 are relatively weak because the temperature of these instruments is very stable, i.e from 285.9 K to 286.4 K and from 286.9 K to 288.3 K, respectively.The harmonised values of the calibration parameters α 1 and α 4 (and their errors) are therefore strongly correlated in the cases.Figure 2 illustrates the posterior error correlation matrix of the harmonised calibration parameters, which has been derived from the error covariance matrix S(α).The figure visualises intra-and inter-sensor error correlations.While the former emanate from the measurement equation itself, the latter originate from matchups of an AVHRR sensor with any other AVHRR sensor.The inter-sensor error correlation structure shown in Figure 2 is a direct implication of the structure and number of pairwise reference-to-sensor and sensor-to-sensor matchups listed in Table 2. Due to the sparse temporal overlapping between the later and the earlier AVHRRs (i.e., on board NOAA-11 to NOAA-14) the respective inter-sensor error correlations are relatively weak.Because the errors in the harmonised calibration parameters are strongly correlated, the full error covariance matrix (and not the mere diagonal variance elements only) must be considered in order to propagate calibration uncertainty to harmonised FCDR radiance.

Diagnostic Diagrams
Figure 3 illustrates how the optimisation procedure (which initially started from α = 0) reduced the value of the cost function and after 1000 iterations converged at the harmonised calibration α.If the measurement equation were a perfect representation of the instrument and errors in all matchup data were distributed normally and characterised perfectly well, the statistically expected value of the cost function at its minimum would be half the number of matchups, i.e., about 20 million.Figure 3 reveals that the actual minimum is less, i.e., about 13 million, which indicates that matchup data uncertainties may have been overestimated.
Figure 4 illustrates the mean and the standard deviation of the harmonised K-residual distribution for each sensor pair.If there were an issue with a certain sensor, this type of diagram would reveal it.Note that the magnitude of K-residuals (here and in other figures expressed in mW m −2 sr −1 cm) is approximately the same as the magnitude of corresponding brightness temperature difference (expressed in K).That is to say most mean K-residuals differ by less than 0.02 K and all differ by less than 0.06 K from zero, the value expected if the calibration of each instrument were without fault.Figure 5 depicts the probability distribution of harmonised K-residuals for all sensor pairs.This type of figure would reveal if the matchup data were affected by outliers or largely non-normal error statistics.The top panel of Figure 5 demonstrates that the distribution of K-residuals is a reasonable combination of a normal and a Laplace distribution and does not exhibit a significant fraction of statistical outliers.The mean of the K-residual distribution of −0.004 mW m −2 sr −1 cm is virtually zero, which would be expected if the harmonisation were perfect.The standard deviation of the K-residual distribution is 0.238 mW m −2 sr −1 cm, i.e., when expressed in brightness temperature, the dispersion of K-residuals is less than 0.3 K.The bottom panel of Figure 5 shows that the distribution of uncertainty-normalised K-residuals is more compact than a standard normal distribution, which they would be expected to follow if the overall uncertainty analysis were flawless.The more compact distribution of K-residuals seen may be explained by an overestimation of the uncertainties.

Stability of the Calibrated Radiance Data Record
The distribution of K-residuals over time (see Figure 6) may be used to infer the stability of the harmonised radiance data record.The top panel of Figure 7 illustrates the K-residuals for a subsample of matchups between all possible sensor pairs, with different sensor pairs marked by different colours.The subsample has been selected randomly from the distribution illustrated in Figure 6.The bottom panel of Figure 7 shows the K-residuals for the same matchups, but calculated from the nominal AVHRR calibration instead of the harmonised calibration.The mean K-residuals obtained from the nominal calibration exhibit mutual differences of 1-2 radiance units.Depending on the sensor pair considered, trends of mean K-residuals over time are sometimes positive and sometimes negative.The bottom panel of Figure 7 reveals the deficiencies of the nominal radiance calibration, which lead to highly unstable radiance data records that do meet climate data requirements.In contrast, the top panel of Figure 7 demonstrates that mutual differences of the mean harmonised K-residuals and their temporal trends are small, indicating a much better stability.Figure 8 illustrates the trend of the mean harmonised K-residual from all sensor pairs over time, which is 0.0145 mW m −2 sr −1 cm decade −1 .Remembering that the magnitude of K-residuals is approximately the same as the magnitude of the corresponding brightness temperature difference, this trend roughly translates into 0.015 K decade −1 which is half the value tolerated by the requirement of the Global Climate Observing System (GCOS) on the stability of the Sea Surface Temperature essential climate variable [27].Please note that this tentative stability analysis is based on matchup pairs, which are for the most part located in Arctic (Greenland) and Antarctic regions and do not constitute a globally-representative sample.The radiance samples from the selected matchups do, however, cover almost the complete range of observed radiances.

Discussion
We have presented a metrologically rigorous marginalised errors-in-variables (MEIV) approach to achieve a harmonised calibration of sensors onboard a series of Earth observation satellites.We have applied the method to harmonise TOA radiance from nine AVHRR sensors on board of NOAA and MetOp satellites with the AATSR reference.The harmonised calibration constitutes a significant improvement when compared to the nominal calibration in terms of stability over time, being roughly in the region of GCOS requirements.Matrix-free computations within the MEIV method allow for matchup datasets including several hundred million measurements, which enables calibration on the basis of even the noisiest telemetry.Most accurate and efficient computations involving first and second order partial derivatives have been enabled by means of advanced AD techniques [17][18][19][20][21], namely Transformation of Algorithms in Fortran [28].The use of AD techniques enables quick trials of measurement equations with different or additional calibration parameters without the need to put any effort on programming partial derivative codes manually.
The mathematical framework presented in this paper is fully developed, but the harmonisation of the AVHRR series of sensors is ongoing.The Fidelity and Uncertainty in Climate data records from Earth observation (FIDUCEO) project [29] is refining the measurement equation, based on ongoing analyses of instrument characteristics and the results of the initial harmonised calibration presented in this study.Application of the MEIV method to harmonise the High-resolution Infrared Radiation Sounder (HIRS) series of sensors and the microwave sounders onboard NOAA and MetOp satellites is an ongoing activity within the FIDUCEO project too.Eventually, the harmonised calibration parameters will be used to produce new harmonised FCDRs, which include metrologically sound associated uncertainties per datum.
The MEIV framework is very generic.It may be configured to use a single reference sensor as in this study, but may readily use multiple reference sensors, which have already been harmonised too.Configuring calibration prior terms for designated reference sensors even allows for a harmonisation where the nominal calibration of the reference is updated with new information.Though the framework considers instrument channels individually, an obvious future extension of the framework may consider the calibration of multiple instrument channels in one go, e.g., to quantify inter-channel error correlation.On the other extreme, the framework may be configured to improve existing bias correction schemes (or sensor-equivalent calibration activities like GSICS [30]) which work on the basis of nominally calibrated radiance.Here, the improvement will be a (formally) correct consideration and propagation of radiance and bias correction uncertainty (which will not be an as complete and metrologically rigorous treatment as harmonisation, however).
With SI traceable satellites planned and launched in the future, harmonisation will have the potential to establish SI traceability for even heritage instruments, with the ultimate objective to produce SI traceable FCDRs extending over decades, ideally back to the beginning of Earth observation from Space.

Figure 1 .
Figure1.Timeline of NOAA and MetOp satellites, which have all been equipped with similar sensors, like, e.g., the AVHRR.The AATSR sensor onboard the Envisat satellite was similar to the AVHRR.

Figure 2 .
Figure 2. Error correlation matrix of harmonised calibration parameters.Thin horizontal and vertical lines separate the symmetric matrix into blocks of four by four pixels.Each block is associated with different sensors as indicated by mnemonic labels along the horizontal and vertical axis.Pixels inside diagonal blocks represent intra-sensor error correlations, whereas pixels inside off-diagonal blocks represent inter-sensor error correlations.Each off-diagonal pixel indicates how the errors in two different calibration parameters are correlated.

Figure 4 .
Figure 4. Mean and standard deviation of the harmonised K-residual distribution for each matchup dataset.

5 KFigure 5 .
Figure 5. (a) top panel: distribution of harmonised K-residuals for all sensor pairs.The solid and dashed lines mark normal and Laplace distributions, which have the same mean and the same dispersion as the distribution of K-residuals; (b) bottom panel: as above but showing the distribution of uncertainty-normalised K-residuals.The dotted curve marks the standard normal distribution.

Figure 6 Figure 6 .
Figure6depicts the distribution of K-residuals over time.This type of diagram reveals when the matchup data are affected by temporary outliers and other particular error statistics, which do not appear in Figure5.For instance, Figure6reveals some seasonal variations and demonstrates that the increase of dispersion of K-residuals in the years 2008 and 2009 arises from temporary issues in matches between the NOAA-16 and -15 satellites.

Figure 7 .
Figure 7. Random sample of 10,000 matchups from all sensor pairs.(a) top panel: K-residuals calculated from the harmonised calibration; (b) bottom panel: the same samples, but calculated from the nominal AVHRR calibration.

Figure 8 .
Figure 8. Trend of the mean harmonised K-residual from all sensor pairs over time.Dashed lines roughly indicate the requirement of GCOS on the decadal stability of the Sea Surface Temperature essential climate variable, which is 0.03 K decade −1 [27].

Table 1 .
Nomenclature and error correlation structure of AVHRR telemetry data that has been taken into account.

Table 3 .
Values of harmonised calibration parameters.
Minimisation of the cost function.