Gaussian processes for blazar variability studies

This article shortly introduces Gaussian processes (GP) as a new approach for modelling time series in the field of blazar physics. In the second part of the paper, recent results from an application of GP modelling to the multi-wavelength light curves of the blazar PKS 1502+106 are discussed.


Introduction
Detailed investigations of the emission from blazars have offered, and continue to offer, important insights into this extreme population of active galactic nuclei (AGN). Blazar variability is an indispensable tool because it offers information on scales even smaller than those probed with imaging techniques. Furthermore, given the predictions of various blazar variability models [e.g., [1][2][3][4] , we are able to assess their validity taking advantage of the fingerprints of each model in the time and frequency domains, namely on light curves and broadband spectra.
Numerous monitoring efforts have been in place since the early days of the discovery of AGN and blazars, however most, if not all, lacked the high time resolution required for detailed studies. The tables have turned with the launch of the Fermi γ-ray space telescope and complementary efforts for multi-frequency blazar monitoring. One such example in the radio domain is the F-GAMMA (Fermi-GST AGN Multi-frequency Monitoring Alliance) program [5,6]. The F-GAMMA operated between 2007 and 2015 and served as a prime provider of single-dish radio monitoring complementary to Fermi [7].
The goal of variability studies is the extraction of parameters such as the variability amplitudes, time scales, and others, from observations. The problem of modelling the observed variability in the radio and other bands, has been addressed using numerous approaches. For modelling light curves and the flares seen in them, Gaussian, exponential, and various other fitting methods have been used (e.g. [8][9][10]; this is by no means a complete list). Depending on each specific case, they work better or worse but the common characteristic among all the above is that a functional form of the underlying process is always assumed. In other words, a parametric approach is being employed. However, this might not be appropriate and we might have to admit that at times we have weak prior knowledge as to what is the mathematical form of such a process.
An elegant way to overcome this state of limited knowledge is by means of Bayesian inference. Specifically, this paper is concerned with the application of Gaussian processes in the field of blazar variability studies and modelling of time series. In the next section, a brief introduction to Gaussian processes is given. Then, an application to the light curves of the blazar PKS 1502+106 is discussed, aiming to quantify an outburst seen from 2008 to 2010 and extract light curve parameters like the flare amplitude and putative delays between its peak at different bands. First results along with an extended discussion can be found in [11].

Non-parametric models and Gaussian processes
The problem at hand is as follows: we want to extract (or learn, in the context of Bayesian machine learning) the properties of a system from a given set of observations. This can be a function f () that describes the system well enough and predicts its future evolution. There are essentially two ways of achieving this goal. The first involves the selection of a class of functions, e.g. linear, quadratic, Gaussian etc. with (free) parameters, the values of which are obtained through minimizing the residuals between f () and our observations. Such a selection can result from strong prior knowledge regarding the system. One can increase the flexibility of the model by increasing the number of parameters at the expense, though, of potential overfitting; i.e. describing random small scale variations of the data instead of the general trend.
Still, cases in which we have little or no prior knowledge regarding the underlying process, and consequently the appropriate model to use, exist in abundance. Non-parametric models offer an interesting alternative to the first approach. By comparison, non-parametric models "translate" our prior knowledge about a system, no matter how generic this might be, e.g. smooth and/or continuous changes, into probability distributions of functions that are more likely to describe the system. The task of modelling then reduces to fine-tuning these prior distributions so that they include functions that best describe our observations; i.e. the posterior distributions. Since no explicit use of free parameters is made, these models are referred to as non-parametric. Non-parametric models, however, use so-called hyperparameters which instead of determining the properties of functions, as in the first approach, govern the properties of the distribution. These still need to be inferred and Bayesian model selection provides a robust framework for doing so.
Gaussian processes (GPs) are the generalization of uni-or multivariate Gaussian distributions of variables into the space of functions. GPs offer a very flexible framework for modelling unknown functions [12][13][14]. They are widely used in machine learning applications and a new field of applications in astronomy has recently opened [e.g. [15][16][17][18]. The interested reader is referred to [19] for additional details.

Gaussian processes for regression
Given a set of observations, e.g. a typical light curve, with the form D = (x i , y i ), where y i represent recorded flux densities at times x i , we would like to find all functions drawn from the prior distribution which pass through all observations. The latter distribution would be the posterior distribution and its mean value describes the data best (see Figure 1).
What generates the prior distribution is a covariance kernel or function. This governs the similarity (covariance) between any two function values, at x i and x j say. It is chosen based on our prior knowledge of the system. There exists a multitude of kernels which can model unknown functions based on their characteristics, e.g. periodic, linear etc. [e.g. 20]. It is noteworthy that sums and products of kernels produce valid kernels too. For the purposes of this paper, the squared-exponential (SE) kernel is chosen. This is the simplest kernel and the only underlying assumption is that of a smooth process (i.e. the functions it produces have derivatives of all orders everywhere in their domain). The SE kernel is defined as with l the the characteristic length scale, i.e. the distance in the x dimension after which the function changes significantly and σ 2 f , the variance, i.e. the mean distance of the function from its mean (serves only as a scaling factor). Here, l and σ 2 f are the hyperparameters. Uncertainty is accommodated by simply adding a term describing it, to the noiseless SE covariance kernel of Eq. 1, which then becomes with I the identity matrix [e.g. 13].

Training the Gaussian process
Bayesian model selection provides the way for optimizing the hyperparameters, thereby refining the posterior distribution and enabling us to perform inference. We essentially update our prior knowledge in the light of a data set. This is done by means of maximizing the log marginal likelihood [12], as outlined below.
For the SE kernel, the vector of hyperparameters is θ = {l, σ 2 } and the probability (or evidence) of the training data y, given the hyperparameters vector θ, is p(y | x, θ). The log marginal likelihood is given by More generally, in the case of a hyperparameter vector θ = {θ j | j = 1, ..., n}, the derivatives of the log marginal likelihood with respect to each θ j are By means of numerical gradient optimization algorithms one can maximize the log marginal likelihood using Eq. 4 and obtain the best set of hyperparameters (for an outline of the algorithm see [19]).

Application to the light curves of the blazar PKS 1502+106
PKS 1502+106 (S3 1502+10, OR 103) is a powerful flat-spectrum radio quasar (FSRQ), driven by a supermassive black hole of about 10 9 M , at redshift z = 1.8385 (see [21] and references therein). The source has been detected by Fermi in 2008 showing an intense γ-ray outburst at MeV-GeV energies [22]. The flare was monitored by F-GAMMA and other programs in the radio band. Radio data from the F-GAMMA program 1 were employed, including observations with the Effelsberg 100-m (EB) and the IRAM 30-m (PV) telescopes in ten frequency bands from 2.64 to 142.33 GHz. Monthly observations from the EB and PV telescopes were performed quasi-simultaneously, thus ensuring maximum spectral coherency. A detailed description of the observational setup and data reduction is provided elsewhere [7,10,23,24]. 1 www.mpifr-bonn.mpg.de/div/vlbi/fgamma/fgamma.html For the results at all frequencies see [18].
Considering the full length of available multi-frequency light curves, GP regression was applied to the data at each frequency separately. A variant of the software by [25] was used, which was adapted to our specific modelling needs. A suite of machine learning applications for Python, including GP regression, is available online 2 . To obtain the best unbiased result, the length scale parameter l was randomly initialized 100 times. Then, the posterior mean is returned with a 95% confidence interval for the flux density, along with the set of hyperparameters which maximize the log marginal likelihood. The relevant light curve parameters extracted are the peak amplitude (S m ), the time of the peak, and the cross-band delay. Furthermore, the flare rise and decay times are calculated between the peak and the two flux density minima which immediately precede and follow S m [18].

Quantifying the broadband outburst of PKS 1502+106
PKS 1502+106 has shown an isolated outburst visible across the electromagnetic spectrum. The results of the GP regression are visible in the right panels of Fig. 2. The amplitudes at each of our radio frequencies first increase, up to about 43 GHz where they peak, and then drop at higher ν (Fig. 4 of Ref. [18]).The flare peaks with increasing time delay as we follow it towards lower frequencies (Fig. 5 of Ref. [18]). The dependence of amplitudes and delays on ν can be well approximated by power laws, a telltale sign of shock evolution within the source's parsec-scale jet. These results are in agreement with the study of the source using mm-wave very-long-baseline interferometry (mm-VLBI; see [21]).
The observed time lags between radio frequencies can be attributed to synchrotron opacity effects. In this context, the flare maxima appear first at higher frequencies and gradually progress to lower frequencies [e.g. 2]. Timing the flare maximum at each frequency with respect to the data at 142.33 GHz yields the following delays. The longest delay of (205 ± 2.0) days is seen between 142.33 and 2.64 GHz. Decreasing lags are clearly visible towards higher frequencies with a time lag of (136.9 ± 8.  (Table 3 of Ref. [18]).
The aforementioned lags allow the calculation of the synchrotron opacity profile of PKS 1502+106. Considering that at a given frequency the core represents a feature whose optical depth is close to unity, the standard relativistic jet paradigm predicts that the apparent position of this unit-opacity surface depends on observing frequency (see [18] and references therein). The importance of this "core-shift" effect lies in the fact that it gives critical insights into the physical processes of ultracompact jets, like the de-projected distance between the core (at each frequency) and the jet base, as well as the strength of the magnetic field along the flow [e.g., 26,27]. Using a "time-lag core shift" method, which proves to be a good alternative to VLBI core-shifts [e.g. 9,28], the positions of the nine unit-opacity surfaces (or photospheres) with respect to the jet base were inferred. These are: (10.2 ± 1.2) pc for the 2.64 GHz core, (7.1 ± 1.0) pc at 4.85, (5.3 ± 0.8) pc at 8.35, (4.8 ± 0.8) pc at 10.45, (4.0 ± 0.7) pc at 14.60, (2.6 ± 0.6) pc at 23.05, (2.9 ± 1.0) pc at 32.00, (3.4 ± 0.8) pc at 43.05, and (4.0 ± 1.1) pc at 86.24 GHz (see Table 4 of Ref. [18] for the values and their calculation). Finally, we obtain the equipartition magnetic field along the jet axis, with values increasing from 14 mG at the 2.64 GHz core up to 176 mG at 86.24 GHz.

Where do the γ rays come from?
The location of the γ-ray emitting region is decisively constrained by combining the aforementioned findings with those of [23] and [21]. Specifically, in [23] the authors calculate the relative distance between the 86 GHz core and the γ-ray site to be about 2.1 pc, while in [21], using high-resolution mm-VLBI imaging, the knot most likely connected with the flare in question is identified (knot C3) and its kinematical properties are deduced (β app ∼ 7 c, jet viewing angle of 2.6 • ).
Having calculated the absolute distance between the jet base and the 86.24 GHz core (4.0 ± 1.1) pc, and with the relative separation between the same core and the γ-ray site (2.1 pc), the high-energy emission originates at (1.9 ± 1.1) pc from the jet base. This places it far from the bulk of the broad-line material of the source (R BLR ≈ 0.1 pc) making external Compton, on a field other than that of the BLR, a very relevant emission mechanism for PKS 1502+106's γ ray emission, with implications on the origin of the seed photon field (see [18] and references therein).