Searching for high-z radio galaxies with the MGCLS

We present results from a search for high-redshift radio galaxy (H$z$RG) candidates using 1.28 GHz data in the Abell 2751 field drawn from the MeerKAT Galaxy Cluster Legacy Survey (MGCLS). We use the H$z$RG criteria that a radio source is undetected in all-sky optical and infrared catalogues, and has a very steep radio spectrum. We cross-match the radio catalogue against multi-wavelength galaxy catalogues from DECaLS and AllWISE. For those radio sources with no multi-wavelength counterpart, we further implement a radio spectral index criterium of $\alpha<-1$, using in-band spectral index measurements from the wide-band MeerKAT data. Using a 5$\sigma$ signal-to-noise cut on the radio flux densities, we find a total of 274 HzRG candidates: 179 ultra-steep spectrum sources, and 95 potential candidates which cannot be ruled out as they have no spectral information available. The spectral index assignments in this work are complete above a flux density of 0.3 mJy, at least an order of magnitude lower than existing studies in this frequency range or when extrapolating from lower frequency limits. Our faintest HzRG candidates with and without an in-band spectral index measurement have a 1.28\,GHz flux density of 57 $\pm$ 8 $\mu$Jy and 68 $\pm$ 13 $\mu$Jy, respectively. Although our study is not complete down to these flux densities, our results indicate that the sensitivity and bandwidth of the MGCLS data makes them a powerful radio resource to search for H$z$RG candidates in the Southern sky, with 20 of the MGCLS pointings having similar image quality as the Abell~2751 field and full coverage in both DECaLS and AllWISE. Data at additional radio frequencies will be needed for the faintest source populations, which could be provided in the near future by the MeerKAT UHF band (580 -- 1015 MHz) at a similar resolution ($\sim$ 8-10 arcsec).


Introduction
Observing the high redshift Universe is critical to our understanding of cosmological and astrophysical processes. It allows us to probe large-scale structure, galaxy evolution, and the period of time when the first stars formed, known as the epoch of reionisation (z > 6). In the radio regime, the high-redshift Universe can be probed through high redshift radio galaxies (HzRGs). These galaxies, with large volumes of dust and gas [e.g., 1,2], are expected to be actively forming stars and may be the progenitors of the massive ellipticals seen in the arXiv:2110.14986v2 [astro-ph.GA] 17 Nov 2021 local Universe [3,4]. With their link to high density environments such as proto-clusters or the centre of galaxy clusters (see the review by Miley and De Breuck [5] and references therein), HzRGs are critical targets to understand the formation and evolution of galaxies at early times, and their connection to large scale structure formation.
Creating complete samples of HzRGs is challenging. The radio regime does however produce optically unbiased samples as it is unaffected by dust absorption, and is therefore the most efficient band in which to detect high redshift galaxies [6]. Spinrad [7] was the first to identify a radio galaxy beyond z ∼ 1, with the highest redshift radio galaxy currently known being at z = 5.72 [8]. Current searches for HzRG candidates require an additional data source, either in the form of multi-wavelength catalogues, or multi-frequency radio data.
As HzRGs are by definition at high redshift (z 2), the effect of redshift dimming can be exploited with HzRG candidates identified as lacking a multi-wavelength counterpart when cross-matching with optical and/or infrared (IR) sky survey catalogues [9]. Where near-IR data is available, the K − z relation [10] has been used to identify potential HzRGs when an infrared counterpart is found [e.g., 11,12]. Norris et al. [13] discovered a population of infrared-faint radio sources (IFRSs) after cross-matching radio and IR data for one of the Australia Telescope Large Area Survey fields. IFRSs are characterised by a radio source with a very faint or non-detection in deep IR data and may be a related to, or a subset of, HzRGs as they are also found at high redshifts [14][15][16].
Early searches for HzRG candidates were restricted to bright (S 1.4 GHz > 1 − 10 mJy) radio sources extracted from available wide-area radio surveys [e.g., 36,40]. This flux density limit is gradually being reduced thanks to sensitive imaging with the new generation of radio interferometers [see e.g., 33], although implementation of the USS cut typically still requires multi-frequency radio observations to obtain spectral indices. The wideband receivers of the current pre-SKA era telescopes may be able to circumvent this requirement. The advancements in the quality of radio data, coupled with the availability of wide-area optical and infrared surveys, such as the Dark Energy Camera Legacy Survey [DECaLS; 41] and the All-sky Widefield Infrared Survey Explorer [All-WISE; 42], are making HzRG searches more accessible.
In this paper we present the first results of a search for HzRG candidates in the Abell 2751 field using deep, wideband imaging from the MeerKAT Galaxy Cluster Legacy Survey [MG-CLS; 43]. We cross-match the 1.28 GHz data with the DECaLS and AllWISE catalogues, and use spectral index information from the wide MeerKAT band to implement USS cuts. The paper is structured as follows. In Section 2 we introduce the radio data used in this study. In Section 3 we describe the multi-stage methodology used to determine HzRG candidates. The results are discussed in Section 4 with a summary and concluding remarks given in Section 5. Throughout this article we assume a flat ΛCDM cosmology with H 0 = 70 km/s/Mpc and Ω m = 0.3. 1 Here we assume the flux density convention S ν ∝ ν α where α is the spectral index. 2 Many earlier studies use a USS criterium of α < −1.3.

Radio catalogue: MGCLS Abell 2751 field
We use the first data release (DR1) of the compact source catalogue from the MeerKAT Galaxy Cluster Legacy Survey [MGCLS ; 43]. The MGCLS consists of long-track (6-10 hour) observations of 115 galaxy clusters in the Southern sky using MeerKAT's L-band receiver (900 -1670 MHz). For each cluster field, a primary beam-corrected image is produced at an effective observing frequency of 1.28 GHz. Each primary beam-corrected image spans a 1.2 • × 1.2 • sky region with a typical central RMS noise of 4−6 µJy beam −1 in the full-resolution survey (≈ 7.5 −8 full-width half-maximum (FWHM)). The source catalogue for each field was generated using the Python Blob Detection and Source Finder (PYBDSF) package [44] with the default thresholds (3σ island RMS and 5σ peak RMS), and the final compact source catalogue includes only those sources identified as having only a single Gaussian component by PYBDSF. Note that the dimensions of the elliptical Gaussian were free in PYBDSF, so single-component sources are not necessarily unresolved.
The MGCLS DR1 images have not been calibrated for direction-dependent effects and therefore some fields present artefacts around the brightest sources. The level of contamination varies with the source brightness and position of the source within the primary beam. The PYBDSF source finder incorrectly identifies some of these artefacts as real sources. To mitigate the level of artefact contamination in the final DR1 compact source catalogue, all catalogue sources within a given radius of a sufficiently bright source are removed. With this method, some real sources are removed along with the contaminating artefacts. However, the total number of sources removed (real and artefact) amounts to ∼ 2% of the total number of catalogued sources, and the removal of real sources is not expected to have a large effect on this work.
For this study we use the Abell 2751 cluster field, centred at RA = 00 h 16 m 13.92 s , Dec. = −31 • 23 18.60 , with coverage from the DECaLS and AllWISE catalogues (see Section 3.2). The primary beam-corrected image has low RMS noise (6.6-11 µJy beam −1 at ∼ 7.8 resolution) 3 and minimal visual contamination by bright source artefacts. The artefact-excised DR1 compact source catalogue for the Abell 2751 field consists of 3610 radio sources.

Identifying HzRG candidates
We use a multistage approach to identify possible HzRG candidates in our radio catalogue, using sequential cuts to arrive at the final list of candidates. Given the number of sources involved, we use in-house automated tools where possible. In this section we discuss each step of our process and the resultant whittling down of the catalogue to the final candidate list. Table 1 indicates the fraction of potential candidates remaining after each step of the process.

Step 1: Signal-to-noise and angular size requirements
Many HzRG searches enforce a flux density cut, with the most recent studies using sub-mJy limits at low (sub-GHz) frequencies [e.g., 33]. Given the sensitivity of the MGCLS data, we could potentially probe down to S 1.28 GHz ∼ 35 mJy at the 5σ level. However, although we selected one of the MGCLS fields with the fewest bright-source artefacts, the image noise does vary across the field due to the primary beam correction, with the local RMS noise increasing away from the image centre. Rather than implement a fixed flux density cut, we apply a signal-to-noise ratio (SNR) restriction using the fitted flux density versus its associated uncertainty from the source catalogue 4 as the first step of the process. After applying a fitted flux density SNR requirement of 5σ, the radio catalogue is reduced to 1700 sources. It is possible that the input catalogue discussed in Section 2 may be missing faint sources due to the source finding parameters. However, as the faintest sources are excluded through this SNR cut, this is not expected to affect the number of potential HzRG candidates in this study.
We simultaneously apply an angular size restriction of s maj ≤ 10 . Angular size restrictions have been necessary in HzRG searches using lower resolution (lower frequency) data in order to preferentially select high redshift sources [33]. Our radio data have a resolution of ∼ 7.8 so this requirement may not be necessary given that we are using a compact/point source catalogue. However, investigation of the catalogue shows that several sources, although meeting the single Gaussian fit requirement, are well resolved. Following Saxena et al. [33], we therefore restrict the catalogue to sources with an angular size smaller than ∼ 10 . There are 1414 radio sources after applying the SNR and angular size limits.

Step 2: Multi-wavelength cross-matching
The next step of the process is to determine which radio sources have neither optical nor infrared counterparts using publicly available catalogues. To discard relatively bright nearby galaxies, we first cross-match our SNR-cut catalogue against the eighth data release of the Dark Energy Camera Legacy Survey [DECaLS; 41]. The DECaLS catalogue includes g-, r-, and z-band magnitudes, and has a positional accuracy of ≈ 20 mas. The sources with no identified DECaLS counterpart are then cross-matched against the All-sky Wide-field Infrared Survey Explorer (AllWISE) catalogue [42] which provides mid-infrared source data. We use the AllWISE W1, W2, and W3 band data, which have 5σ median sensitivity limits of 54 µJy, 71 µJy, and 730 µJy respectively, and astrometric accuracies in the Abell 2751 region of better than 70 mas.
We use the likelihood ratio (LR) method, introduced by Sutherland and Saunders [46] and further adapted by Ciliegi et al. [47], to cross-match our radio and multi-wavelength catalogues and determine optical or infrared counterparts for our radio sources. Here we summarise the key properties of the LR method and refer the reader to Sutherland and Saunders [46] and Georgakakis et al. [48] for more details. An LR value is determined for each potential match within a specified search radius of a given radio source, where the LR compares the probability that a source match is the true counterpart to that of the same source being a spurious match, i.e., where q(m) is the expected magnitude distribution for the true counterparts, f (r) is the probability distribution function of the positional uncertainties in the involved catalogues, and n(m) is the surface density of the unrelated background objects with magnitude m. For our catalogues, the probability distribution function is a two-dimensional Gaussian distribution In cases where there is more than one potential match to a given radio source, the LR method can be used to calculate the reliability, R i , of each match: where ∑LR radius is the sum of LR for all possible counterparts to a radio source within our defined search radius, and Q is the fraction of the radio sources with catalogue counterparts above the magnitude limit. The LR method can therefore be used to identify the most likely counterpart for a given source in cases where there may be more than one possible match. An estimate of the spurious identification rate (or error rate) can be obtained by comparing R i with the total number of counterparts above a certain LR value. This cutoff in LR aims to optimize the completeness versus error rate of the cross-matching, where completeness is defined as the fraction of the radio catalogue that is assigned a counterpart. The LR method implementation in the ASTROMATCH 5 code used in this work provides a facility to automatically optimize the LR cutoff and we use the code-generated optimal cut in our cross-matching here.
We use a search radius of 4 for the MGCLS-DECaLS and MGCLS-AllWISE crossmatching, optimised to produce the smallest number of spurious versus real matches. At this search radius, the optimal cutoff in LR is 0.2, shown by the vertical dashed line in the left panel of Figure 1. With LR > 0.2 our cross-matching has a completeness of 59% with an error rate less than 4%. Considering the best matches only, 587 of the radio sources have no identified optical counterpart. We cross-match these against the AllWISE catalogue. The right panel of Figure 1 shows the completeness and error rate versus all values of LR for the MGCLS-AllWISE cross-matching. The optimal LR cutoff is 0.8 in this case, with ∼ 29% completeness and an error rate less than 3%. We are left with 415 radio sources with no optical or mid-infrared counterpart.

Step 3: USS selection
There is coverage of this field at 843 MHz from the Sydney University Molonglo Sky Survey [SUMSS; 49], however, the SUMSS data are of significantly lower resolution than the MGCLS data and source confusion would prohibit accurate spectral index determinations for all but the most isolated sources. There is also coverage from the Very Large Array Sky Survey [VLASS; 50] which provides ∼ 2.5 -resolution data at 3 GHz. Cross-matching the list of candidates from the previous step, we obtain α 3GHz 1.28GHz spectral indices for only 13 sources. We therefore exploit the wideband nature of the MGCLS data, which covers a ∼ 700 MHz range in frequency (fractional bandwidth of 0.55), to extract in-band spectral index values for the remaining radio sources with no identified multi-wavelength counterpart.
One of the MGCLS DR1 products is a 12-plane frequency cube for each field. The effective GHz observing frequency of each plane is 0. 908, 0.952, 0.997, 1.043, 1.093, 1.145, 1.317, 1.381,  1.448, 1.520, 1.594, and 1.656, respectively. We perform PYBDSF source finding, using default settings, in each plane and cross-match the resultant source catalogues to build up the flux density distribution for each source. We use a weighted Levenberg-Marquardt least-squares fitting procedure to fit a power-law spectrum of the form S ν ∝ ν α to each distribution. We only produce a fit if a source is detected in at least two frequency planes -if the source has only two flux density measurements, we further require a minimum frequency separation of 200 MHz in order to proceed with the fit. As flux density measurements are taken from the same dataset, with sub-band images having identical resolution, any spatial variability in the flux density scale should affect all sub-band images. For a given source, we therefore do not expect any bias in the spectral index fitting due to this effect.
We obtain a spectral index fit for 306/415 sources from step 2. Implementing a USS cut of α < −1 (suitable for low flux density sources; see [31] for details), we have 231/415 potential HzRG candidates. There are 109/415 radio sources which do not have spectral index information as they did not have sufficient flux density information for a fit. We retain these sources as potential candidates.

Step 4: Manual cross-checking with NED and Vizier
As a final step, we manually check the 340 (231 + 109) candidates against the NASA/IPAC Extragalactic Database [NED; 51] to look for any redshift information, or whether a source exists in other optical/IR catalogues. We use a NED search radius of 4 . Six of the HzRG candidates have available redshift information, five of which have z < 0.12 and are therefore discarded from the candidate list. The sixth source, MKTCS J001802.40−313505.5, is confirmed as a high-redshift galaxy at z = 2.13 [H-ATLAS J001802.2−3135005; 52]. An additional 60 sources have NED cross-matches from other optical/IR catalogues and are discarded as candidates. Finally, we use Vizier [53] to cross-match against The Million Quasars catalogue [v7.2, 54] -we find no matches within 4 of our candidates. Our final HzRG candidate list contains a total of 274 sources: 179 USS sources and 95 potential HzRG candidates with no spectral index information.

Discussion
An excerpt of the final list of 274 HzRG candidates is given in Table 2. The candidates amount to 8% of the sources in the Abell 2751 MGCLS compact source catalogue, with spectroscopic follow-up or high quality photometry required to confirm the redshift of the sources.
The flux density distribution of all the candidates is shown in the left panel of Figure 2. As expected, a large fraction of the fainter sources (S 1.28 GHz 120 µJy) are lacking a spectral index measurement due to the source having insufficient SNR across the band. However, we have complete spectral index coverage above a flux density of 0.3 mJy, which is the faintest source population inspected for HzRG candidates at GHz frequencies to date.
To investigate a potential positional effect (e.g., from the primary beam correction at different frequencies) on the ability to assign a spectral index value for a source, we plot the Table 2. Excerpt of the final catalogue of 274 HzRG candidates after implementation of an SNR and angular size cut, DECaLS and AllWISE catalogue cross-matching, USS selection, and manual checks with NED. 25% of the candidates have no spectral index information and are retained as possible HzRGs (see Section 3.3 for details). The catalogue includes the IAU MGCLS source identifier, the J2000 radio position in decimal degrees, the 1.28 GHz flux density and associated statistical uncertainty, the source size (major and minor axes, and p.a.), and the fitted in-band spectral index (where available). Note that the sizes are after convolution with the CLEAN beam, and the intrinsic sizes will typically be smaller. The full catalogue is available online at https://github.com/kendak333/MGCLS_HzRGs.  sky positions of all 340 candidates in Figure 2. The sources are coloured by whether the source has a spectral index or not, and we see no positional trend with respect to the undetermined α values. This may be a positive indication that the SNR cut adequately takes into account the changes in noise levels across the primary beam-corrected field of view for these data.

Summary and Conclusion
We have presented the results of a search for high-redshift radio galaxy (HzRG) candidates at 1.28 GHz using one of the 115 cluster fields, namely Abell 2751, from the MeerKAT Galaxy Cluster Legacy Survey [MGCLS ; 43]. We used a multi-stage approach, first implementing a 5σ signal-to-noise and < 10 angular size cut on the radio catalogue before crossmatching against catalogues from DECaLS and AllWISE. Taking advantage of the wideband MeerKAT data, we were able to determine in-band spectral index measurements for 74% of the radio sources which had no multi-wavelength counterpart after the cross-matching. Using an ultra-steep spectrum cut of α < −1, and performing manual checks for supplementary redshift information in NED, we compiled a final catalogue of 274 HzRG candidates, 95 of which have no available spectral index information and cannot be excluded. Our spectral index coverage is complete above a flux density of 300 µJy. This is the lowest flux density at GHz frequencies to be probed for HzRGs to date.
This study shows the great potential in the MGCLS DR1 data, and indeed any longtrack MeerKAT L-band data, for finding HzRG candidates. With 20 of the 115 MGCLS fields having good dynamic range and full coverage in both DECaLS and AllWISE, extending this study to these fields may produce more than ∼ 4000 HzRG candidates for spectroscopic follow-up. Additional frequency radio data will still be needed in order to obtain spectral index measurements for sources in the sub-100 µJy flux density ranges probed by MeerKAT. MeerKAT's UHF band, with ∼ 7.5 resolution at near-uniform weighting, could provide such data with a small request of telescope time.

Data Availability Statement:
The MGCLS data products used in this work are proprietary until the survey paper [43] has been accepted for publication.
Acknowledgments: MGCLS data products were provided by the South African Radio Astronomy Observatory and the MGCLS team and were derived from observations with the MeerKAT radio telescope. The MeerKAT telescope is operated by the South African Radio Astronomy Observatory, which is a facility of the National Research Foundation, an agency of the Department of Science and Innovation. DE-Cam data were obtained through the Legacy Surveys imaging of the DESI footprint, which is supported