The Classification of Blazars Candidates of Uncertain Types

In this work, the support vector machine (SVM) method is adopted to separate BL Lacertae objects (BL Lacs) and flat spectrum radio quasars (FSRQs) in the plot of photon spectral index against photon flux, $\alpha_{\rm ph} \sim {\rm log}\,F$, that of photon spectral index against variability index, $\alpha_{\rm ph} \sim {\rm log}\, \textit{V\!I}$, and that of variability index against photon flux, ${\rm log}\,{V\!I} \sim {\rm log}\,F$. We then use the dividing lines to tell BL Lacs from FSRQs among the blazar candidates of uncertain type (BCUs) in the \textit{Fermi}/LAT catalogue. Our main conclusions are: 1. We separate BL Lacs and FSRQs by $\alpha_{\rm ph} = -0.123\,{\rm log}\,F + 1.170$ in the $\alpha_{\rm ph} \sim {\rm log}\,F$ plot, $\alpha_{\rm ph} = -0.161\,{\rm log}\,{V\!I} + 2.594$ in the $\alpha_{\rm ph} \sim {\rm log}\,{V\!I}$ plot, and ${\rm log}\,{V\!I} = 0.792\,{\rm log}\,F + 9.203$ in the ${\rm log}\,{V\!I} \sim {\rm log}\,F$ plot. 2. We obtain 932 BL Lac candidates and possible BL Lac candidates, and 585 FSRQ candidates and possible FSRQ candidates. 3. Our classifications are compared with those in the literature.


Introduction
As a special subclass of active galactic nuclei (AGNs), blazars show some extreme observational properties: high-amplitude and rapid variability superposed on a long-term, slowly varying light curve, high polarization, powerful γ-ray emission (with some sources detected at TeV energies), either strong broad emission lines or no emission lines at all, and superluminal motion [1-14]. These extreme observational properties are explained by a beaming model, in which a central supermassive black hole is surrounded by an accretion disk, with two jets perpendicular to the disk. When a jet points towards the observer, the observed emission, $f^{\rm ob}$, is boosted and the observed variability time scale, $\Delta t^{\rm ob}$, is shortened: $f^{\rm ob} = \delta^{q} f^{\rm in}$ and $\Delta t^{\rm ob} = \Delta t^{\rm in}/\delta$. Here $f^{\rm in}$ and $\Delta t^{\rm in}$ are the emission and the variability time scale in the comoving frame, $\delta = [\Gamma(1 - \beta \cos\theta)]^{-1}$ is the boosting factor (or Doppler factor), $\Gamma = 1/\sqrt{1 - \beta^{2}}$ is the Lorentz factor, $\theta$ is the viewing angle, and $\beta = v/c$ is the jet speed in units of the speed of light. The parameter $q$ depends on the jet case: $q = 2 + \alpha$ for a continuous jet and $q = 3 + \alpha$ for a moving sphere [15], where $\alpha$ is the spectral index defined by $f_{\nu} \propto \nu^{-\alpha}$.
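As a quick numerical illustration of the beaming relations above, the following minimal sketch evaluates the Doppler factor, the flux boost, and the shortening of the variability time scale. The function names and the input values (Γ = 10, θ = 3°, α = 1) are illustrative assumptions and are not taken from this work.

```python
import math

def doppler_factor(gamma: float, theta_deg: float) -> float:
    """Doppler factor delta = 1 / [Gamma * (1 - beta * cos(theta))]."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)  # jet speed in units of c
    theta = math.radians(theta_deg)
    return 1.0 / (gamma * (1.0 - beta * math.cos(theta)))

def boosted_flux(f_in: float, delta: float, alpha: float,
                 continuous_jet: bool = True) -> float:
    """Observed flux f_ob = delta**q * f_in, with q = 2 + alpha for a
    continuous jet and q = 3 + alpha for a moving sphere."""
    q = (2.0 if continuous_jet else 3.0) + alpha
    return delta**q * f_in

def observed_timescale(dt_in: float, delta: float) -> float:
    """Observed variability time scale dt_ob = dt_in / delta."""
    return dt_in / delta

# Illustrative (assumed) jet parameters, not values from the paper.
gamma, theta_deg, alpha = 10.0, 3.0, 1.0
delta = doppler_factor(gamma, theta_deg)
print(f"delta = {delta:.2f}")
print(f"flux boost (continuous jet) = {boosted_flux(1.0, delta, alpha):.3g}")
print(f"flux boost (moving sphere)  = {boosted_flux(1.0, delta, alpha, False):.3g}")
print(f"time-scale factor = {observed_timescale(1.0, delta):.3g}")
```

Because the boost scales as δ to the power q, even a moderate Doppler factor changes the observed flux by orders of magnitude, which is why small viewing angles dominate flux-limited samples.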
Based on the behavior of their emission lines, blazars can be classified into two subclasses, namely BL Lacertae objects (BL Lacs) and flat spectrum radio quasars (FSRQs). FSRQs have strong broad emission lines with an equivalent width greater than 5 Å (EW > 5 Å), while BL Lacs show only weak emission lines or none at all, i.e., EW < 5 Å [6,16,17]. From surveys, BL Lacs were classified as radio-selected BL Lacs (RBLs) and X-ray-selected BL Lacs (XBLs). The two classes have different observational properties in Hubble diagrams, multiwavelength correlations, spectral index correlations, linear optical polarization, etc. ([18] and references therein).
A physical classification of BL Lacs was proposed by Padovani & Giommi [19], who calculated the spectral energy distributions (SEDs) for a sample of BL Lacs and proposed to use the synchrotron peak frequency ($\nu_{\rm p}$) to separate BL Lacs into high-frequency peaked BL Lacs (HBLs) with ${\rm log}\,\nu_{\rm p}$ (Hz) > 15 (the base of the logarithms is ten throughout this paper) and low-frequency peaked BL Lacs (LBLs) with ${\rm log}\,\nu_{\rm p}$ (Hz) < 15. It was found that most RBLs belong to the LBLs, while most XBLs belong to the HBLs.
[Table fragment, partially recovered: boundaries in synchrotron peak frequency from the literature. Fan et al. [22]: N = 1392 (boundary values truncated in the source). Yang et al. [9]: N = 2709, with ${\rm log}\,\nu_{\rm p}$ (Hz) < 13.7, 13.7 < ${\rm log}\,\nu_{\rm p}$ (Hz) < 14.9, and ${\rm log}\,\nu_{\rm p}$ (Hz) > 14.9. *N denotes the sample size that the authors used to get the boundary.]
The authors of [21] calculated SEDs using quasi-simultaneous data for 48 Fermi blazars and extended the definition to all types of non-thermal dominated AGNs using new acronyms: low synchrotron peaked blazars (LSP) if ${\rm log}\,\nu_{\rm p}$ (Hz) < 14, intermediate synchrotron peaked blazars (ISP) if 14 < ${\rm log}\,\nu_{\rm p}$ (Hz) < 15, and high synchrotron peaked blazars (HSP) if ${\rm log}\,\nu_{\rm p}$ (Hz) > 15. They also proposed an empirical parametrization to estimate the synchrotron peak frequency from the effective radio-optical spectral index ($\alpha_{\rm ro}$) and the effective optical-X-ray spectral index ($\alpha_{\rm ox}$).
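The peak-frequency scheme above amounts to two thresholds on the logarithm of the synchrotron peak frequency. A minimal sketch follows; the function name `sed_class` is hypothetical, and assigning sources that sit exactly on a boundary to the ISP class is an assumption, since the text leaves that case unspecified.

```python
def sed_class(log_nu_p: float, low: float = 14.0, high: float = 15.0) -> str:
    """Classify a blazar by the log10 of its synchrotron peak frequency (Hz).

    The defaults are the LSP/ISP/HSP boundaries quoted in the text.
    Boundary values are assigned to ISP (an assumption of this sketch).
    """
    if log_nu_p < low:
        return "LSP"
    if log_nu_p > high:
        return "HSP"
    return "ISP"

# Example peak frequencies (illustrative values, not catalogue entries).
for log_nu_p in (13.2, 14.5, 15.8):
    print(log_nu_p, sed_class(log_nu_p))
```

Other boundaries from the literature can be tried by changing the two thresholds, for example `sed_class(14.5, low=13.7, high=14.9)` for the values attributed to Yang et al. [9].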
The Fermi/LAT mission has detected a large number of γ-ray emitters. More than 60% of the Fermi/LAT-detected sources are AGNs, and 99% of the Fermi/LAT AGNs are blazars. γ-ray emission is therefore a typical observational property of blazars. Up to now, several catalogues have been released [2,3,21,23,24].
The γ-ray loud blazars are variable on different time scales [9]. The five catalogues of the Fermi/LAT mission provide a good opportunity to investigate the variability properties in the γ-ray band. The variability level in the γ-ray band is quantified by the so-called variability index (VI), defined by Abdollahi et al. [1] as

$$V\!I = 2\sum_{i} {\rm log}\left[\frac{\mathcal{L}_i(S_i)}{\mathcal{L}_i(S_{\rm glob})}\right] - \max\left(\chi^{2}(S_{\rm glob}) - \chi^{2}(S_{\rm av}),\, 0\right), \qquad \chi^{2}(S) = \sum_{i} \frac{(S_i - S)^{2}}{\sigma_i^{2}},$$

where $S_i$ are the individual flux values, $\mathcal{L}_i(S)$ is the likelihood in the interval $i$ assuming flux $S$, $\sigma_i$ are the errors on $S_i$, $S_{\rm av}$ is the average flux, and $S_{\rm glob}$ is the global flux.
The latest fourth Fermi/LAT catalogue (4FGL), with 5099 sources, has been published [1,3]. Of these, 1432 are BL Lacs, 795 are FSRQs, and 1518 are blazar candidates of uncertain type (BCUs). Identifying the BCUs is of interest because it provides more sources with which to investigate the different physics in BL Lacs and FSRQs. The identification of BCUs has been carried out in many works [9,25-34].
In this work, we apply the support vector machine (SVM) learning method to separate BL Lacs and FSRQs and then use the dividing lines to tell BL Lac candidates from FSRQ candidates among the BCUs. The work is arranged as follows: in Section 2, the sample, taken from 4FGL_DR3, is described; in Section 3, the distributions of the physical parameters are given for BL Lacs and FSRQs, and the SVM method is used to separate BL Lacs and FSRQs and to divide the BCUs into BL Lac candidates and FSRQ candidates; some discussions and conclusions are given in Sections 4 and 5.

Samples
In this work, we obtained 3743 blazars from the 4FGL catalogue [1,3], which include 1432 BL Lacs, 794 FSRQs, and 1517 BCUs. We only list 1517 BCUs in Table 2 since we want to classify them in this work.

Average Values
γ-Ray Photon Flux, ${\rm log}\,F$: Based on the photon flux from the 4FGL catalogue [1,3], we computed the logarithm of the γ-ray photon flux (${\rm log}\,F$) and show its distributions for FSRQs and BL Lacs in the upper-left panel of Fig. 1, with the cumulative distributions in the upper-right panel of Fig. 1. The average values are $\langle {\rm log}\,F \rangle = -9.294 \pm 0.520$ for FSRQs and $\langle {\rm log}\,F \rangle = -9.434 \pm 0.482$ for BL Lacs. A Kolmogorov-Smirnov (K-S) test gives a probability $p = 6.708 \times 10^{-7}$ that the two distributions come from the same parent distribution.
Photon Spectral Index, $\alpha_{\rm ph}$: We show the distributions of $\alpha_{\rm ph}$ for FSRQs and BL Lacs in the middle-left panel of Fig. 1, and their cumulative distributions in the middle-right panel of Fig. 1. The average photon spectral indexes are $\langle \alpha_{\rm ph} \rangle = 2.470 \pm 0.201$ for 795 FSRQs and $\langle \alpha_{\rm ph} \rangle = 2.032 \pm 0.212$ for 1432 BL Lacs. The K-S test gives $p = 7.77 \times 10^{-16}$.
Variability Index, $V\!I$: For the variability index, we calculated the corresponding logarithm and show its distributions for FSRQs and BL Lacs in the lower-left panel of Fig. 1, with the cumulative distributions in the lower-right panel of Fig. 1. The average values are $\langle {\rm log}\,{V\!I} \rangle = 2.025 \pm 0.777$ for FSRQs and $\langle {\rm log}\,{V\!I} \rangle = 1.393 \pm 0.481$ for BL Lacs. The probability for the two distributions to come from the same parent distribution is $p = 7.77 \times 10^{-16}$.
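The two-sample K-S comparisons above can be sketched with `scipy.stats.ks_2samp`. The arrays below are synthetic Gaussian draws that use the average values and scatters quoted for the photon spectral index and the sample sizes from the Samples section; they are stand-ins for the catalogue columns, so the resulting p-value is only indicative and is not expected to reproduce the quoted one.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for the alpha_ph columns (assumed Gaussian shapes).
alpha_fsrq = rng.normal(loc=2.470, scale=0.201, size=794)
alpha_bll = rng.normal(loc=2.032, scale=0.212, size=1432)

# Two-sample K-S test: the null hypothesis is that both samples
# are drawn from the same parent distribution.
result = ks_2samp(alpha_fsrq, alpha_bll)
print(f"D = {result.statistic:.3f}, p = {result.pvalue:.3g}")
```

A very small p-value means the hypothesis of a common parent distribution can be rejected, which is the sense in which the probabilities are quoted in the text.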

Correlations
From the available data, namely the photon flux ($F$), the photon spectral index ($\alpha_{\rm ph}$), and the variability index ($V\!I$), we can investigate the mutual correlations.
Flux versus Variability Index ($F - V\!I$): From the γ-ray photon flux and the variability index, we obtained the correlation ${\rm log}\,{V\!I} = (1.018 \pm 0.018)\,{\rm log}\,F + (11.176 \pm 0.170)$ with a correlation coefficient $r = 0.766$ and a chance probability $p \sim 0$ for all BL Lacs and FSRQs. The corresponding best-fitting result is shown in the lower panel of Fig. 2. For BL Lacs and FSRQs with available redshift, one has ${\rm log}\,{V\!I} = (0.745 \pm 0.018)\,{\rm log}\,F + (8.420 \pm 0.166)$ with $r = 0.746$ and $p \sim 0$ for FSRQs, and ${\rm log}\,{V\!I} = (1.261 \pm 0.025)\,{\rm log}\,F + (13.741 \pm 0.230)$ with $r = 0.876$ and $p \sim 0$ for BL Lacs.
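A linear fit of the kind quoted above, with its correlation coefficient and chance probability, can be sketched with `scipy.stats.linregress`. The data here are synthetic: they are generated from the whole-sample relation quoted in the text with an assumed Gaussian scatter of 0.4 dex and an assumed flux distribution, so the fit simply recovers the input relation. The fitting method used by the authors is not stated, and ordinary least squares is an assumption of this sketch.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)

# Synthetic log F values (assumed distribution) and log VI values built
# from the quoted relation plus an assumed scatter.
log_f = rng.normal(loc=-9.4, scale=0.5, size=2000)
log_vi = 1.018 * log_f + 11.176 + rng.normal(0.0, 0.4, size=log_f.size)

# Ordinary least-squares fit: log VI = slope * log F + intercept.
fit = linregress(log_f, log_vi)
print(f"log VI = ({fit.slope:.3f} +/- {fit.stderr:.3f}) log F + {fit.intercept:.3f}")
print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.3g}")
```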

Classifications
From the mutual correlation analyses, it is found that BL Lacs and FSRQs show different correlations and occupy different regions in the plots. We can therefore try to find a dividing line that separates BL Lacs and FSRQs, and then use this dividing line to tell BL Lacs from FSRQs when the BCUs are put into the plots.
[Figure 2 (caption fragment): ... γ-ray photon flux (log F (photon/cm$^2$/s)); middle panel for photon spectral index ($\alpha_{\rm ph}$) against variability index (${\rm log}\,{V\!I}$); lower panel for variability index (${\rm log}\,{V\!I}$) against γ-ray photon flux (log F (photon/cm$^2$/s)).]
In the latest version of the Fermi catalogue [1,3], there are 1517 blazar candidates of uncertain type (BCUs). It is interesting to divide them into BL Lacs and FSRQs. In this work, we used an SVM, a supervised machine learning (ML) method, to find a dividing line separating the two blazar subclasses, as in Yang et al. [9]. SVMs are widely used for classification and regression problems in astrophysics [9,37-40]. Consider two linearly separable samples in an $N$-dimensional parameter space: an infinite number of $(N - 1)$-dimensional hyperplanes can be found that separate them. The SVM is then applied to determine the hyperplane with the maximum margin, i.e., the maximum distance to the nearest samples. For samples that are not linearly separable, the SVM maps them into a higher-dimensional space and finds the optimal separating hyperplane there. The SVM requires a training set and a test set, which randomly take 70% and 30% of the sources of each type, respectively. The training set is used to find the optimal hyperplane, and the test set is used to evaluate the classification accuracy of the hyperplane.
In this work, we put the BL Lac and FSRQ samples in a two-dimensional parameter space formed by any two (denoted 'A' and 'B') of the three parameters ${\rm log}\,F$, $\alpha_{\rm ph}$, and ${\rm log}\,{V\!I}$. The hyperplane, which is a line in the two-dimensional space, is expressed as $w_1 A + w_2 B + m = 0$. The factors $w_1$, $w_2$, and $m$ are determined by training the SVM with the training set. The svm.LinearSVR (from the sklearn package) is employed as the SVM classifier, and its hyperparameters need to be specified before the SVM training starts. We iterate over different combinations of hyperparameters in the training process until $w_1$, $w_2$, and $m$ converge to the maximum margin. In the end, we get a number of different optimal dividing lines, and the one with the highest accuracy on the test set is taken as the final dividing line.
When the SVM is applied to the ($F - \alpha_{\rm ph}$) data, the result gives an accuracy of 88.60% for the separation and a dividing line of $\alpha_{\rm ph} = -0.123\,{\rm log}\,F + 1.170$, as shown in Fig. 3. FSRQs mainly occupy the region with $\alpha_{\rm ph} > -0.123\,{\rm log}\,F + 1.170$, and the majority of BL Lacs occupy the region with $\alpha_{\rm ph} < -0.123\,{\rm log}\,F + 1.170$.
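The procedure described above (a 70%/30% split, a linear SVM, a scan over hyperparameters, and conversion of the fitted weights into a dividing line) can be sketched as follows. This is an illustration under stated assumptions and not the authors' pipeline: the data are synthetic Gaussian stand-ins built from the average values quoted earlier, the estimator is `sklearn.svm.SVC` with a linear kernel (swapped in here as a linear classifier, whereas the text names svm.LinearSVR), and only the regularization parameter `C` is scanned.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_fsrq, n_bll = 794, 1432

# Synthetic (log F, alpha_ph) samples; stand-ins for the catalogue data.
fsrq = np.column_stack([rng.normal(-9.294, 0.520, n_fsrq),
                        rng.normal(2.470, 0.201, n_fsrq)])
bll = np.column_stack([rng.normal(-9.434, 0.482, n_bll),
                       rng.normal(2.032, 0.212, n_bll)])
X = np.vstack([fsrq, bll])
y = np.concatenate([np.ones(n_fsrq, dtype=int),    # 1 = FSRQ
                    np.zeros(n_bll, dtype=int)])   # 0 = BL Lac

# 70% training / 30% test, drawn from each type (stratified).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Scan a hyperparameter and keep the line with the best test accuracy.
best = None
for C in (0.1, 1.0, 10.0):
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    if best is None or acc > best[0]:
        best = (acc, C, model)
accuracy, best_C, clf = best

# Hyperplane w1*A + w2*B + m = 0 with A = log F and B = alpha_ph,
# rewritten as alpha_ph = slope * log F + intercept.
w1, w2 = clf.coef_[0]
m = clf.intercept_[0]
slope, intercept = -w1 / w2, -m / w2
print(f"C = {best_C}, test accuracy = {accuracy:.3f}")
print(f"alpha_ph = {slope:.3f} log F + {intercept:.3f}")
```

Selecting the line by its accuracy on the test set follows the description in the text; a separate validation set would be needed for an unbiased accuracy estimate.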
When the 1517 BCUs are put into the plot, 639 of them lie in the region above the dividing line and can be taken as FSRQ candidates (FCs), while 878 lie in the region below the dividing line and can be taken as BL Lac candidates (BCs).
When we considered the ($\alpha_{\rm ph} - V\!I$) plot, we found that BL Lacs and FSRQs can be divided by $\alpha_{\rm ph} = -0.161\,{\rm log}\,{V\!I} + 2.594$ with an accuracy of 89.26%, as shown in Fig. 4. When the 1517 BCUs are put into the plot, we found 585 FSRQ candidates (FCs) and 932 BL Lac candidates (BCs).
For the ($F - V\!I$) plot, we found a dividing line of ${\rm log}\,{V\!I} = 0.792\,{\rm log}\,F + 9.203$ with an accuracy of 79.16%, as shown in Fig. 5. Based on this line, we obtained 337 FSRQ candidates (FCs) and 1180 BL Lac candidates (BCs).
In our consideration, we take a BCU as a BL Lac candidate (BC) if it is below the dividing line in all three plots, and as a possible BL Lac candidate (p-BC) if it is below the dividing line in two of the three plots. The same consideration is applied to FSRQs, using the region above the dividing lines. We thus obtained 751 BL Lac candidates (BCs) and 181 possible BL Lac candidates (p-BCs), namely 932 BCs and p-BCs in total, and 210 FSRQ candidates (FCs) and 375 possible FSRQ candidates (p-FCs), namely 585 FCs and p-FCs in total. The ratio of the number of FCs and p-FCs to that of BCs and p-BCs is $\sim 2/3$ (585 versus 932).
When we considered the ($\alpha_{\rm ph} - V\!I$) plot, we found that BL Lacs and FSRQs can be divided by $\alpha_{\rm ph} = -0.161\,{\rm log}\,{V\!I} + 2.594$ with an accuracy of 89.26%, as shown in Fig. 4. When the 1518 BCUs are put into the plot, we found 377 FCs and 714 BCs. For the ($F - V\!I$) plot, we found a dividing line of ${\rm log}\,{V\!I} = 0.792\,{\rm log}\,F + 9.203$ with an accuracy of 79.16%, as shown in Fig. 5. Based on this line, we obtained 223 FCs and 888 BCs.
In our consideration, we take a BCU as a BL Lac candidate if it is below the dividing line in all three plots, and as a possible BL Lac candidate if it is below the dividing line in two of the three plots. The same consideration is applied to FSRQs. We thus obtained 590 BCs and 124 p-BCs, namely 714 BCs and p-BCs in total, and 145 FCs and 232 p-FCs, namely 377 FCs and p-FCs in total.
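The combination rule described above can be written as a simple vote over the three dividing lines quoted in this section. In the sketch below, the function name `classify_bcu` and the example input values are hypothetical. Treating a source that lies exactly on a line as "not above" is an assumption, since the text does not specify that case.

```python
def classify_bcu(log_f: float, alpha_ph: float, log_vi: float) -> str:
    """Combine the three dividing lines into BC, p-BC, p-FC, or FC.

    A source above a dividing line counts as one FSRQ vote:
    three votes give FC, two give p-FC, one gives p-BC, none gives BC.
    """
    fsrq_votes = sum([
        alpha_ph > -0.123 * log_f + 1.170,   # alpha_ph versus log F plot
        alpha_ph > -0.161 * log_vi + 2.594,  # alpha_ph versus log VI plot
        log_vi > 0.792 * log_f + 9.203,      # log VI versus log F plot
    ])
    return {3: "FC", 2: "p-FC", 1: "p-BC", 0: "BC"}[fsrq_votes]

# Hypothetical sources (log F, alpha_ph, log VI), not catalogue entries.
examples = [(-9.0, 2.6, 2.5), (-9.0, 2.4, 1.5), (-9.0, 2.0, 2.5), (-9.5, 1.9, 1.2)]
for log_f, alpha_ph, log_vi in examples:
    print(log_f, alpha_ph, log_vi, classify_bcu(log_f, alpha_ph, log_vi))
```

With three binary votes every BCU falls into exactly one of the four groups, which is why the candidate and possible-candidate counts add up to the full BCU sample.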

Discussions
After the launch of Fermi/LAT in 2008, a series of catalogues has been released. The latest catalogue is that of Abdollahi et al. [1]. It is clear that there are a lot of blazars with VI < 18.48 (the value above which [1] considers a source variable at the 99% confidence level), so we would miss the classification of many BCUs if we only considered the BCUs with VI > 18.48. That is why we considered all BCUs listed in Abdollahi et al. [1] in the present work.

The average values
For the observational data, namely the γ-ray photon flux (${\rm log}\,F$), the photon spectral index ($\alpha_{\rm ph}$), and the variability index (${\rm log}\,{V\!I}$), it is found that the average values of the three parameters for FSRQs are greater than those for BL Lacs. The K-S tests indicate that the probability ($p$) for the FSRQ and BL Lac distributions to come from the same parent distribution is $p \leq 6.708 \times 10^{-7}$ for each parameter.
For the subclasses of BL Lacs, we investigate the LBLs and HBLs; we do not consider IBLs, since IBLs may include some LBLs and HBLs. For the photon spectral index, we find $\langle \alpha_{\rm ph} \rangle = 2.197 \pm 0.168$ for LBLs and $\langle \alpha_{\rm ph} \rangle = 1.902 \pm 0.149$ for HBLs, which shows a clear difference between LBLs and HBLs with $p = 5.5 \times 10^{-92}$; for the photon flux, $\langle {\rm log}\,F \rangle = -9.301 \pm 0.482$ for LBLs and $\langle {\rm log}\,F \rangle = -9.398 \pm 0.485$ for HBLs, with $p = 4.25\%$; and for the variability index, $\langle {\rm log}\,{V\!I} \rangle = 1.584 \pm 0.594$ for LBLs and $\langle {\rm log}\,{V\!I} \rangle = 1.377 \pm 0.421$ for HBLs, with $p = 1.33 \times 10^{-6}$. When FSRQs and LBLs are compared, the probability for the two subclasses to come from the same parent population is $p = 1.75 \times 10^{-75}$ for the photon spectral index $\alpha_{\rm ph}$, $p = 37.8\%$ for the photon flux ${\rm log}\,F$, and $p = 1.17 \times 10^{-20}$ for the variability index ${\rm log}\,{V\!I}$. The comparisons between FSRQs and HBLs give $p \sim 0$ for the photon spectral index $\alpha_{\rm ph}$, $p = 4.6 \times 10^{-3}$ for the photon flux ${\rm log}\,F$, and $p = 9.39 \times 10^{-56}$ for the variability index ${\rm log}\,{V\!I}$.
One can see a clear difference in the photon spectral index ($\alpha_{\rm ph}$) between FSRQs and LBLs, between FSRQs and HBLs, and between LBLs and HBLs, giving $\alpha_{\rm ph}|_{\rm FSRQ} > \alpha_{\rm ph}|_{\rm LBL} > \alpha_{\rm ph}|_{\rm HBL}$. It is also found that ${\rm log}\,{V\!I}|_{\rm FSRQ} > {\rm log}\,{V\!I}|_{\rm LBL} > {\rm log}\,{V\!I}|_{\rm HBL}$. A clear difference in photon flux (${\rm log}\,F$) is found between FSRQs and HBLs ($p = 4.6 \times 10^{-3}$), and a marginal difference between HBLs and LBLs ($p = 4.26\%$). However, there is no clear difference in the photon flux between LBLs and FSRQs. There is thus a sequence from FSRQs to LBLs to HBLs in the photon spectral index and the variability index, similar to that pointed out by Fossati et al. [35]; see also Ghisellini et al. [36].

The correlations for FSRQs and BL Lacs
In this work, we considered the mutual correlations amongst $\alpha_{\rm ph}$, ${\rm log}\,F$, and ${\rm log}\,{V\!I}$ for FSRQs and BL Lacs. There is a tendency for a positive correlation between the spectral index ($\alpha_{\rm ph}$) and the photon flux (${\rm log}\,F$) for the known blazars. When BL Lacs and FSRQs are considered separately, a clear anti-correlation is found for FSRQs with $p = 4.42 \times 10^{-27}$ and a positive correlation for BL Lacs, as shown in the upper panel of Fig. 2. BL Lacs and FSRQs therefore show a different dependence of the spectral index on the photon flux. The two subclasses also show a different dependence of the spectral index on the variability index in the middle panel of Fig. 2: the spectral index ($\alpha_{\rm ph}$) of FSRQs decreases with the variability index (${\rm log}\,{V\!I}$), while that of BL Lacs increases with it. For the photon flux (${\rm log}\,F$) and the variability index (${\rm log}\,{V\!I}$), the whole blazar sample and the two subclasses all show a positive correlation, indicating that the variability index increases with the photon flux, as shown in the lower panel of Fig. 2.
We note that the linear correlations obtained amongst $\alpha_{\rm ph}$, ${\rm log}\,F$, and ${\rm log}\,{V\!I}$ do not imply strict mathematical linear relations but demonstrate possible trends between two parameters. It is more reasonable to explore trends than strict mathematical linear correlations. The theoretical relationship between the three parameters (and also between other astronomical quantities) is rarely investigated. Because these observational quantities show a significant scatter, there is enough room for various models to explain the phenomenon. For individual sources, the observed scatter can arise for several reasons: intrinsic ones (e.g., the black hole mass and spin, the accretion rate, etc.) and external ones (e.g., the gas and dust density of the host galaxy, the magnetic field, the distance, etc.).
For a large sample, the distribution of the sources may reflect a selection effect of the telescope, or the presence of a few sources with very high or very low values of some quantities. All the above-mentioned reasons could prevent us from finding a strict mathematical linear correlation. For the sources in our sample, it is natural that most of them have a low photon flux (i.e., ${\rm log}\,F < -8.5$) and a low variability index (i.e., ${\rm log}\,{V\!I} < 2.5$), while sources with a very high photon flux (i.e., ${\rm log}\,F > -7.5$) and a very high variability index (i.e., ${\rm log}\,{V\!I} > 4.5$) are very rare in the universe; see the middle and lower panels of Fig. 2. From the correlation study amongst $\alpha_{\rm ph}$, ${\rm log}\,F$, and ${\rm log}\,{V\!I}$ in this work, we conclude that the FSRQs show trends of anti-correlation for $\alpha_{\rm ph}$ vs ${\rm log}\,F$ and $\alpha_{\rm ph}$ vs ${\rm log}\,{V\!I}$, while the BL

The classification for BCUs
It is found that most BL Lacs and FSRQs occupy different regions in the plots of $\alpha_{\rm ph}$ versus ${\rm log}\,F$ (Fig. 3), $\alpha_{\rm ph}$ versus ${\rm log}\,{V\!I}$ (Fig. 4), and ${\rm log}\,{V\!I}$ versus ${\rm log}\,F$ (Fig. 5). When the support vector machine (SVM) method is applied to the relevant data, separating lines are obtained, which can be used to select BL Lac and FSRQ candidates when the BCUs are put into the plots.
In those cases, we obtained 639 FCs and 878 BCs in the $\alpha_{\rm ph}$ versus ${\rm log}\,F$ plot (Fig. 3), 585 FCs and 932 BCs in the $\alpha_{\rm ph}$ versus ${\rm log}\,{V\!I}$ plot (Fig. 4), and 337 FCs and 1180 BCs in the ${\rm log}\,{V\!I}$ versus ${\rm log}\,F$ plot (Fig. 5).
We take a BCU as an FC if it is classified as an FC in all three cases, and as a p-FC if it is classified as an FC in two of the three cases. Similar considerations are applied to BCs and p-BCs. Our candidate classifications are shown in Table 2. We thus have 932 BCs and p-BCs and 585 FCs and p-FCs, giving a number ratio of $\sim 2/3$. We also made a comparison with the classification results of Kang et al. [34]. We take a BCU as an FC if it was classified as an FC in all three of their considerations, and as a p-FC if it was classified as an FC in two of their considerations [34]. A similar classification is applied to BCs. In this case, we obtained 302 FCs and 109 p-FCs, and 556 BCs and 114 p-BCs, which are given in Col. (6) of Table 2.

Conclusions
In this work, we compiled 3743 blazars from the 4FGL catalogue [1,3], comprising 1432 BL Lacs, 794 FSRQs, and 1517 BCUs. We analyzed the average values of, and the mutual correlations amongst, the photon spectral index, the variability index, and the photon flux for the known blazars. The SVM method was adopted to separate BL Lacs and FSRQs, and the separating lines were then used to classify the BCUs into BL Lac and FSRQ candidates. We proposed to classify a BCU as a BL Lac candidate if it is classified as a BL Lac in all three cases, and as a possible BL Lac candidate if it is classified as a BL Lac in only two cases. Similar considerations are applied to FSRQ candidates. Our classifications are compared with those by Kang et al. [34]. Our conclusions are as follows:
1. The γ-ray photon flux, spectral index, and variability index of FSRQs are higher than those of BL Lacs for the known blazar sample. There is a sequence from FSRQs to LBLs to HBLs, similar to that in Fossati et al. [35].
2. A positive correlation is found between the γ-ray flux and the photon spectral index for the whole sample, but an anti-correlation is found for FSRQs and a positive correlation for BL Lacs. In addition, a positive correlation is found between the variability index (${\rm log}\,{V\!I}$) and the γ-ray photon spectral index ($\alpha_{\rm ph}$) for the whole sample, but an anti-correlation for FSRQs and a positive correlation for BL Lacs. We think those two positive correlations for the whole sample are only apparent.
3. We adopted the SVM machine learning method to separate BL Lacs and FSRQs in the $\alpha_{\rm ph}$ vs. ${\rm log}\,F$, $\alpha_{\rm ph}$ vs. ${\rm log}\,{V\!I}$, and ${\rm log}\,{V\!I}$ vs. ${\rm log}\,F$ plots. We obtained 932 BL Lac candidates and possible BL Lac candidates, and 585 FSRQ candidates and possible FSRQ candidates.
4. We compared our classifications with those in Kang et al. [34] and found that, for the common sources, there is an agreement of 90.2% for BL Lac and possible BL Lac candidates, and an agreement of 87.8% for FSRQ and possible