Kilonova-Targeting Lightcurve Classification for Wide Field Survey Telescope

With the enhancement of sensitivity of Gravitational Wave (GW) detectors and capabilities of large survey facilities, such as Vera Rubin Observatory Legacy Survey of Space and Time (LSST) and 2.5-m Wide Field Survey Telescope (WFST), we now have the potential to detect an increasing number of distant kilonova (KN). However, distinguishing KN from the plethora of detected transients in ongoing and future follow-up surveys presents a significant challenge. In this study, our objective is to establish an efficient classification mechanism tailored for the follow-up survey conducted by WFST, with a specific focus on identifying KN associated with GW. We employ a novel temporal convolutional neural network architecture, trained using simulated multi-band photometry lasting for 3 days by WFST, accompanied by contextual information, i.e. luminosity distance information by GW. By comparison of the choices of contextual information, we can reach 95\% precision, and 94\% recall for our best model. It also performs good validation on photometry data on AT2017gfo and AT2019npv. Furthermore, we investigate the ability of the model to distinguish KN in a GW follow-up survey. We conclude that there is over 80\% probability that we can capture true KN in selected 20 candidates among $\sim 250$ detected astrophysical transients that have passed real-bogus filter and cross-matching.


Introduction
The merger of binary neutron stars (BNS) and neutron star-black hole (NSBH) could be the source of thermal emission extended from near-infrared to ultraviolet, which is powered by r-process generated radioactive decay of heavy elements in the ejecta during the merger, often referred as kilonova (KN) [1][2][3].It is believed that kilonova are typically fainter than supernova and fast fading within a week [6,7].Often during this process, a highly relativistic jet along the polar axis could launch short γ−ray bursts (sGRB) lasting for a few seconds [8][9][10].Interaction of the jet and interstellar medium powers X-ray afterglow which is spread within a relatively wide viewing angle [11][12][13][14].The dawn of multimessenger astronomy associated with gravitational wave (GW), heralded by the detection of compact binary merger, GW170817, and its electromagnetic (EM) counterparts AT2017gfo, GRB170817A, has ushered in a new era of multimessenger astrophysics [2,3,[15][16][17][18][19][20][21][22][23][24][25][26].With the completion of the thereafter third LIGO/Virgo observing run, O3, and a total of 90 GW candidates identified, GW astrophysics has jumped into a time-domain era [4,5].In addition, the observation of EM signature has allowed for more accurate inclination and distance measurements of the host galaxy by model fitting and identification.In fact, EM counterparts of GWs are important sources of bright standard sirens to constrain the Hubble constant, e.g.H 0 = 74 +16 −8 km s −1 Mpc −1 for GW170817, which could shed light on the Hubble tension problem [27,28].
In the span of O3 run, a total of 56 public alerts were released by LIGO/Virgo through Gamma-ray Coordinate Network (GCN) Notices and Circulars1 [4].While extensive prompt follow-up observations were conducted following low latency public alerts which yield hundreds to thousands of candidates, no confirmed kilonvoa was identified [29,30].The reason for the undesirable outcome is controversial.The fast fading signature of KN and very limited sky coverage induced by poor localization might be responsible [31,32] However, poor sky coverage and selection criteria are also effective to data stream, which indicates that true KN could be rejected by real-bogus classification, astrophysical origin selection and even KN classification [30].
The 2.5-m WFST, installed at Saishiteng Mountain near Lenghu on Tibetan Plateau, China, will strongly support various science cases including time-domain astronomy, asteroids and the solar system, the Milky Way and its satellite dwarf galaxies, galaxy formation and cosmology and so on [41].With a field of view (FoV) of 6.55 deg 2 , it could cover ∼ 10 3 deg 2 within a night with 5σ depth of 22.31, 23.42, 22.95, 22.43, 21.50 mag under 30s exposure in five bands (u, g, r, i, z) respectively, making it one of the most powerful facilities in the northern sky for discovering GW counterparts [42].In addition, excellent observing conditions with an average seeing of 0.75 arcsec and 22.0 mag arcsec −2 V-band background provide potential for high quality data [43].
It is common to be overwhelmed by the data stream produced by the rapid and deep searching of wide field instruments.Since it is not sufficient to identify KN by photometry solely, efficient KN classification is still of great significance for maximizing identified KN.Traditionally, the KN photometry classification is based on several criteria, e.g.decay rate, color evolution [29] or model fitting and it is upgraded to a complete pipeline, e.g.ZTFReST [44].Another way is employing a machine learning classifier, which was implemented during O3 and is well designed so far [30].Stachie et al. [45] adapted the long-term lightcurve RAPID [46] classifier short-term KN detections.Chatterjee et al. [47] deployed a KN classifier, which uses similar structure, including GW skymap information.Biswas et al. [48] designed a fast transient classification algorithm, aiming to KN, which is implemented as a module in FINK broker 6 , a data stream processor software for ZTF and LSST.Sravan et al. [49] proposed a fully machine-directed pipeline for KN discovery including the optimizing survey plan to maximize the chance to identify KN.
Of great essence is the need for a rapid and efficient KN classifier for the WFST Targetof-Opportunity (ToO) observation.Thus in this work, we simulate lightcurve data from KN and contaminants detected by WFST with designed capability, air conditions and strategies.Subsequently, a modified RAPID framework is employed to train and test the performance on our simulated WFST ToO data generated using mock GW skymaps [35].
This paper is organized as follows.Section 2 describes the simulation of training data.We begin with the mock GW skymaps and survey plans that are generated automatically through the ToO pipeline.KN and contaminants are simulated using Monte Carlo sim- ulations which follow their space and time distribution, i.e.KN is distributed following GW and contaminants are distributed following their volumetric rate.Then we implement mock photometry and collect their lightcurves and associated information like location and line-of-sight probability.The classifier framework and training process are detailed in Section 3. In Section 4, we test the performance on both dataset, real data and explore the situation in simulated GW follow-up surveys.Finally, we present our conclusions in Section 5.

Transients Simulation
Our simulation has three steps: firstly we randomly choose mock GW events detected by LVK with O4 sensitivity.For each event, many KN and contaminant objects are placed across the sky.Then, we trigger our KN ToO pipeline to deploy mock surveys.Finally, we collect lightcurves given certain survey plans and contextual information for each object.The full dataset is obtained by iterating all select events.
Since the aim of this work is the KN classifier for WFST during O4, the amount of BNS and NSBH events detected so far is not adequate to cover the diversity of skymaps and survey cadences.Petrov et al. [35] simulated GW signals from BBH, BNS and NSBH merger and employed a more realistic threshold under O3, O4, O5 sensitivity.They gave comparable sky localization to O3 and concluded that the sky localization might be even worse during O4, which coincides with recent observations.Therefore, we randomly choose 250 GW events7 from BNS merger as our mock events that will happen at random in 2024, in which the WFST will be online.

Survey Simulation
In practical GW observations, an accurate localization of the event plays a pivotal role in triggering ToO observations.To initiate it effectively, it is essential that the observable area for the event aligns with the survey capability of the telescope, ensuring a sufficiently high probability of observing the specific sky region.In our study, we carefully exclude sections of the localization area falling within ±15 o of the galactic plane and regions with declination δ < −30 o for the mock GW skymaps.Additionally, we incorporate the restriction that the airmass should not exceed 2 to define the observable sky area for each event.Based on the observable area and the probability of observing the sky region, we filter 250 events, selecting those that meet the following criteria: an observable area is less than 3000 deg 2 and the probability of observing the 2D sky region is not less than 0.2.Among them, 68 events finally trigger ToO survey.
Then, we generate survey plans by implementing gwemopt8 , which was originally developed by Coughlin et al. [50], serving the purpose of optimizing the EM follow-up search for GW events.During O3, several post-event observation plans were formulated using gwemopt for both individual and joint observations by multiple telescopes [e.g., 29,51-54], yielding good performance.
In the process of generating survey plans, gwemopt includes several algorithms within each step: (1) Skymap tiling, (2) Time allocations, (3) Scheduling.Coughlin et al. [50] extensively discussed the efficiency of various combinations of these algorithms, determining that the combination of the (1) MOC algorithm, (2) power-law algorithm, and (3) greedy algorithm yielded the most efficient results.Therefore, we employ this combination of algorithms for our simulations.Specific inputs need to be prepared before running the code, namely merger event, exposure time, bands, and observation windows.Initially, the merger times for the GW events were randomly distributed from 2024.01.01 to 2024.12.31 to match the future operational timeline of O4.The selection of nightly coverage frequencies and bands was based on the observable area of the events and the lunar phase conditions.g, i− and r, i−bands were prioritized around the new moon and full moon, respectively.We also adjust our exposure time according to the luminosity distance of the event and estimated time spent in observation.Currently, we assume an observation window for three days post-merger, during which WFST would repeatedly cover the target area.Upon the completion of the code, it will generate a list containing the corresponding pointings for each exposure, the observation time, and the cumulative probability within the coverage area as output.We show two examples of GW skymap, MS_239 and MS_332, and corresponding triggered WFST ToO survey in Figure 1.The tiles in the map are footprints of one-night survey.In Figure 2, we show the cumulative density of elapsed time between follow-up trigger and GW trigger.The first observation time and overall observations are almost uniform within 0.8 days and 2 days respectively.The sky coverage is also widely distributed between 0.2 − 1, indicating that we have covered as many scenarios as we can.

KN Simulation
For KN simulation, we use two models to generate Spectral Energy Distribution (SED).The first one is a two-component model first presented by Bulla [28,55], in which the spectra of KN were calculated using Monte Carlo radioactive transfer code POSSIS9 .The first component, dynamical ejecta, is characterized by mass M dyn , which have velocities ranging from 0.08c ∼ 0.3c.There are two compositions that are formed via different channels, in which lanthanide-rich composition is distributed within angle ±Φ around the equatorial plane which is formed due to tidal processes and lanthanide-free composition otherwise that is formed by hydrodynamic interaction.The second component is post-merger wind ejecta, M pm , which is distributed spherically and has relatively lower velocities ranging from 0.025c ∼ 0.08c representing outflow from the accretion disk.To generate SEDs of arbitrary parameters, we construct a surrogate model using a Neural Network following the method in Ref. [56][57][58].The ejecta mass and velocity are related to the binary property involving mergers, such as chirp mass, mass ratio and equation of state (EoS).Many numerical relativity simulations have analyzed the process of merger and KN explosions [59][60][61][62][63][64].Therefore, it allows us to bridge a connection between BNS sample to a KN lightcurve sample.
For dynamical ejecta M dyn , we use fitting formula from Coughlin et al. [63] which was extended from Ref. [61,65], where M represents the exchanging of subscript.The mass of dynamical ejecta is sensitive to mass ratio q = M 1 /M 2 and compactness of neutron stars.We also characterize the fraction of the red component of ejecta presented in Ref. [66,67] that separate lanthanide-rich and -poor components by a threshold Y e ∼ 0.25 when q > 0.8, whereas for q < 0.8, f red ∼ 1 because the less massive NS is disrupted by tidal forces that suppress the shock [68][69][70][71].
The fraction of the red component can be written as with a = 14.8609, b = −28.6148,c = 13.9597.Using the spherical ejecta density profile assumption in the Bulla model, it is easy to obtain the half-opening angle for lanthanide-rich component.
For post-merger ejecta, we use the following expression to evaluate disk mass [63], with fitting parameters that include mass ratio dependence a = a 0 + δa • ξ, b = b 0 + δb • ξ and the free parameter ξ given by where q = M 1 /M 2 ≤ 1.The best-fit values of free parameters are a 0 = −1.581,δa = −2.439,b 0 = −0.538,δb = −0.406,c − 0.953, d = 0.0417, β = 3.910, qtrans = 0.900.M tot is the total mass of BNS and M thr represents the threshold mass for prompt massive neutron star collapse to black hole with the expression [28] M thr = 2.38 − 3.606 where M TOV is the maximum stable mass for non-rotating NS and R 1.6 represents the radii of 1.6M ⊙ NS.The disk wind ejecta mass is assumed to be proportional to disk mass M pm = f • M disk with f ranging from 0.1 ∼ 0.5 [72][73][74][75].In this work, we fix f = 0.2 as default set.Noted that explicit density, heating rate and opacity profile, and structured jet are considered in the new version of POSSIS 2.0 model [76,77].
We also employ another semi-analytical kilonova model presented in Nicholl et al. [67], thereafter MOSFiT KN, which is implemented by the open-source software MOSFiT10 .The model is based on the binary property forward to SED.They employed numerical relativity fitting formula to convert binary parameters to ejecta parameters.The ejecta contains three components with different grey opacity, namely blue (κ = 0.5 cm 2 g −1 ), red (κ = 10 cm 2 g −1 ), purple (κ = 0.5 cm 2 g −1 ) components.The red and blue components represent dynamical ejecta with higher velocity and the half opening angle of red ejecta is fixed to Φ = 45 o .This model also includes the cocoon emission and magnetic enhancement on blue ejecta mass.
Once the KN model is prepared, we can obtain the KN SED sample from a BNS sample and inject their parameters into two KN models.In the standard isolated binary formation channel, a recycled NS is born first and spins up due to the accretion or recycling process.It is accompanied by a non-recycled NS that spins down after birth.The mass of recycled NS follows a two-Gaussian distribution, with For nonrecycled NS, they are found to follow a uniform distribution within 1.15 ∼ 1.42 M ⊙ [78].
Given the mass of the neutron star, we calculate the radius and compactness by sampling the parameterized EoS obtained by Dietrich et al. [28], where the EoS sample was calibrated with constraints of pulsars and multimessenger observation on GW170817.We generate the KN SED sample with the size of ∼ 10 4 for each KN model.

Other Transients Simulation
To simulate contaminants, our main focus in this work is on supernovae, one of the most significant types of contaminants following catalog cross-matching.Still, we cannot ignore other important fast transients, e.g.Cataclysmic Variables (CVs), afterglows and shock breakouts.They will be included in the complete pipeline with the aid of other selection procedures.
SN Ia.The Type Ia Supernova is thought to be powered by the re-ignition and thermalnuclear explosion of carbon-oxygen white dwarfs once they exceed their critical mass [79,80].It is the most common and numerous contaminant object to KN.To model SN Ia, we use the SED library presented in Hsiao et al. [81].Apart from classic SN Ia, some subgroups in the SN Ia were identified through decays of observations.SN1991bg-like (SN Ia-91bg) stands out as one of the most important potential contaminants due to their bright luminous, rest-frame m B ≳ −18 and fast evolution feature, lightcurve width less than 70% average SN Ia [82,83].
SN Ibc.The stripped envelope supernova explosion, also referred as to Type Ib and Type Ic supernova, characterizes the feature of lacking helium in spectra [86].The lightcurves of SN Ibc are fainter, redder and evolve slower, indicating they are sub-dominant sources of contaminant.We use the SED template presented in Nugent et al. [85] 11 to model SN Ibc, SN Ia and its subgroups.SN II.Type II Supernova are explosions of massive stars with mass 8 ≲ M ≲ 18 M ⊙ [87].They are distinguished from other types by the presence of hydrogen in their spectra.We model two subgroups SN IIn and SN IIP using the SED template in Nugent et al. [85].
SLSN-I.Type I Superluminous Supernova is one of the brightest optical transients with peak absolute magnitudes ≲ −21 mag.They are widely distributed in metal-poor dwarf host galaxies, of which some are powered by magnetar with very strong magnetic field [88].Their spectra tend to be blue and lack hydrogen features and brighten rapidly, thus making them the main contaminant in early KN identification [89,90].To model SED, we employ the extended library of 960 SEDs of MOSFiT slsn model by Kessler et al. [84].

Training Set
Given GW skymaps and corresponding survey strategies and transient models described above, we perform survey simulation implemented by simsurvey12 [91], a software for survey simulation and lightcurve collection [29,92].The simulated KN for each GW event is sampled randomly based on two KN models in Sec 2.2.The explosion time of KN is set the same as the GW event and its location is produced following the probability density of GW skymap.For contaminants, we assume a uniform distribution of RA and Dec within the observed field of survey and specific redshift model listed in Tabel 1.
We randomly loop 60 skymaps and collect their lightcurves and object information, e.g.line-of-sight probability P l.o.s , distance mean and standard deviation {µ D , σ D } derived from GW skymap.Noted that for supernovas we truncate the distance with z < 1 which is comparable to detection depth of WFST.Finally, we obtain the data set containing 77,005 objects and details are listed in Table 1.Since we aim to distinguish true KN among many contaminants, we label Bulla KN and MOSFiT KN as Kilonova and the rest as Other.Figure 3 shows the normalized distribution of line-of-sight probability, mean and standard deviation

Binary Classification
We employ temporal convolutional network (TCN) architecture [93] which is implemented in keras 13 and RAPID14 for long-term lightcurve classification revealing good performance.We use a similar architecture to Chatterjee et al. [47] who took modification to RAPID for binary classification to filter KN.

TCN Framework
The strength and utility of the TCN architecture in the classification of time series lie in its ability to promptly update results as new data becomes available.As introduced by Bai et al. [93] in 2018, the TCN architecture operates on two fundamental principles: (1) The length of the output sequence matches that of the input, and (2) the internal convolutions are causal in nature.Provided that a sequence {x 1 , x 2 , • • • , x N } of data, the TCN outputs a sequence {y 1 , y 2 , • • • , y N }, and the convolutions within the hidden layers of the architecture are designed in such a way that each output, y n , solely relies on the information within the input sequence {x 1 , x 2 , • • • , x n }, where n ranges from 1 to N. Within this architectural framework, as shown in Figure 4, one can control the extent to which longterm historical information influences the output by making judicious selections of kernel sizes or incorporating dilated layers.Beyond that, one can inject contextual information as input, which stays unchanged with time.In this work, we consider line-of-sight probability, mean and standard deviation of luminosity distance of transient according to GW skymap and the combination of them.For simulated WFST surveys with time interval between two photometries ∼ 1 day, in order to reveal more precise shape of lightcurve, we adopt linear interpolation with time interval ∼ 0.5 day within 5 days.Therefore, the dimensions of input data matrix are N × (p + n), where N, p, n represents the length of interpolated time series, number of passbands and number of contextual information.

Training
For our purpose of classification for KN in ToO data, which lasts within 3 days postmerger as shown in Figure 2, we use a network of filter length k = 2 and dilation layers d = (1, 2) and 2 stacks to deepen the network.We employ categorical cross-entropy as the loss function in conjunction with the Adam optimizer.For dataset, we have partitioned it into a training set comprising 70% of the data and a test set containing the remaining 30%.By testing, we find that loss tends to converge after 50 epochs of training so that we train models for 50 epochs.Approximately 2-3 hours will be allocated for training and testing models.Notably, we find that the computational cost for training is comparatively modest, negating the necessity for hardware optimizations such as GPU acceleration.Figure 5 shows the accuracy per class with time in which at least 3 days of observation yields good accuracy.The classifier tends to predict a high KN score inaccurately in instances where photometry data is lacking, as evidenced by a noticeable decline around one day.However, as more photometry data is processed, the predictions converge more closely toward the actual label.
As described above, we have three contextual information at our disposal to assist in classification, line-of-sight Probability, thereafter P, Mean and standard Deviation of luminosity distance along the line of sight of transients according to GW, hereafter M and D. To see which information affects the result most and to decide the best combination of information used, we consider three combinations of them, MD, PM and PMD.Our fundamental principle relies on the mean luminosity distance defined by the GW skymap, serving as crucial guidance for the AB magnitude of transients.Therefore, MD model and PM model lack line-of-sight probability and measurement error of luminosity distance whereas PMD has full contextual information.

Performance on Dataset
Using the dataset simulated in Section 2 and the TCN framework described in Section 3 we obtain MD, PM and PMD classification models for WFST targeting KN.Among them, PMD model reaches an overall accuracy of 98.41%.To test the performance of a classifier, it is natural to employ a confusion matrix to show their capability.In our binary classification issue, the dimension of confusion matrix is 2 × 2 with true positives and true negatives residing in diagonal elements while false positives and false negatives lies in upper right and lower left element respectively.Figure 6 illustrates the confusion matrices for the MD, PM, and PMD models.The matrix values are normalized by the number of true labels and the KN score is the last KN probability in the sequence of prediction.In the context of prioritizing the detection of KN, a threshold of 0.45 is selected to minimize the potential for false positives.Specifically, we classify a candidate as a KN when the predicted KN probability in the final epoch surpasses this specified threshold.Upon comparing these models, it becomes evident that the MD model exhibits lower efficiency when contrasted with the other two alternatives.
However, it is obviously unreasonable to use the same threshold for three different models.We estimate the performance of models under various thresholds.In the low panel of Figure 6, we show the receiver operating characteristic curve (ROC) which illustrates the diagnostic ability of a binary classifier system.The inverted triangle curve and squared curve represent true positive rate and false positive rate with threshold ranging from 10 −3 ∼ 1 − 10 −3 .The threshold of 0.45 is shown as green dots in each curve.The more powerful the model is, the closer the curve is to the upper left of the coordinate.We summarize their precision and recall with thresholds corresponding to the cross of two curves, yielding 94.151% precision, 94.028% recall for PM model and 94.830% precision, 94.125% recall for PMD model.We notice that PM model and PMD model have comparable capability while lacking line-of-sight probability is the most intolerable case.

Performance on Real Data
To further investigate the performance of the classifier, we employ similar procedures in Chatterjee et al. [47] to test in unseen data.We consider Swope observations on AT2017gfo [94], originally identified as 'SSS17a' by the Swope team at the time of its discovery.It was the only identified KN accompanied by GW as far, with ejecta mass ∼ 0.05M ⊙ by the fitting of various KN models [20,26,28].We use g−, r−, i−band data to be compatible with our trained model input.Although the optical transmission curves of Swope are different from WFST, we maintain the belief that these disparities in the KN score's uncertainty will not lead to misleading classification results.Additionally, we also consider a counterpart candidate of GW190814 [95], AT2019npv, which was a SN Ibc but identified as KN in the early phase by efforts of several teams because it was located in GW skymap and the redshift was consistent with the distance information of GW [30,96,97].We include first week i−band observations from DECam as WFST survey would not exceed 5 days.
In Figure 7, we implement our PMD model to predict KN score of AT2017gfo and AT2019npv.We acquire comparable results to Chatterjee et al. [47], in which confidence KN score is obtained from AT2017gfo and persuasive primary KN score for AT2019npv at early phase than decisive contaminants result with the increasing observations.

Performance on Mock Survey
We have conducted thorough validation and performance tests of our model as described above.Nevertheless, it remains imperative to assess the model's efficiency in predicting future survey data, ensuring its practical applicability.To achieve that, we apply simulations with mock surveys described in Section 2. In a night survey, the WFST would detect O(100) − O(1000) transients ignoring history activity [41,98].We assume that approximately a fraction of them can be filtered out by cross-matching or angular offset to their host galaxy.Therefore, we only consider cases of processing less than 300 candidates by classifier.As a comparison, an average of 170 EM candidates were filtered after multi-step machine learning, which includes real-bogus test, cross matching and history exclusion [29].To simulate candidates processed by classifiers, which have been filtered by real-bogus test, cross matching and history exclusion, we simulate one KN according to GW skymap and n contaminants with uniform distribution across the sky for each mock survey.Then we compile them into a data package, labeled as detected candidates.The size of detected candidates is denoted as n DC .We generate 5 data packages for each skymap for data augmentation, considering the variety of KN models and location.In overall, we generate 300 data packages and we rank KN score in descending order in each data package, i.e. rank = 1 means the true KN has the highest KN score in detected candidates.According to our pipeline, we will take several selected candidates, the size of them is denoted as n SC , for the subsequent spectra follow-up.In our analysis, we choose n SC = {10, 15, 20} and n DC ∈ (20, 300).In Figure 8, we show the distribution of ranking of KN with n DC = {20, 30, 40, • • • , 290}.The top of grey and blue bars represent maximum and 0.8 quantile of KN rankings with various n DC , which means over 80% KN would rank within 20 when n DC do not exceed 290.And we also calculate 0.6 quantile which stays 1 with any n DC .For better clarity, we quantify the fraction of data package in which the true KN achieved rank ≤ n SC for given n DC as probability of inclusion.This metric reflects the classifier's effectiveness in incorporating the true KN among the selected candidates.As expected, the classifier's performance exhibited a declining trend with an increasing amount of n DC .Notably, we find a good fitting by a power-law function P(n SC , n DC ) = a • n −b DC with fitting parameters (a, b).The P(n SC , n DC ) curves are depicted in Figure 9.The results for the MD, PM, and PMD models are illustrated in violet, yellow, and green, respectively, and solid, dashed, and dotted lines corresponding to n SC = 10, 15, 20.The optimal fitted parameters are outlined in Table 2. Leveraging this fitted curve, we can now estimate the likelihood of capturing true KN in future ToO surveys, thus facilitating our choice of n SC .For example, given ∼ 250 candidates detected and 80% probability of inclusion required, we should characterize n SC ≳ 20.

Conclusion
In the era of multi-messenger astronomy, especially conducted by LVK and the future 3 rd generation Cosmic Explorer (CE; Reitze et al. [99]) and Einstein Telescope (ET; Maggiore et al. [100]), it will require strongly qualified detectors to be mutually compatible, which necessitates the coordination of numerous facilities to effectively leverage the scientific prospects offered by the upcoming dataset.In light of this, the development of machine learning techniques, integrating physically informed features, becomes crucial to streamline data and minimize the burden of human screening.The negative outcome during LIGO/Virgo O3 run indicates that the AT2017gfo-like KN is abnormal, presenting substantial difficulty in identifying KN systematically to date.And in the ∼ 6 months since LIGO/Virgo O4 has been running, many collaborations triggered their ToO observations, e.g.GROWTH 15 and MASTER GLOBAL Robotic Net 16 .To enhance the probability of identifying EM counterparts of GW detections, several brokers have been developed, i.e.KN classifier embedded in FINK broker [48], EI-CID which was based on RAPID framework [47], ZTFReST [44], AleRCE [101], Lasair [102], SCiMMA17 and a fully automated pipeline for KN discovery [49].
In this work, we have presented a KN binary classifier with a modified RAPID framework.Our approach was inspired by the enhancements detailed in Chatterjee et al. [47], where the fine-tuned neural network exhibited promising performance on simulated ZTF data.We start with the simulation of transients where we have conducted mock WFST survey to 60 simulated GW skymaps with O4 sensitivity.The KN are located following the probability distribution of GW skymaps whereas contaminants are located uniformly across the sky.Noted that Andreoni et al. [44] found cosmological afterglow the dominant contaminants at high Galactic latitude within ∼ 1 yr observations.Furthermore, other contaminant class, e.g.CVs, afterglows and shock breakouts, poses potential challenges because only a fraction of them can be filtered by cross-matching with catalog, implying it would be dominant sources sometime [48].Considering that the classifier is designed to operate on processed data ideally filtered through a real-bogus test and the exclusion of variable stars, we have simulated the majority of supernovae, which represent the predominant contaminants following the removal of these sources.Upon these foundations, we have applied three combinations of contextual information, MD, PM and PMD, revealing comparable performances between the PM and PMD models, while the MD model proves to yield less promising results as shown by the confusion matrix and ROC curve in Figure 6 .The PMD model shows comparable accuracy as evident in Figure 5 in which accuracy for Other reaches ≳ 98% and Kilonova reaches ≳ 92% after 3 days since trigger.Beyond that, we also validated models by predicting KN score for Swope observations on AT2017gfo and DECam observations on AT2019npv, where the lightcurves and predicted KN scores are shown in Figure 7.
Furthermore, we simulated true KN accompanied by a quantity of contaminants for each mock survey to examine the performance on ToO survey data.By sorting KN scores in descending order, the distribution of rankings of true KN are plotted in Figure 8, indicating that ∼ 80% probability KN ranking are ≲ 20 when n DC less than 290.In addition, we found a robust fitting to probability of inclusion with n SC , n DC as shown in Figure 9 which will instruct the choice of n SC in the future.
Through the analysis of GW observable, we found a great discrepancy between lineof-sight probability, mean and standard deviation of luminosity distance according to GW.And we did not include offset in cross-matching and A 90 of GW skymap as in Chatterjee et al. [47] due to the uncertainty in identifying host galaxy and numerical relativity simulations.Alternatively, we explored various KN models and neutron stars, leveraging a comprehensive sampling approach across a BNS sample, in conjunction with the Bulla and MOSFiT models.We anticipate retraining our models with the inclusion of data from the forthcoming WFST survey, specifically incorporating observations of real supernovae, and expanding the scope of the models to accommodate phenomena like GRB afterglows and stellar flares.Furthermore, we envision fostering collaboration among diverse observational bands, i.e. preliminary X-ray emission detectors like Einstein Probe (EP; Yuan et al. [103]) or Chandra X-ray Observatory(CXO; Weisskopf et al. [104]) , to glean additional SED features beyond the optical band.

Figure 1 .
Figure 1.Examples of mock GW skymap and footprints of WFST ToO observation.

Figure 2 .
Figure 2. Left panel: Cumulative density of time elapse of follow-up first and all observation.Right panel: Cumulative density of sky coverage of triggered events by WFST.

Figure 3 .
Figure 3. Distribution of line-of-sight probability, distance mean and standard deviation of the training set.

Figure 4 .
Figure 4. Framework of TCN with length of filter is 2 and dilation is (1, 2, 4) in hidden layers.

Figure 5 .
Figure 5. Accuracy per class with time of PMD model.Kilonova and Other reach over 98% and 92% accuracy after 3 days since trigger respectively, after the inaccurate prediction with high KN score when lacking photometry data.

Figure 6 .
Figure 6.Upper panel: Confusion metrics for MD, PM and PMD model.The value in elements are normalized by the size of true labels per class.Lower panel: ROC curves with KN and Other are drawn as inverted triangles and squared lines.The colormap of marker denotes the various values of threshold and the default value of 0.45 is plotted as green dots.

Figure 7 .
Figure 7. Left panel: Prediction of Swope observations on At2017gfo.Right panel: Prediction of DECam observations on AT2019npv.

Figure 8 .
Figure 8. Distribution of ranking of true KN in data package with various n DC .The top of grey and blue bars represent maximum and 0.8 quantile of KN rankings with various n DC .

Figure 9 .
Figure 9. Best-fit probability of inclusion with n DC for various models.The PM model could reach over 80% probability of inclusion when n DC ≲ 250 .

Table 1 .
Summary of transients in the training set. of transient according to GW skymap of our dataset per class.One can see that the KN has more tight distribution of features compared with contaminants because they always happen in the high probability area of GW skymap.Besides the lightcurve, these features could also assist us in classifying KN and contaminants.

Table 2 .
Best fitting parameters (a, b) for various n SC of three models.