Multimodal analysis of Gravitational Wave signals and Gamma-Ray Bursts from binary neutron star mergers

Our understanding of the universe received a major boost from the detection of the first coalescence event of two neutron stars (GW170817) and the observation of the same event across the entire electromagnetic spectrum. With third-generation gravitational wave detectors and new astronomical facilities, we expect many multi-messenger events of this type. We anticipate the need to analyse the data provided by such events, both to fulfill the requirements of real-time analysis and to decipher each event in its entirety through the information carried by the different messengers, using Machine Learning. We propose a paradigm shift in the way we will do multi-messenger astronomy, simultaneously using the complete information generated by violent phenomena in the Universe. Specifically, we propose the application of a multimodal machine learning approach to characterize these events.


Introduction
The detection of gravitational waves (GWs) from the inspiral phase and coalescence of a pair of neutron stars (NSs) on August 17th 2017 [1] and the following observations of the event's electromagnetic (EM) counterparts (see [2] and references therein) marked the beginning of multi-messenger astronomy with GWs. For the first time, we observed the coalescence of two NSs through GWs and EM radiation across the entire electromagnetic spectrum, thanks to the participation of more than 70 astronomical observatories in the EM follow-up campaign. Multi-messenger astronomy opens up new scenarios for the observation of the universe and new perspectives for the investigation of astronomical objects, but also new challenges for extracting all the information that these astrophysical events carry with them. The synergy between the information that only GWs can provide and the concomitant observations of the EM and neutrino counterparts by other detectors can strongly accelerate our knowledge of the Universe. It is clear that multi-messenger astronomy reveals the need for new data analysis paradigms and introduces new challenges for real-time analysis, and there are many ongoing efforts to face them, which involve the use of machine learning techniques (see, e.g., [3][4][5][6][7][8][9][10][11][12]). Multimodal machine learning (MML) analysis is efficiently applied in many fields of data analysis for a more inclusive interpretation of events where several modalities are concurrent, such as a video with audio, images with captions, or images, text and sound [13]. To our knowledge, these techniques have never been applied to the interpretation of astrophysical data, where signals of different nature can be almost simultaneous. In this paper we introduce, for the first time, a multimodal machine learning analysis applied to astrophysical transient signals such as GWs and Gamma-Ray Bursts (GRBs).
In Section 2 we will describe the importance of multi-messenger astronomy; in Section 3 we will introduce the multimodal analysis and some examples of its application; in Section 4 we will describe how we can implement the multimodal analysis for multi-messenger events; and in Section 5 we will report a proof of concept for the application of MML to GW and GRB data. In the conclusions we will discuss other possible applications.

Multi-messenger observations as a powerful tool to investigate the extreme universe
The joint observation of GW170817 and its EM counterparts clearly demonstrated the enormous informative power of multi-messenger astronomy with GWs. For instance, the joint detection of GW170817 and GRB 170817A represents the first direct proof that NS-NS mergers are progenitors of short GRBs [14]. In addition, the observed time delay between the GW and the gamma-ray signal (∼ 1.7 s) allowed us to put constraints on the difference between the speed of gravity and the speed of light, which has been estimated to be between −3×10⁻¹⁵ and +7×10⁻¹⁶ times the speed of light [14]. After the joint detection of GW170817 and GRB 170817A, the release of a well-constrained, three-detector GW skymap was key for the identification of other EM counterparts [15]: this allowed us to get more insight into the physics of the source. For instance, the multi-wavelength EM observations associated with GW170817 allowed us to infer some basic properties of short GRB jets. The temporal evolution of the X-ray [16] and radio [17] light curves, together with the very low gamma-ray luminosity of GRB 170817A, suggested two possible scenarios: an off-axis GRB with a relativistic, structured jet, or a "cocoon" emission from the relativistic jet shocking its surrounding non-relativistic material [14]. Subsequent very long baseline interferometry observations have been crucial to discriminate between these two scenarios: they allowed astrophysicists to put constraints on the size of the source and on its displacement, which were found to be consistent with a structured, relativistic jet [18,19].
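The order of magnitude of this bound can be recovered with a simple estimate (illustrative only; a source distance of roughly 40 Mpc ≈ 1.2×10²⁴ m is assumed here): a fractional speed difference Δv/c would accumulate a time offset Δt ≈ (Δv/c)(D/c) over the light travel time D/c, so

```latex
\frac{\Delta v}{c} \simeq \frac{c\,\Delta t}{D}
  \approx \frac{\left(3\times 10^{8}\,\mathrm{m\,s^{-1}}\right)\left(1.7\,\mathrm{s}\right)}{1.2\times 10^{24}\,\mathrm{m}}
  \approx 4\times 10^{-16},
```

consistent with the quoted 10⁻¹⁵–10⁻¹⁶ range; the actual bound in [14] also accounts for the unknown intrinsic delay between the merger and the gamma-ray emission.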
The detection of the optical/NIR counterpart of GW170817 (AT2017gfo), first reported by [20] and later by other teams (see [2] and references therein), allowed for the first time the identification of the host galaxy of a GW event and led to the first spectroscopic identification of a kilonova [21,22], thus expanding our knowledge of heavy-element nucleosynthesis in the Universe. The joint GW and kilonova observation also allowed us to investigate in more detail the neutron star equation of state (EOS). For instance, [23] found a lower bound on the tidal deformability parameter through the interpretation of the UV/optical/IR counterpart of GW170817 with kilonova models, combined with new numerical relativity results; by combining this result with the constraints obtained with GW data alone, they showed that both extremely stiff and soft EOSs are tentatively ruled out. [24] presented a Bayesian parameter estimation combining information from GW170817, GRB 170817A and AT2017gfo, and with this analysis they were able to obtain multi-messenger constraints on the EOS and on the binary properties. More recently, [25] performed a joint analysis of GW170817, GRB 170817A and AT2017gfo, together with GW190425, and combined these with previous measurements of pulsars using X-ray and radio observations, as well as nuclear-theory computations, to put constraints on the EOS. A deeper knowledge of the EOS is also fundamental to understand the outcome of coalescing binary systems and therefore to constrain the short GRB central engine (see, e.g., [26]).
Finally, the estimate of the luminosity distance with GWs, together with the estimate of the redshift obtained from the host galaxy, allowed us to estimate the Hubble constant with a totally new approach, independent of previous measurements [27]. Joint GW and EM observations are thus a remarkable instrument for unveiling the physics of some of the most extreme phenomena in the Universe.
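At low redshift this "standard siren" measurement reduces to Hubble's law, H₀ ≈ v/d_L, with v the recession velocity from the EM redshift and d_L the GW-inferred luminosity distance. A minimal numeric sketch follows; both input values are illustrative assumptions (close to those reported for GW170817), not results from this paper:

```python
# Low-redshift standard-siren estimate of the Hubble constant: H0 ~ v / d_L.
# Both input values below are illustrative assumptions, not measurements
# from this work.
v_hubble_flow = 3017.0            # km/s: assumed Hubble-flow velocity of the host
d_luminosity = 43.8               # Mpc:  assumed GW-inferred luminosity distance
H0 = v_hubble_flow / d_luminosity # km/s/Mpc
print(f"H0 ~ {H0:.1f} km/s/Mpc")
```

The GW side supplies d_L with no cosmic distance ladder, which is what makes the method independent of previous H₀ measurements.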

Artificial intelligence via multimodal inputs
Multimodal machine learning (MML) is a multidisciplinary research area that addresses some of the main objectives of Artificial Intelligence (AI) by building models that can process and link information from multiple modal inputs, differing in representation (text, images, etc.), dimensionality (1-D, 2-D, etc.) and data source. By considering data from multiple modalities, it is possible to take into account the complementary information among them, which in turn leads to more robust predictions that reflect patterns not available when working with individual modalities.
The multimodal approach is already used in a wide variety of artificial intelligence problems, such as virtual assistants, image captioning and question answering. One must consider the advantages of obtaining multiple input data characterizing the same event, as well as features extracted in different domain decompositions. In this way we have the ability to capture otherwise unidentifiable details [13]. In figure 1 we report a schematic view of the general idea underlying the multimodal analysis. The input samples contain different kinds of signals and representations, which can be encoded by deep networks, concatenated, and used at a later stage for the classification/regression analysis. Machine learning (ML) and deep learning (DL) techniques, which have changed the way data is processed in recent years, have already been adopted by the gravitational wave astronomy community [28], thanks to the computational resources that have become available in recent years and to the implementation of algorithms that make effective use of graphics processing units (GPUs). Recently, DL has been successfully applied even to multimodal machine learning problems, with the aim of learning useful joint representations in data fusion applications [29]. In [30] we introduced a first approach to MML, based on merging the outputs of deep learning pipelines applied to two different kinds of inputs, time series and images, in the scheme known as late fusion [31].
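The late-fusion scheme can be sketched minimally as follows; here random projections stand in for trained unimodal encoders, and all shapes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Stand-in for a trained unimodal encoder: a fixed random projection."""
    return np.tanh(x @ w)

# Two toy modalities: a 1-D time series and a flattened 2-D image.
strain = rng.standard_normal(256)
image = rng.standard_normal(64 * 64)

w_strain = rng.standard_normal((256, 16))
w_image = rng.standard_normal((64 * 64, 16))

# Late fusion: each modality is encoded separately, the embeddings are
# concatenated, and only the joint vector is fed to the downstream head.
joint = np.concatenate([encode(strain, w_strain), encode(image, w_image)])
w_head = rng.standard_normal(32)
prediction = float(joint @ w_head)   # e.g. a regression output
```

The key design choice is that each modality keeps its own representation and feature extractor; only the learned embeddings are combined.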
The challenge will be to apply these techniques also to data from different instruments, with different outputs, that characterize a multi-messenger event. We want to show how MML could be used to process 1-D strain data as well as 2-D spectrograms from GW detectors, together with sparse light curves collected by astronomical telescopes, in order to infer astrophysical information about the common sources. At the same time, the ability to caption GW data with the associated GRB event could help to quickly identify the source parameters.

From multi-messenger observations to multimodal analysis
In the next years, second generation GW interferometers (Advanced LIGO [32], Advanced Virgo [33] and KAGRA [34,35]) will take data with increased sensitivity, and third generation GW detectors (such as the Einstein Telescope [36]) will become operative; furthermore, many new telescopes will start taking data (e.g., CTA [37], LSST [38]): we therefore expect an increase in the data rates and in the data complexity. In order to maximize the scientific return of future multi-messenger observations, we need to develop new approaches to analyse large streams of EM, GW and neutrino data, taking into account the differences in instrument sensitivities, spatial and temporal coverage, data formats, etc.; furthermore, novel tools are needed to efficiently combine information from multi-messenger observations, allowing us to infer the properties of the astrophysical sources and their environment. For instance, joint GW and EM observations can be used to put more stringent constraints on the tidal deformability parameter (Λ) and therefore on the NS EOS (see, e.g., [23,39]; see also Section 2). To do this, we need to perform a GW parameter estimation with an accurate family of template waveforms, as well as a detailed comparison between the EM observations and the existing kilonova theoretical models; the information obtained from the two messengers should then be combined in a consistent way to put constraints on the EOS. Such constraints, together with the estimates of the masses of the two NSs obtained with GWs, could also help us to get insight into the outcome of the coalescence (which can be a NS or a BH) [40].
In case of a coincident short GRB observation, the additional characterization of the X-ray afterglow emission will eventually allow us to probe the GRB-magnetar model, in which a magnetar is the GRB central engine; this model has proven very successful in reproducing the observed properties of the sub-class of short GRBs showing an X-ray plateau [41] and/or an extended emission [42,43], but a direct proof of this connection is still missing.
In this work we propose a new paradigm for analyzing the multi-messenger data that we will collect with next-generation instruments.
In figure 2 we sketch an example of multimodal analysis for multi-messenger events. We can process the information from the diverse messengers through dedicated pipelines, representing each of them in the best format for feature extraction. We can analyze these representations using the best-suited machine learning workflow to maximize the predictive capability and, at a final stage, combine the extracted features in a multimodal model. We are depicting here a future vision, in which, in an open-access environment, we can analyze shared data using shared software on cloud systems [44]. An astrophysical phenomenon such as a core-collapse supernova (CCSN), NS-NS or BH-NS coalescence (a multi-messenger event) can manifest itself through different signal types: gravitational waves, gamma rays, X-rays, optical and radio emission, and neutrinos. The different modalities have their own representations in different domains. By using DL and ML models, we can use the extracted features to do model prediction at a first stage. At a later stage, we can combine the predicted features in the global MML model.

Application to astrophysical sources: the case of binaries of compact objects
We decided to test our idea on a set of simulated short GRB light curves and associated GW events, focusing on NS-NS mergers. Specifically, we divided this task into three steps. 1) We simulated a sample of NS-NS merging systems populating the volume of the universe that can be explored with next-generation GW interferometers. Specifically, we assigned to each NS-NS system a luminosity distance randomly extracted from a uniform distribution between 1 and 500 Mpc, to cover a realistic range of the matched-filter signal-to-noise ratio (SNR), varying from 4 to 20. We then assumed that both components of the binary systems have masses drawn from a uniform distribution between 1 and 2.5 M⊙; the two distributions were assumed to be uncorrelated. The inclination angle (θ_i) of the systems was chosen taking into account that GRB emission is collimated and these sources are typically detected when they are on-axis, i.e. with the jet pointing towards the observer. According to the few estimates currently available, the jet opening angle θ_j ranges from 3° to 8° (see, e.g., [45] and references therein); we therefore restricted the range of possible values of θ_i to take this observation into account. 2) Following the approach presented in [46], we assumed that all the NS-NS mergers are associated with a short GRB and we simulated the associated high-energy afterglow light curves that could be observed by the LAT instrument onboard the Fermi satellite [47], using GRB 090510 [48] as a template. 3) We simulated the GW signal associated with the NS-NS mergers using the TaylorF2 waveform model [49]. We simulated the noise data for a GW detector such as the Einstein Telescope [50], in which we injected the NS-NS merger GW signals. The simulations were performed using the pyCBC library [51]. We propose a MML pipeline consisting of two Convolutional Neural Networks (CNNs) concatenated at the output in order to estimate the redshift of the GRB and GW sources.
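Step 1 above can be sketched as follows (a minimal illustration with numpy; the exact inclination cut is not specified in the text, so the 8° upper limit and the isotropic-within-cone draw are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
n_events = 10_000

# Luminosity distance: uniform between 1 and 500 Mpc, as described in the text.
d_lum_mpc = rng.uniform(1.0, 500.0, n_events)

# Component masses: independent uniform draws between 1 and 2.5 solar masses.
m1 = rng.uniform(1.0, 2.5, n_events)
m2 = rng.uniform(1.0, 2.5, n_events)

# Inclination: restricted to nearly on-axis systems. The 8-degree cut is an
# assumption, matching the upper end of the quoted jet opening angles; the
# draw is uniform in cos(theta), i.e. isotropic within the allowed cone.
theta_max = np.radians(8.0)
theta_i = np.arccos(rng.uniform(np.cos(theta_max), 1.0, n_events))
```

Each sampled parameter set would then feed both the GRB light-curve simulation and the GW injection; for the latter, pyCBC exposes the TaylorF2 approximant through its frequency-domain waveform interface.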
To convert luminosity distances into redshifts, we used the cosmological parameters in [52]. A 2-D CNN takes the time-frequency image representation of the GW signal as input. The images are built from the detector strain time series containing the injected NS-NS GW signals. As a first step, the strain is whitened in the time domain by means of an Auto-Regressive (AR) model [53] to remove the stationary noise component. The 60-second-long segments containing the chirp signals are then converted into a time-frequency representation based on the continuous wavelet transform, using Morlet wavelets (via the ssqueezepy library 1), see [54][55][56]. Finally, the images are resized to 128 × 256 pixels. For the GRB light curves, we used the data in the time domain. The simulated GRB light curves were kept up to 1000 points long, where most of the information is encoded 2. Examples of the chosen representations are shown in Fig. 3: on the left side the image for the GW signal, on the right side the time-domain light curve data. The CNN processing GW data consists of 5 layers with the following numbers of filters: 64, 32, 16, 16, 32, and kernels of dimensions (3, 3). After every convolutional layer, we applied max pooling with kernels (2, 2). The 1-D CNN processing the light curves consists of 3 layers with the following numbers of filters: 80, 40, 40, and kernels 5, 3, 3. Also in this case, after every convolutional layer we applied max pooling with kernel 2. After the last layers, the outputs of both CNNs were flattened and concatenated. The concatenated output is fed to a fully connected layer which outputs the prediction for the source redshift. All layers have a ReLU activation function, with the exception of the final layer, which has a linear activation. The MML model was trained on the training data set using the Mean Squared Error (MSE) loss function and the Adam optimizer with a learning rate of α = 0.001 and a learning rate decay of 0.066667. The model is summarised in Fig. 4.
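The architecture just described can be sketched in Keras as follows. This is a reconstruction from the text, not the authors' code: the padding scheme and the width of the fully connected fusion layer are not specified, so the choices below are assumptions, and the resulting parameter count will not exactly match the total quoted later.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mml_model(img_shape=(128, 256, 1), lc_len=1000):
    # 2-D branch: time-frequency images of the whitened GW strain.
    img_in = layers.Input(shape=img_shape)
    x = img_in
    for n_filters in (64, 32, 16, 16, 32):
        x = layers.Conv2D(n_filters, (3, 3), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)

    # 1-D branch: GRB light curves in the time domain.
    lc_in = layers.Input(shape=(lc_len, 1))
    y = lc_in
    for n_filters, kernel in ((80, 5), (40, 3), (40, 3)):
        y = layers.Conv1D(n_filters, kernel, activation="relu", padding="same")(y)
        y = layers.MaxPooling1D(2)(y)
    y = layers.Flatten()(y)

    # Late fusion: concatenate the two embeddings and regress the redshift.
    z = layers.Concatenate()([x, y])
    z = layers.Dense(64, activation="relu")(z)   # width is an assumption
    out = layers.Dense(1, activation="linear")(z)

    model = Model([img_in, lc_in], out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")
    return model

model = build_mml_model()
```

The learning-rate decay used in the paper can be added via an optimizer schedule or a Keras callback; it is omitted here for brevity.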
For training and evaluation, the dataset was divided according to the following scheme: 70% training set, 20% test set, 10% validation set. Training was carried out with a batch size of 16 samples for 100 epochs. The total number of trainable parameters is 61881. As a pre-processing step, we applied min-max scaling to the inputs. Shuffling was also applied.
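The splitting and scaling step can be sketched like this (a simplified illustration; here the min-max scaler is fit on the full set for brevity, whereas fitting on the training set alone would avoid data leakage in a production pipeline):

```python
import numpy as np

def split_and_scale(x, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle, min-max scale to [0, 1], and split into train/test/validation."""
    rng = np.random.default_rng(seed)
    x = x[rng.permutation(len(x))]            # shuffle
    x = (x - x.min()) / (x.max() - x.min())   # min-max scaling
    n_train = int(train_frac * len(x))
    n_test = int(test_frac * len(x))
    return (x[:n_train],                      # 70% training
            x[n_train:n_train + n_test],      # 20% test
            x[n_train + n_test:])             # 10% validation

train_set, test_set, val_set = split_and_scale(np.arange(100.0))
```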
In Fig. 5 we report the mean squared error loss evolution over the training epochs for both the training and validation sets. It can be observed that the MSE converged well over the 100 epochs, and that there is no overfitting. In Fig. 6 we show the comparison between the predicted redshifts and the true redshifts of the simulated sources on the test data set (blue points); the red solid line represents the expectation in case the predicted redshifts exactly match the true ones. It can be seen that, for the lowest values of redshift, the predictions are in good agreement with the real values, while for the highest redshifts the scatter around the line y = x increases and the predicted values are typically lower than the true ones. This is partially due to the fact that, for the most distant sources, the GW signal has a low SNR. Also, we used a limited dataset, and this could have affected our results. However, with this work we only wanted to investigate the feasibility of multimodal ML analysis; a more detailed study, including a larger simulated dataset and possibly real EM and GW data, will be presented elsewhere.
In Fig. 7 we show the histogram of the relative difference between the predicted redshifts and the true redshifts. It can be seen that the histogram peaks at zero, meaning that for the majority of the simulated sources the algorithm correctly estimated the redshift.
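The quantity histogrammed in Fig. 7 is simply the per-source relative error; a minimal sketch with toy values (not results from the paper):

```python
import numpy as np

def relative_difference(z_pred, z_true):
    """(z_pred - z_true) / z_true, the per-source relative redshift error."""
    z_pred, z_true = np.asarray(z_pred), np.asarray(z_true)
    return (z_pred - z_true) / z_true

# Toy values for illustration only.
delta = relative_difference([0.010, 0.050, 0.090], [0.010, 0.055, 0.100])
counts, edges = np.histogram(delta, bins=5, range=(-0.5, 0.5))
```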

Outlook and Perspective
Our team's innovative multimodal machine learning approach consists of analyzing each multi-messenger event simultaneously in its entirety, through the information we can gather from GW, electromagnetic and neutrino emissions, which are the inputs of a single MML pipeline. This allows us to increase the information we can extract, thanks to the concomitant analysis of every piece of information. Just as it is easier for us to understand language when we associate sound with video, or to understand people's mood by analysing text and images together, we will apply multimodal analysis techniques to astrophysical transient events.
This is a new paradigm for analysing the data we will collect in future astro- and particle-physics experiments. We tested the idea on a data set we created by simulating both the GW and GRB emission, to set up a proof of concept. We built a MML pipeline taking as inputs the images obtained through a time-frequency representation of whitened GW data for the Einstein Telescope detector, and the simulated GRB light curves for a Fermi-like detector. We want to underline that the work presented here is just an example of a basic application of multimodal analysis for multi-messenger events, based only on GW strain data and gamma-ray GRB light curves, with a simple neural network architecture. The results, even if preliminary, are very encouraging, and we plan to continue with our approach on real data and by including other input messengers. We will implement a more sophisticated and comprehensive analysis by adding data in different formats, at different levels (from raw data to high-level data) and related also to other messengers (e.g., neutrinos and photons at other wavelengths). For instance, we want to include in our framework raw data from Imaging Atmospheric Cherenkov Telescopes (IACTs) such as CTA: 2-D images that represent the tracks left in the telescope's camera by showers of particles that can be induced, for example, by very-high-energy (VHE, E > 100 GeV) gamma rays emitted by GRBs. At the same time, we plan to optimize the analysis workflows using feature extraction and engineering, data balancing and augmentation for less-represented classes of events, and deeper networks. It is worth emphasizing that the approach we are proposing represents a new challenge for data scientists and astrophysicists, especially in anticipation of the larger number of events we will be able to detect with third-generation GW detectors.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.