Since the hadronic final state is common to both analyses, it is useful to introduce here the concept of jets, which are referred to frequently in the following sections. High-energy collisions in a particle collider produce free quarks and gluons. Owing to a property of QCD known as colour confinement, these cannot exist in isolation; instead, they interact with each other to form composite particles called hadrons. The particles created in this hadronization process leave traces within the detector, resulting in cone-shaped clusters of tracks and energy deposits called hadronic jets. When data are analyzed offline, these hadronic jets are reconstructed with a dedicated algorithm [
8] with different cone opening angles, labeled by the parameter $R$ (defined through $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$, which is a measure of the angular distance between two vectors in cylindrical coordinates. Here $\phi$ is the azimuthal angle in the x-y plane of the ATLAS coordinate system, transverse to the beam axis z, and $\eta$ is the pseudorapidity, defined as $\eta = -\ln\tan(\theta/2)$ and related to the polar angle $\theta$ between the particle trajectory and the z axis. In the definition of a jet, $R$ refers to the angular opening of the cone enclosing all the particles associated with the jet). Jets with R = 0.4 are called small-R jets; large-R jets are those with R = 1.0.
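The angular quantities used in the jet definition can be made concrete with a minimal sketch. It assumes the standard collider conventions (pseudorapidity from the polar angle, azimuthal differences wrapped into [-pi, pi]); the function names are illustrative and are not taken from any ATLAS software.

```python
import math

def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta the polar angle to the z axis (radians)."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(Delta eta^2 + Delta phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

A particle would then be associated with a small-R jet if its `delta_r` to the jet axis is below 0.4, and with a large-R jet if it is below 1.0, in the simplified cone picture used above.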
This section is divided into two parts, with the two applications described separately.
  3.1. YXH Fully-Hadronic Analysis
The first analysis described is a search for a Beyond Standard Model (BSM) TeV-scale narrow-width boson Y decaying to a Standard Model Higgs boson (H) and a new $\mathcal{O}(100)$ GeV-scale boson X [6]. The H is required to decay to $b\bar{b}$, the decay mode with the largest branching ratio; no explicit requirements are placed on X, except that its decay products are jets. The full Large Hadron Collider Run 2 $\sqrt{s} = 13$ TeV $pp$ dataset is used, collected by the ATLAS detector from 2015 to 2018 and corresponding to an integrated luminosity of 139 fb$^{-1}$. A Feynman diagram for this process is shown in Figure 1.
Because the Y mass is of the order of thousands of GeV, H and X are produced with a significant momentum component in the plane transverse to the beam axis, labeled $p_T$, that increases with the mass of the resonance. This collimates the decay products of each boson, which are then reconstructed together as a single large-radius jet, hereafter denoted J. The signature of such a signal is therefore a resonant structure in the dijet invariant mass spectrum, and events can be selected based on a final state of two large-radius jets. A supplementary resolved selection is used to reconstruct the X mass with small-R jets (labeled j in this paper) in the case of insufficient boost from the Y.
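The link between boost and collimation can be illustrated with the common rule of thumb that the two products of a two-body decay are separated by roughly $\Delta R \approx 2m/p_T$ when $p_T \gg m$. The sketch below uses this approximation only as an illustration; it is not the criterion used in the analysis, and the function names are hypothetical.

```python
def decay_opening_angle(mass_gev, pt_gev):
    """Approximate Delta R between the two decay products: 2 m / pT (valid for pT >> m)."""
    return 2.0 * mass_gev / pt_gev

def fits_in_jet(mass_gev, pt_gev, jet_radius=1.0):
    """True if both decay products are expected to fall inside one jet of the given radius."""
    return decay_opening_angle(mass_gev, pt_gev) < jet_radius
```

For a Higgs boson of mass 125 GeV, this estimate gives an opening angle of 0.5 at a transverse momentum of 500 GeV, inside a large-R jet, and 1.25 at 200 GeV, where a resolved reconstruction with small-R jets becomes the natural choice.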
The primary background comprises SM multi-jet processes, in which jets arise from quantum chromodynamics (QCD) [9] interactions, giving a smoothly falling invariant mass distribution on top of which a signal bump can be isolated.
This search is motivated by several key extensions of the Standard Model that predict heavy diboson resonances. One of the simplest benchmarks is a simplified model based on spin-1 heavy vector triplets (HVT) [10], which reproduces a large class of BSM models.
The final fit is performed on the reconstructed invariant mass of the Y in overlapping windows of the X candidate mass, which further improves the signal-to-background ratio. The results are presented as limits on the cross-section times branching ratio of the generic HVT process.
Signal regions (SRs) are defined by selections on the different properties of the H and X jets. The ambiguity as to which of the two J in the event is more likely to be the Higgs boson is resolved using a neural-network (NN) classifier, which separates bosons decaying to $b\bar{b}$ from top-quark and QCD jets [11]. The outputs of the NN are three classification scores corresponding to the likelihood of the jet originating from a Higgs boson ($p_{\mathrm{Higgs}}$), top quark ($p_{\mathrm{top}}$), or multijet process ($p_{\mathrm{multijet}}$), which are subsequently combined into the jet-level discriminant $D_{Hbb}$, as shown in Equation (6).
        
The jet with the largest value of the discriminant $D_{Hbb}$ is labeled as the Higgs candidate, and the other J is, by default, the X candidate; this determines which jet is subject to further H and X tagging.
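The candidate assignment can be sketched as follows. Equation (6) is not reproduced in this section, so the sketch assumes a log-likelihood-ratio form, $\ln[p_{\mathrm{Higgs}} / (f_{\mathrm{top}}\, p_{\mathrm{top}} + (1 - f_{\mathrm{top}})\, p_{\mathrm{multijet}})]$, with an illustrative top fraction of 0.25. The functional form, the value of `f_top` and the dictionary keys are all assumptions made for this example.

```python
import math

def higgs_discriminant(p_higgs, p_top, p_multijet, f_top=0.25):
    """Assumed form of the jet-level discriminant (f_top is an illustrative value)."""
    return math.log(p_higgs / (f_top * p_top + (1.0 - f_top) * p_multijet))

def assign_candidates(jets):
    """Return (higgs_candidate, x_candidate) from the two large-R jets of an event.

    Each jet is a dict with hypothetical keys 'p_higgs', 'p_top', 'p_multijet'.
    """
    ranked = sorted(
        jets,
        key=lambda j: higgs_discriminant(j["p_higgs"], j["p_top"], j["p_multijet"]),
        reverse=True,
    )
    return ranked[0], ranked[1]
```

Any monotonic function of the score ratio would give the same ordering of the two jets, so the assignment itself does not depend on the logarithm.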
A novel anomaly detection signal region is implemented based on a jet-level score for signal-model-independent tagging of the boosted X [12], representing the first application of fully unsupervised machine learning to an ATLAS analysis. The primary SR, defined through a selection on the jet-level anomaly score (AS) of the X candidate, is referred to as the anomaly SR. The remaining two SRs target the benchmark $X \to q\bar{q}$ decay and are thus referred to as two-prong SRs; they differ in the reconstruction of the X as either a single large-R jet (merged SR) or two small-R jets (resolved SR).
For all signal regions, a cut is applied to the $D_{Hbb}$ discriminant of the Higgs boson candidate, along with a mass window requirement of 75 GeV < $m_H$ < 145 GeV. Events in which the Higgs boson candidate passes the $D_{Hbb}$ selection and has a mass between 145 and 200 GeV define the high-side band HSB1; HSB0 covers the same mass window, with the cut on $D_{Hbb}$ inverted. Validation is performed in the low-side band (LSB), where the reconstructed Higgs boson mass is required to be between 65 and 75 GeV. LSB0 and LSB1 are similarly defined as having a Higgs boson candidate that fails or passes the $D_{Hbb}$ tagging criterion, respectively. CR0 is defined as the set of events in which $m_H$ is in the SR mass window but the candidate fails the $D_{Hbb}$ tagging. A scheme of the selection flow and of the analysis regions used for the background estimation is shown in Figure 2.
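The region definitions above amount to a two-dimensional classification in the Higgs candidate mass and the tagging decision. The sketch below assumes an SR mass window of 75 to 145 GeV (the range lying between the two side bands), treats the window edges as half-open intervals, and ignores the X-tagging categories; the function name and these boundary choices are illustrative.

```python
def classify_region(m_h_gev, passes_tag):
    """Map the Higgs candidate mass and tagging decision to an analysis region name."""
    if 75.0 <= m_h_gev < 145.0:      # assumed SR mass window
        return "SR" if passes_tag else "CR0"
    if 145.0 <= m_h_gev < 200.0:     # high-side band
        return "HSB1" if passes_tag else "HSB0"
    if 65.0 <= m_h_gev < 75.0:       # low-side band
        return "LSB1" if passes_tag else "LSB0"
    return None                      # outside all analysis regions
```

In this picture, the weights are learned from the HSB0 to HSB1 pair, checked on the LSB0 to LSB1 pair, and finally used to move from CR0 to the SR.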
  Background Estimation
The overwhelming background to the $Y \to XH$ signal comprises high-$p_T$ multi-jet events. Such processes are known to be poorly modeled in Monte Carlo simulations, which makes a simulation-based background estimation very challenging. Therefore, this analysis relies on a fully data-driven estimation of the background in the SR. The shape of the expected dijet invariant mass ($m_{JJ}$) distribution in the SR is obtained from data in CR0, reweighted with weights derived such that, when applied to HSB0, they reproduce the shape found in HSB1.
The method rests on the verified assumption that the efficiency of the $D_{Hbb}$ cut is independent of $m_H$, so that the reweighting function can be defined in one mass window and then applied in another. The reweighting function is defined as the ratio of the multi-dimensional probability density functions (PDFs) of the data in HSB1 to those of the data in HSB0. In this analysis, the statistical procedure of direct importance estimation explained in Section 2 and Section 2.1 is used, in which the ratio is estimated directly from data. It is implemented via the training of a DNN, where the loss function (in Equation (5)) is minimized to produce weights that accurately reproduce the ratio observed in data.
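A toy example can make the idea of learning a density ratio through a loss function concrete. Equation (5) is defined in Section 2 and is not repeated here, so the sketch assumes one loss that is known to be minimized by the logarithm of the density ratio: the mean of $\sqrt{e^{Q}}$ over the source (untagged) sample plus the mean of $1/\sqrt{e^{Q}}$ over the target (tagged) sample, with the weight given by $e^{Q}$. The two-bin dataset and all names are hypothetical.

```python
import math

def importance_loss(q_source, q_target):
    """Assumed loss: E_source[sqrt(exp(Q))] + E_target[1 / sqrt(exp(Q))]."""
    src = sum(math.exp(0.5 * q) for q in q_source) / len(q_source)
    tgt = sum(math.exp(-0.5 * q) for q in q_target) / len(q_target)
    return src + tgt

# Toy data: events fall in bin "A" or "B"; the two samples populate the bins differently.
source_bins = ["A"] * 80 + ["B"] * 20   # untagged-like sample: 80% / 20%
target_bins = ["A"] * 50 + ["B"] * 50   # tagged-like sample:   50% / 50%

def toy_loss(q_a, q_b):
    """Loss when the model outputs Q = q_a in bin A and Q = q_b in bin B."""
    q = {"A": q_a, "B": q_b}
    return importance_loss([q[b] for b in source_bins], [q[b] for b in target_bins])

# Analytic minimum: exp(Q) equals the ratio of target to source fractions in each bin.
q_best = (math.log(0.5 / 0.8), math.log(0.5 / 0.2))
weights_best = tuple(math.exp(q) for q in q_best)
```

Here `weights_best` is approximately (0.625, 2.5): source events in bin A are weighted down and those in bin B are weighted up, so that the reweighted source sample matches the target fractions. A DNN plays the role of `q_a` and `q_b` when the inputs are continuous and multi-dimensional.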
The DNN is built using a fully connected sequential model from Keras with three hidden layers, each with 20 neurons and a rectified linear unit (ReLU) activation function. To reduce overfitting during training, 10% of the connections among the hidden layers are randomly dropped ("dropout"). The last layer has a single output with a linear activation function. The model is trained using the Adam optimizer in Keras with TensorFlow as the backend. Training is performed with a batch size equal to the full dataset size for up to 1600 epochs, with early stopping if the loss calculated on the validation dataset does not decrease for 100 consecutive epochs.
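The stopping rule described above can be sketched independently of any deep-learning library. The function below is not the Keras callback; it only reproduces the logic of stopping after a fixed number of epochs without improvement, taking a precomputed list of validation losses as a stand-in for actual training. Its name and signature are illustrative.

```python
def early_stopping_epoch(val_losses, max_epochs=1600, patience=100):
    """Return (last_epoch_run, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has not improved on its best value
    for `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return last_epoch, best_epoch
```

With a batch equal to the full dataset, each epoch corresponds to a single gradient update, so the patience of 100 epochs corresponds to 100 updates without improvement.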
Events are considered for training if they pass the analysis preselection, satisfy the HSB mass requirement (145 GeV < $m_H$ < 200 GeV) and, additionally, have at least two track jets associated with the Higgs boson candidate. They are modeled as an unordered set of variables, namely the transverse momentum ($p_T$), the pseudorapidity ($\eta$), the azimuthal angle ($\phi$) and the energy ($E$) of the Higgs boson candidate; the number of tracks associated with the Higgs boson candidate; and the $p_T$, $\eta$, $\phi$ and mass of the first two track jets associated with the Higgs boson candidate, ordered in $p_T$.
Each variable $x$ is standardized with the transformation $x \to (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are its mean and standard deviation, respectively.
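A minimal sketch of this standardization for a single input variable is given below. It assumes the population standard deviation; whether the population or sample definition is used is not stated in the text, and the difference is negligible for large datasets.

```python
import statistics

def standardize(values):
    """Shift to zero mean and scale to unit standard deviation: (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mu) / sigma for v in values]
```

In practice, the mean and standard deviation computed on the training sample would be stored and reused when the trained network is evaluated in other regions, so that all inputs are transformed consistently.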
The training was performed in the HSB, using data in both the tagged (HSB1) and untagged (HSB0) regions before applying the SR categorization. This inclusive training enables a single set of weights to be used for the merged, resolved and anomaly SRs. The dataset was divided into training and test sets containing 70% and 30% of the full dataset, respectively. Of the training set, 20% was set aside to validate the model and to monitor overfitting during the training phase.
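The nested split described above leaves 56% of the events for the actual training, 14% for validation and 30% for testing. A minimal sketch, assuming a simple random shuffle with a fixed seed (the shuffling strategy and the function name are illustrative choices):

```python
import random

def split_dataset(events, test_fraction=0.30, validation_fraction=0.20, seed=0):
    """Return (train, validation, test) lists from a list of events."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, remainder = shuffled[:n_test], shuffled[n_test:]
    # The validation sample is taken from the 70% training portion.
    n_val = round(len(remainder) * validation_fraction)
    validation, train = remainder[:n_val], remainder[n_val:]
    return train, validation, test
```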
The DNN outputs event-level weights, assumed to be approximately independent of $m_H$, that can be applied to an untagged region to produce the $m_{JJ}$ shape in the corresponding $D_{Hbb}$-tagged region. These weights are validated using data from the LSB.
Figure 3 shows the impact of the reweighting on the distributions of several key analysis variables, using the two-prong merged LSB as an example region.
Three curves are shown in total, comparing the LSB0 data, before and after the DNN reweighting is applied, with the target data distribution in LSB1. The variables shown are the kinematic variables over which the background estimation is extrapolated to generate the SR prediction. Shape differences are observed after the application of the weights, and the reweighted shapes agree well with the true tagged data in all distributions, suggesting a robust background model. As the training is performed inclusively in the X-tagging, the same conclusion holds for the anomaly and two-prong resolved LSB regions.
This approach provides a normalized distribution, i.e., the background shape only; the background normalization factor in the SR is determined by the fit procedure.
Several sources of uncertainties are related to the method, i.e., effects that are not considered in the statistical model, but can affect the result of the measurement. They are called systematic uncertainties; their impact is taken into account by quantifying the corresponding variation on the background shape. Three different kinds of such uncertainties were considered, as explained below.
The first is the potential variation in the obtained weights due to differences in phase space between the HSB (where the network is trained) and the Higgs mass window. The related uncertainty is evaluated by building an additional background model, with the DNN trained in an alternative region of 165 GeV < $m_H$ < 200 GeV. This region has approximately the same statistics and tagging efficiency as the nominal training region, which helps to isolate the effect of the choice of training region on the output weights. Up and down variations are defined by symmetrizing the shape difference in $m_{JJ}$ between the two models, creating an effect of % across the distribution.
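The symmetrization step can be sketched as follows, assuming that both background models are binned histograms, that the comparison is made after normalizing each to unit area, and that the difference is applied bin by bin in both directions around the nominal shape. These choices and the function names are illustrative.

```python
def normalize(hist):
    """Scale a histogram (list of bin contents) to unit area."""
    total = sum(hist)
    return [h / total for h in hist]

def symmetrized_shape_variations(nominal, alternative):
    """Return (up, down) shapes built from the nominal-vs-alternative shape difference."""
    nom = normalize(nominal)
    alt = normalize(alternative)
    up = [n + (a - n) for n, a in zip(nom, alt)]
    down = [n - (a - n) for n, a in zip(nom, alt)]
    return up, down
```

By construction, the up variation coincides with the alternative shape and the down variation is its mirror image with respect to the nominal one.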
Another DNN variation is built to account for the finite statistics of the training sample and the random initialization of the network weights. It is estimated with a bootstrap procedure [13], in which a set of 100 bootstrap networks is trained, each time varying the training dataset by resampling it with replacement. A full evaluation of this systematic uncertainty would use the event-level covariance matrix between all bootstrap weights; since this is computationally prohibitive, the interquartile range (IQR) of each event's weight distribution is taken as a good approximation of the uncertainty, along with the IQR of the normalization factor across the bootstrap trainings. Two additional templates are then defined from the median weight of each event, plus or minus half of the IQR, giving the upper and lower symmetric error bands. This corresponds to a % effect across $m_{JJ}$.
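The two ingredients of this procedure, resampling with replacement and the per-event band built from the median and the IQR, can be sketched with the standard library. The quartile convention (`method="inclusive"`) and the function names are assumptions of this example; the text does not specify how the quartiles are computed.

```python
import random
import statistics

def bootstrap_resample(events, rng):
    """Draw a resampled dataset of the same size, with replacement."""
    return [rng.choice(events) for _ in events]

def iqr_band(bootstrap_weights):
    """Return (lower, median, upper) for one event's weights across bootstrap networks.

    The band is the median plus or minus half of the interquartile range.
    """
    q1, median, q3 = statistics.quantiles(bootstrap_weights, n=4, method="inclusive")
    half_iqr = 0.5 * (q3 - q1)
    return median - half_iqr, median, median + half_iqr
```

In the procedure described above, `bootstrap_resample` would be called once per network to build each training set, and `iqr_band` would be evaluated for every event over the 100 resulting weights.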
Lastly, a non-closure uncertainty is included to cover modeling discrepancies that may arise from extrapolating weights derived from the NN training in the HSB to the LSB, and subsequently to the SR. It is defined by the symmetrized shape difference between the data and the predicted background in the LSB. The non-closure is negligible for low $m_{JJ}$ and rises to % in the $m_{JJ}$ tails.
The results of background-only fits of the $m_{JJ}$ distribution across all $m_X$ categories in the anomaly SR show good compatibility of the data with the expected background, after incorporating all statistical and systematic uncertainties. The largest deviation is in the $m_X$ window [75.5, 95.5] GeV, corresponding to a global significance of 1.47$\sigma$. The results for the two-prong SRs show no significant deviations of the data from the predicted background beyond the expected statistical fluctuations.
  3.2. Resonant HH in the $b\bar{b}b\bar{b}$ Final State
The second ATLAS analysis described in this paper, which adopted the same background estimation technique, is a search for resonant Higgs boson pair production in the $b\bar{b}b\bar{b}$ final state [7] (in particle physics, the term resonant denotes processes whose invariant mass distribution peaks around the mass of a particle that has decayed into the observed final state). LHC $pp$ collision data collected by ATLAS in 2016–2018 are used, corresponding to
.
To perform the search, events are selected according to the expected properties of the signal, and signal and control regions are defined. The hadronic final state is explored by selecting only events with at least four small-R jets. Kinematic cuts are then applied to obtain the SRs and CRs for the final fit to the 4b invariant mass distribution, as summarized below.
Events are first divided into two categories: 2b, where exactly two jets are b-tagged (a b-tagged jet is a jet identified as produced by the hadronization of a b quark), and 4b, where at least four jets are b-tagged. Exactly four jets are selected to construct the two H candidates. For 4b events, the four b-tagged jets with the highest $p_T$ are selected. For 2b events, the two b-tagged jets and the two untagged jets with the highest $p_T$ are selected. The 2b events are needed to construct the background model for the 4b category. This selection of untagged jets can introduce a kinematic bias with respect to the 4b category; however, this is exactly what the reweighting function accounts for.
After the four jets are chosen, they are paired to form the two Higgs candidates, $H_1$ and $H_2$, with the pairing chosen by a boosted decision tree (BDT). The candidates are then ordered by their transverse momentum. The multi-jet background is reduced by requiring a certain angular separation between the two H candidates, and processes involving top-quark decays are suppressed by a dedicated veto. Finally, events are sorted into three kinematic regions based on the invariant masses of the H candidates: a signal region (SR), a validation region (VR) and a control region (CR).
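The categorization and the choice of the four jets described above can be sketched as follows. Jets are represented as dictionaries with hypothetical keys `pt` and `btag`; the pairing into Higgs candidates, the angular requirement and the top veto are not included.

```python
def select_higgs_jets(jets):
    """Return (category, four_jets) for one event, or (None, []) if it fits neither category.

    4b: at least four b-tagged jets; the four with the highest pT are used.
    2b: exactly two b-tagged jets; they are combined with the two
        highest-pT untagged jets.
    """
    by_pt = sorted(jets, key=lambda j: j["pt"], reverse=True)
    tagged = [j for j in by_pt if j["btag"]]
    untagged = [j for j in by_pt if not j["btag"]]
    if len(tagged) >= 4:
        return "4b", tagged[:4]
    if len(tagged) == 2 and len(untagged) >= 2:
        return "2b", tagged + untagged[:2]
    return None, []
```

Events with exactly three b-tagged jets fall into neither category in this sketch, following the definitions given in the text.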
  Background Estimation
After the selections described above, the background is mainly composed of pure QCD multi-jet processes; therefore, the data-driven technique discussed above is well suited to the background estimation. Since the signal contamination in the 2b region is found to be negligible compared to the background uncertainties, these data are used to predict the background shape in the 4b SR. In the previously discussed analysis, the HSB of the Higgs mass window is used for the training of the neural network; in this case, the CR plays the same role. Here, a reweighting function that maps the 2b kinematic distributions onto the 4b ones is derived from data in the CR. This function is then applied to the corresponding 2b SR to obtain a background model for the 4b SR.
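Applying the reweighting function to the 2b SR amounts to filling a weighted histogram of the fitted variable. The sketch below assumes that the network output $Q$ is converted to an event weight as $e^{Q}$ and that the resulting template is normalized to unit area; both are assumptions of this example, as are the function names and the binning.

```python
import bisect
import math

def weighted_histogram(values, weights, edges):
    """Sum of weights per bin; bins are half-open [edge_i, edge_i+1)."""
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        i = bisect.bisect_right(edges, value) - 1
        if 0 <= i < len(counts):  # skip values outside the histogram range
            counts[i] += weight
    return counts

def predict_4b_shape(masses_2b, q_values, edges):
    """Reweight 2b events with w = exp(Q) and return a unit-area template."""
    weights = [math.exp(q) for q in q_values]
    hist = weighted_histogram(masses_2b, weights, edges)
    total = sum(hist)
    return [h / total for h in hist]
```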
The neural network used to minimize the loss in Equation (5) is composed of three densely connected hidden layers, each with 50 nodes and a ReLU activation function, and an output layer with a single neuron and a linear activation function. The input variables for the training, which is performed in the CR, are chosen to be sensitive to the differences between the two kinematic regions. They include the following: the $p_T$ of both the $p_T$-subleading jet and the fourth-highest-$p_T$ jet; the angular separation between the first and the third $p_T$-ordered jets; the average $|\eta|$ of the four jets; the $p_T$ of the di-Higgs four-momentum; the angular separation between the Higgs candidates; the angular separation between the two jets forming each Higgs candidate; a constructed variable combining the difference between the reconstructed and the nominal value of the Higgs mass and the number of jets in the event.
The effect of applying this reweighting to the CR, where it is derived, is shown in Figure 4. The output of this procedure is an estimate of the corrected $m_{HH}$ distribution in the 4b SR, which is then used as input to the statistical procedure.
The number of events in the 4b region is calculated similarly to the YXH analysis. In the VR, a good compatibility between target and reweighted data is found and residual differences are used for the systematic uncertainty estimation.
In the background estimation procedure of this analysis, two sources of systematic uncertainties have been considered:
The training of the NN is subject to fluctuations due to the initial conditions and the limited size of the training sample. For this reason, the bootstrap resampling technique is used [13], resulting in an ensemble of background estimates. From this ensemble, varied distributions are obtained by shifting the weight of each event by the event-level weight IQR and scaling the result to the same normalization as the nominal distribution; finally, they are multiplied by the ratio of the upper IQR value of the normalization factors to its nominal value. This new set of distributions creates an envelope centered around the nominal distribution, from which an estimate of the uncertainty in each $m_{HH}$ bin can be evaluated.
No significant excess has been observed in the data, which shows good agreement with the background prediction.