Detector Characterization and Mitigation of Noise in Ground-Based Gravitational-Wave Interferometers

Since the early stages of operation of ground-based gravitational-wave interferometers, careful monitoring of these detectors has been an important component of their successful operation and observations. Characterization of gravitational-wave detectors blends computational and instrumental methods of investigating detector performance. These efforts focus both on identifying ways to improve detector sensitivity for future observations and on understanding the non-idealized features in data that have already been recorded. Alongside a focus on the detectors themselves, detector characterization includes careful studies of how astrophysical analyses are affected by different data quality issues. This article presents an overview of the multifaceted aspects of the characterization of interferometric gravitational-wave detectors, including investigations of instrumental performance, characterization of interferometer data quality, and the identification and mitigation of data quality issues that impact analysis of gravitational-wave events. Looking forward, we discuss efforts to adapt current detector characterization methods to meet the changing needs of gravitational-wave astronomy.


Introduction
Within the past decade, advances in the design of gravitational-wave interferometers have allowed the Advanced LIGO and Advanced Virgo detectors to reach the required sensitivity to detect gravitational waves from astrophysical sources [1,2]. These observations [3][4][5][6][7] of the cataclysmic mergers of massive compact objects allowed for the discovery of a new population of black holes [8], insights into the nuclear equation of state [9], precision tests of general relativity [10], and the identification of compact objects that are not predicted to form from stellar evolution [11,12].
These historic discoveries in astrophysics have required measurements of gravitational-wave strains of less than 10^−21 [3]. To achieve these levels of sensitivity, gravitational-wave detectors rely on a wide array of technologies designed to minimize the sensitivity of the detectors to non-astrophysical sources of noise. However, the complexity of the detectors also provides numerous pathways for introducing instrumental noise into the recorded data. Addressing the presence of noise in the instruments is a crucial aspect of gravitational-wave astrophysics, both for the continual improvement of the detectors themselves and for ensuring appropriate treatment of the noise in the data analyses.
This area of research, focusing on diagnosing and mitigating instrumental problems that impact the quality of recorded gravitational-wave strain data, is known as "detector characterization". Detector characterization was a central element of the initial operation of both first [13][14][15] and second [16] generation detectors and the first detection of gravitational waves [17]. Figure 1 illustrates the central role of characterizing the detector and its data quality in the flow of LIGO data analysis from the instruments to the final output.

Figure 1. A visualization of the connections between different components of gravitational-wave data analysis. Detector characterization is in the middle, indicating its critical role in gravitational-wave astronomy, as it contributes to many different steps of the process, from improving the detectors to checking candidate gravitational-wave signals for any possible pollution by noise artifacts. Figure reproduced from [24]. © IOP Publishing. Reproduced with permission. All rights reserved.

Characterizing Gravitational-Wave Detector Instrumentation
One central goal of detector characterization is finding sources of noise within the instruments and working with on-site staff to mitigate these noise sources to improve the detectors' performance. These investigations may be targeted on improving specific subsystems of the detector or diagnosing particular kinds of noise. This section presents an overview of instrumental investigations for the purposes of making concrete improvements to the detectors, including the detailed analysis of individual detector components and methods for investigating transient and persistent noise in the detectors.

Subsystem Characterization
Characterization of the instruments for the purpose of improving instrumental performance is carried out at various levels of specificity. To characterize the detector as a whole, the individual components must first be well understood, followed by the connections between subsystems and how each subsystem couples to the final output of the detector. Components of each detector are classified into multiple "subsystems" that each focus on a specific element of the overall detector design. The instrumentalists, commissioners, and engineers working on site are faced with the monumental endeavor of finely tuning each subsystem and getting the many different components of the detector to work together in the precise harmony needed to detect gravitational waves. Detector characterization of the individual components of the detector is an integral part of the process, even before the full detector has been installed. Analyzing the data of specific subsystems or combinations of subsystems prior to observing runs is necessary for diagnosing problems and improving the instruments.
For each subsystem, it is important to know its key design features and the important data channels used to monitor its performance, and to develop methods of distinguishing nominal from abnormal behavior. Detailed studies of each subsystem began with the earliest engineering runs used to characterize the detectors and continue to be an important part of identifying issues during observing runs. This includes having a deep understanding of the physical environment surrounding each instrument [25], the seismic isolation system [26], the suspension systems [27,28], the pre-stabilized laser, individual optical cavities, calibration, the thermal compensation system, and the many feedback and control loops required to operate the detectors. Characterizing the detector at this level of detail can have an impact not only on diagnosing specific issues at an early stage of commissioning, but also on the design of future detectors. For example, the highly sophisticated seismic isolation systems of Advanced LIGO were motivated by an understanding of the significant impact of seismic noise in initial LIGO [29].

Diagnosing and Mitigating Transient Noise
Characterization of transient noise in the detectors takes into account the overall detector output as well as the behavior of individual subsystems. Several existing papers describe examples of noise investigations in detail [16,18,[30][31][32]. Given the complexity of the detectors and the wide variety of potential noise sources, there is no single way to diagnose all types of instrumental noise. However, these studies tend to follow a similar process, which we describe here. A general overview of this process is shown in Figure 2. These investigations can be focused on tests completed at the detector site or remotely using recorded data and include identification, characterization, and mitigation of instrumental problems. There is generally active collaboration between those at the sites investigating the detectors and those analyzing the detector data, such as diagnostic tests completed by those on site being evaluated by people off-site with additional data analysis tools.
The initial identification of a problem typically begins with analysis of the output of the detector and gathering basic observations about the patterns in the noise, which often gives some initial clues when combined with knowledge of the detector and its environment. This initial analysis is frequently done with the aid of one or more of the many software tools that are run automatically on a daily basis, whose results are displayed on summary web pages internal to LIGO and Virgo and inspected regularly [33,34]. Several of the summary pages show key figures of merit that describe the overall sensitivity and data quality of the strain channel, while numerous additional pages include detailed figures that can be used to trace noise through various subsystems.

Figure 2. A summary of the general approach to investigating problems from a detector characterization perspective. The two rows show the two main areas of focus in the investigation, either the detector or the data. Columns represent steps in the investigation, while boxes are individual methods. The typical investigation process proceeds from left to right.
One indication of a new problem in the detector is an increase in the rate of "glitches" (short-duration bursts of noise) in the strain data at a particular time or frequency. "Event trigger generators" search for glitches in a time series. One of the most commonly used event trigger generators for detector characterization is Omicron, which is run automatically on a daily basis for the strain channel as well as thousands of auxiliary channels [35,36]. Omicron projects the time series onto a basis of sine-Gaussian wavelets to produce a time-frequency representation. Omicron records a variety of metrics for each glitch that it finds, including the peak time, frequency, and signal-to-noise ratio (SNR). Additional event trigger generators include DMT Omega and KleineWelle [37]. Problematic populations of glitches may also be identified by characterizing outliers in the gravitational-wave search backgrounds [17,38] or single-detector GW searches [39,40].
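The core of such an event trigger generator can be sketched as follows. This is a minimal, hypothetical simplification (the function name, tiling, and thresholds are illustrative, not Omicron's actual implementation): whiten a time-frequency map by the median energy in each frequency bin and report tiles above an SNR threshold.

```python
import numpy as np

def find_triggers(x, fs, nfft=256, snr_threshold=8.0):
    """Return (peak_time, peak_frequency, snr) for every loud time-frequency tile."""
    hop = nfft // 2
    window = np.hanning(nfft)
    n_tiles = (len(x) - nfft) // hop + 1
    # Short-time Fourier transform: rows = frequency bins, columns = time tiles.
    stft = np.array([np.fft.rfft(window * x[i * hop:i * hop + nfft])
                     for i in range(n_tiles)]).T
    energy = np.abs(stft) ** 2
    # Normalize each frequency bin by its median energy, so stationary noise
    # sits near SNR ~ 1 (the median is robust against the glitches themselves).
    snr_map = np.sqrt(energy / np.median(energy, axis=1, keepdims=True))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (np.arange(n_tiles) * hop + nfft // 2) / fs
    return [(times[ti], freqs[fi], snr_map[fi, ti])
            for fi, ti in zip(*np.where(snr_map > snr_threshold))]

# Usage: white noise plus a loud 80 Hz burst injected at t = 2 s.
rng = np.random.default_rng(0)
fs = 1024
t = np.arange(4 * fs) / fs
x = rng.normal(size=t.size)
x += 20 * np.exp(-((t - 2.0) / 0.05) ** 2) * np.sin(2 * np.pi * 80 * t)
loudest = max(find_triggers(x, fs), key=lambda trig: trig[2])
```

The loudest reported tile recovers the injected burst's time and frequency; real generators add tile clustering and multi-resolution tilings on top of this basic idea.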
Identified patterns of the noise, such as glitches occurring in a particular frequency range or only during certain times of the day, can already provide useful information about the noise. Combining this knowledge with prior knowledge of the detector based on subsystem characterization or previous instrumental investigations can offer some clues as to the source of the noise. For example, low frequency glitches occurring particularly during the weekdays could relate to elevated seismic noise from human activity.
In addition to the characteristic frequency and SNR of glitches, it is also useful to know the frequency morphology over time. The "Qscan" method [41,42] is based on a similar algorithm to Omicron, and uses overlapping time-frequency tiles of varying widths to create a high resolution spectrogram that shows how the glitch's frequency and amplitude evolve over a short time scale. Based on these images, different classes of glitches have quite distinct shapes, as illustrated in Figure 3.
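The idea behind the multi-resolution tiling can be illustrated with a toy projection of data onto sine-Gaussian tiles of varying Q (a sketch only; the real Qscan uses an efficient FFT-based implementation and interpolated tile grids, and the Q-to-duration convention here is simplified).

```python
import numpy as np

def sine_gaussian_energy(x, fs, t0, f0, q):
    """Squared magnitude of the projection of x onto one sine-Gaussian tile."""
    t = np.arange(len(x)) / fs
    sigma_t = q / (2.0 * np.pi * f0)            # tile duration set by Q and f0
    window = np.exp(-0.5 * ((t - t0) / sigma_t) ** 2)
    window /= np.sqrt(np.sum(window ** 2))      # unit-energy normalization
    return np.abs(np.sum(x * window * np.exp(-2j * np.pi * f0 * t))) ** 2

# Usage: a short "blip" and a long-duration tone, both at 100 Hz.  The blip is
# captured best by a short (low-Q) tile and the tone by a long (high-Q) tile,
# which is why the scan spans multiple Q values.
fs = 1024
t = np.arange(2 * fs) / fs
blip = 5.0 * np.exp(-0.5 * ((t - 1.0) / 0.005) ** 2) * np.sin(2 * np.pi * 100 * t)
tone = np.exp(-0.5 * ((t - 1.0) / 0.2) ** 2) * np.sin(2 * np.pi * 100 * t)
e_blip = [sine_gaussian_energy(blip, fs, 1.0, 100.0, q) for q in (4.0, 64.0)]
e_tone = [sine_gaussian_energy(tone, fs, 1.0, 100.0, q) for q in (4.0, 64.0)]
```

Scanning a grid of times, frequencies, and Q values with this projection, and plotting the tile energies, yields the characteristic Qscan spectrogram.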
Since the early science runs of LIGO and Virgo, detector characterization groups have worked to develop an understanding of particular glitch classes [14,43,44]. Throughout the observing runs, detailed investigations of specific types of glitches (such as "whistle" glitches [16], "blips" [45], and "scattered light glitches" [46]) have continued to be an important facet of detector characterization. These investigations frequently begin with noticing the impact on data quality in the searches, rather than at the specific level of detail of characterizing a particular instrumental monitor. In the noise investigation process, these glitches might be classified visually, but there are now also glitch classification tools that were developed to aid in creating categories of glitches. Gravity Spy [47][48][49] is a citizen science and machine learning project that categorizes glitches based on their appearance in Qscans. The Gravity Spy glitch collections have been helpful in noise investigations, for example in a study of scattered light glitches that led to improvements in the detector in the third observing run [46].

Figure 3. Example Qscans of different glitch classes. The glitches' distinct visual appearance in the scans enables them to be categorized visually or through machine learning algorithms. Figure reproduced from [18]. © IOP Publishing. Reproduced with permission. All rights reserved.
To gather more data on the specific parts of the detectors contributing to transient noise, results of automated correlation or classification algorithms can reveal more information, along with knowledge about the detectors to interpret results. Statistical methods are used to compare glitches in the gravitational-wave strain channel with glitches in auxiliary channels [50][51][52][53]. There are also methods specifically targeted to identify the source of specific types of noise, such as scattered light [54,55] or lightning [56][57][58][59].
Among the algorithms for making statistical comparisons between channels, the Hierarchical Veto (HVeto) [50,51] method is widely used for instrumental noise investigations in LIGO. This algorithm begins with the set of all Omicron triggers in the main gravitational-wave channel and then finds statistically significant time coincidences with glitches in other channels. The hierarchical nature of the method means that it performs multiple rounds of analysis; after each round, it removes the glitches that were found to be correlated with the most significant auxiliary channel. Successive rounds will typically pick up on different types of glitches. The results also include lists of channels related to the most significant channel in each round, along with Qscans of the associated glitches.
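The coincidence test at the heart of such hierarchical methods can be sketched as follows, assuming a simple Poisson model for accidental coincidences (function names and the exact significance definition are illustrative, not HVeto's actual statistic).

```python
import numpy as np
from math import exp, factorial, log10

def count_coincidences(strain_times, aux_times, window):
    """Number of strain glitches with an auxiliary trigger within +/- window."""
    aux = np.sort(np.asarray(aux_times))
    n = 0
    for t in strain_times:
        i = np.searchsorted(aux, t)
        nearby = [abs(aux[j] - t) for j in (i - 1, i) if 0 <= j < len(aux)]
        n += bool(nearby) and min(nearby) <= window
    return n

def hveto_significance(strain_times, aux_times, window, livetime):
    """-log10 of the Poisson probability of >= n chance coincidences."""
    n = count_coincidences(strain_times, aux_times, window)
    # Expected accidental coincidences if the two trigger sets were unrelated.
    mu = len(strain_times) * len(aux_times) * 2 * window / livetime
    p = 1.0 - sum(exp(-mu) * mu ** k / factorial(k) for k in range(n))
    return -log10(max(p, 1e-300))

# Usage: a genuinely coupled channel versus an unrelated one.
rng = np.random.default_rng(42)
livetime = 1000.0
strain_times = np.sort(rng.uniform(0, livetime, 50))
aux_coupled = strain_times + rng.normal(0, 0.01, 50)   # fires with every glitch
aux_unrelated = np.sort(rng.uniform(0, livetime, 50))  # random, uncoupled channel
sig_coupled = hveto_significance(strain_times, aux_coupled, 0.1, livetime)
sig_unrelated = hveto_significance(strain_times, aux_unrelated, 0.1, livetime)
```

The coupled channel yields an enormous significance while the random channel stays near zero; a hierarchical analysis would now remove the coincident glitches and repeat with the remaining triggers.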
Careful studies of the environment around each detector can also further inform the potential sources of transient noise. During each observing run, a series of "injections" that simulate excess environmental noise are regularly performed at each site [25,[60][61][62]. These injections include inducing magnetic noise, loud sounds, and excessive shaking at a level of excitation significantly exceeding the normal environmental level but within the dynamic range of the detector. These injection studies are only completed during time periods when the detector is not actively observing, as they add large amounts of additional noise to the detector data. The coupling between each noise source and the detector can be calculated by measuring the amplitude of the injection using monitors of the physical environment and comparing this to the amount of noise seen in the gravitational-wave strain channel. These measurements allow scientists to know if any sources of noise in the local environment are strong enough to impact the gravitational-wave strain data. If the transfer function between a particular noise source and the detector strain is linear, then the measured coupling is sufficient to estimate the amount of noise each source adds to the detector data. While some noise sources, such as scattered light, do not have a linear coupling, the majority of sources from the local environment are indeed linearly related to the detector strain.
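A minimal sketch of this coupling-function estimate, assuming amplitude spectral densities (ASDs) measured during quiet and injection times and a linear coupling (the function names and synthetic numbers are illustrative):

```python
import numpy as np

def coupling_function(sensor_inj, sensor_bkg, strain_inj, strain_bkg):
    """Coupling factor [strain per sensor unit] at each frequency, estimated
    from the excess amplitude during an injection (quadrature subtraction)."""
    sensor_excess = np.sqrt(np.maximum(sensor_inj ** 2 - sensor_bkg ** 2, 0.0))
    strain_excess = np.sqrt(np.maximum(strain_inj ** 2 - strain_bkg ** 2, 0.0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sensor_excess > 0, strain_excess / sensor_excess, np.nan)

def ambient_estimate(coupling, sensor_ambient):
    """Projected contribution of ambient environmental noise to the strain ASD."""
    return coupling * sensor_ambient

# Usage with synthetic ASDs: an injection raises the sensor level tenfold.
true_coupling = 1e-20                        # strain per sensor count (assumed)
sensor_bkg = np.ones(3)                      # ambient sensor ASD
sensor_inj = 10.0 * np.ones(3)               # sensor ASD during the injection
strain_bkg = 5e-20 * np.ones(3)              # quiet-time strain ASD
sensor_excess = np.sqrt(sensor_inj ** 2 - sensor_bkg ** 2)
strain_inj = np.sqrt(strain_bkg ** 2 + (true_coupling * sensor_excess) ** 2)
c = coupling_function(sensor_inj, sensor_bkg, strain_inj, strain_bkg)
projected = ambient_estimate(c, sensor_bkg)  # ambient contribution to strain
```

Here the recovered coupling matches the value used to build the synthetic data, and the projected ambient contribution sits below the quiet-time strain ASD, i.e., this noise source would not be limiting the detector.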
These various methods are used in concert to perform investigations of particular types of glitches, with each method providing particular clues that eventually lead to a deeper understanding of the channels involved in the noise. However, none of the algorithms on their own can provide a solution to the problem without additional knowledge of how the channels are related to the complex inner workings of the detectors, a working knowledge of how noise propagates through the various instrumental systems, and additional targeted analyses of the noise. For example, Figure 4 shows a follow-up study that was done to diagnose a population of glitches in the strain channel that was traced to a compressor switching on and off. This knowledge is built up over time through investigations on the subsystem level, noise injections, and communication with instrumental experts.

Figure 4. Band-limited RMS of a magnetometer (top) and a signal from a compressor (middle) over the course of several hours. These channels were found by the HVeto algorithm to be significantly correlated with glitches in the strain channel. For the four peaks in the band-limited RMS time series marked by gray shading, the gravitational-wave strain data are also shown (bottom). This analysis provided further evidence that the compressor switching on and off was causing magnetic glitches present in the gravitational-wave strain data. Plotted compressor and magnetometer data are from [30] and strain data are accessed from GWOSC [63].
Finally, targeted investigations on site at the detector are generally used to confirm the specific cause of a noise source. These investigations can include noise injections that attempt to reproduce the glitch. For example, in the famous thirsty ravens investigations [31], scientists simulated the effects of ravens pecking and then observed how this introduced similar glitches in the detector.

Diagnosing and Mitigating Persistent Noise
In addition to investigating glitches in the detector, instrumental investigations also target long-term persistent noise in the detector. Improving the overall sensitivity allows us to see farther into the universe in all types of gravitational wave searches. Lowering the overall noise in the detectors, reducing extra noise at specific frequencies, and keeping the detectors running in a stable state are especially critical for searches for persistent gravitational-wave sources.
Characterization of persistent noise follows a similar process to transient noise investigations: identification of problematic noise from summary plots of the main output of the detector, gathering information from automated algorithms searching for correlations with auxiliary channels, synthesis of the results from these algorithms with prior knowledge of the detectors, and targeted investigations to continue to build up that knowledge. The tools used for each of these steps, however, are different.
Persistent noise at constant frequencies, or "lines", can be particularly harmful for searches for continuous gravitational waves. Tracking lines in the data and keeping lists of known sources of lines is an important part of the instrumental investigation process [64]. The FScan [65], NoEMi [66], and FineTooth [64] tools are used to track lines in the strain channel and auxiliary channels. FScan produces spectra and spectrograms for each day, and NoEMi tracks lines, follows wandering lines (which change in frequency over time), and indicates coincidences with occurrences in auxiliary channels. Often groups of lines are found to have a constant frequency separation between adjacent lines and are referred to as "combs". FineTooth is a tool for finding and tracking combs of lines in the strain channel as well as magnetometers, which have proven to be useful witnesses of electronics noise that manifests itself in combs. Additionally, noise investigations of lines and combs are frequently aided by analyzing the correlation between the strain channel and auxiliary monitors at specific frequencies, to narrow down the potential sources of noise. In cases when the time in the observing run that the line first appeared can be identified, it is also possible to use the list of instrumental upgrades that occurred near that period to narrow down the search.
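A toy illustration of comb-finding is a brute-force search of a list of detected lines for a constant frequency spacing (a sketch only, not the FineTooth algorithm; the function names and thresholds are invented for this example).

```python
import numpy as np

def find_lines(freqs, asd, threshold=5.0):
    """Frequencies of bins standing out above the median level by `threshold`."""
    return freqs[asd > threshold * np.median(asd)]

def find_comb(line_freqs, tol=0.01, min_teeth=3):
    """Search detected lines for a constant frequency spacing; returns
    (offset, spacing) of the largest consistent set, or None."""
    line_freqs = np.sort(np.asarray(line_freqs))
    best = None
    for i in range(len(line_freqs)):
        for j in range(i + 1, len(line_freqs)):
            spacing = line_freqs[j] - line_freqs[i]
            if spacing <= tol:
                continue
            # Count lines landing on the grid line_freqs[i] + k * spacing.
            k = np.round((line_freqs - line_freqs[i]) / spacing)
            on_grid = np.abs(line_freqs - (line_freqs[i] + k * spacing)) < tol
            if best is None or on_grid.sum() > best[0]:
                best = (int(on_grid.sum()), line_freqs[i] % spacing, spacing)
    if best is not None and best[0] >= min_teeth:
        return best[1], best[2]
    return None

# Usage: a 1 Hz resolution spectrum with a 16 Hz comb and one unrelated line.
freqs = np.arange(0.0, 100.0, 1.0)
asd = np.ones_like(freqs)
asd[[16, 32, 48, 64]] = 10.0   # comb teeth (e.g., from electronics noise)
asd[37] = 10.0                 # a lone line from some other source
comb = find_comb(find_lines(freqs, asd))
```

The comb at multiples of 16 Hz is recovered while the lone 37 Hz line is left out; production tools additionally track how each tooth persists and wanders from day to day.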
Although the long-term performance of the detector is particularly relevant for detecting persistent sources of gravitational waves, numerous metrics have also been developed to characterize the long-term ability of each detector to observe transient gravitational waves. One key figure of merit that describes the overall sensitivity of the detector is the binary neutron star (BNS) "range", a measure of the distance to which the detector could detect a binary neutron star inspiral at an SNR of 8 at any given time, averaged over the entire sky and the different gravitational-wave polarizations. The BNS range fluctuates over time with the noise in the detectors, and significant drops in the BNS range over long periods of time can sometimes be a first clue that something is wrong in the detectors. Detector characterization experts study fluctuations in sensitivity [67] in order to provide clues about what noise sources could be affecting the range of the detector. These investigations aim to increase the stability of the detectors and maximize their sensitivity, increasing the effectiveness of all types of gravitational-wave searches.
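The BNS range can be sketched directly from its definition, using the leading-order frequency-domain inspiral amplitude and the standard sky/orientation averaging factor of 2.264 (the noise curve below is a purely illustrative toy, not a real detector spectrum).

```python
import numpy as np

G, C, MSUN, MPC = 6.674e-11, 2.998e8, 1.989e30, 3.086e22

def bns_range(freqs, psd, m1=1.4, m2=1.4, snr_threshold=8.0):
    """Sky- and orientation-averaged distance (Mpc) at which a BNS inspiral
    reaches the given SNR, from the leading-order inspiral amplitude."""
    mc = (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2 * MSUN       # chirp mass in kg
    # |h(f)| = amp * f**(-7/6) / D for an optimally oriented source.
    amp = np.sqrt(5.0 / 24.0) * np.pi ** (-2.0 / 3.0) * C * (G * mc / C ** 3) ** (5.0 / 6.0)
    df = freqs[1] - freqs[0]
    snr2_at_1m = 4.0 * np.sum((amp * freqs ** (-7.0 / 6.0)) ** 2 / psd) * df
    horizon = np.sqrt(snr2_at_1m) / snr_threshold         # meters, face-on overhead
    return horizon / 2.264 / MPC                          # sky/orientation average

# Usage with a toy noise curve (flat above a steep low-frequency wall);
# the band ends below the ~1570 Hz BNS innermost stable circular orbit.
freqs = np.arange(20.0, 1024.0, 1.0)
psd = 1e-46 * (1.0 + (50.0 / freqs) ** 8)
r1 = bns_range(freqs, psd)
r2 = bns_range(freqs, psd / 4.0)  # halving the noise ASD doubles the range
```

Because the SNR scales inversely with both distance and the noise ASD, any broadband change in the spectrum maps directly onto a range fluctuation, which is why the range time series is such a sensitive first diagnostic.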
The same environmental conditions that lead to changes in the detector range can also lead to a decrease in the amount of time that the detectors are actually in a state that is able to record observational-quality data. Severe environmental disturbances such as nearby earthquakes (or severe earthquakes even far away!) can make the detectors lose the carefully tuned and aligned resonances in the optical cavities required to operate the detector. This loss in observing time affects all astrophysical analyses, so an important aspect of detector characterization is analyzing how the detectors' stability can be increased to be more resilient to such effects. Investigations of the detectors' "duty cycle" are aimed at increasing the amount of time that the detectors are actually operational [68,69]. For example, when an earthquake is known to be coming (based on data from far away seismometers), the detectors can be placed temporarily in an "Earthquake mode", a modified seismic control scheme which allows the detectors to maintain arm cavity resonance during more earthquakes, therefore increasing the amount of usable data [70].
Noise sources that may not be strong enough to be observed in a single detector can still be particularly troublesome if the noise is correlated between detectors. Typically, multiple detectors are used to identify gravitational-wave signals, and many analyses assume that no correlations exist between data from different detectors. Hence, any strongly correlated noise sources that are not accounted for could bias current gravitational-wave analyses. Although multiple potential sources of correlated noise have been suggested [17], the only sources of noise known to be relevant for ground-based detectors are magnetic noise from "Schumann resonances" [71], fluctuations in the Earth's magnetic field, and strong lightning strikes [72]. The noise contribution from these sources is currently too low to impact searches for gravitational waves with current detectors, but will become more relevant in the future as detectors increase in sensitivity [73]. Regardless of their expected impact, the contributions from correlated magnetic noise are closely monitored using magnetometers and measurements of the coupling between magnetic noise and the strain data at each observatory [18,74].

Gravitational-Wave Data Quality
Data quality issues that have not been mitigated in the instrument can be vetoed from the data that is analyzed if these data quality issues can be confidently separated from astrophysical signals. Whether or not data are vetoed depends on several factors, such as the severity of the known problems with the data, the impact of the data quality issues on the astrophysical search backgrounds, and the ability to construct a veto that efficiently removes times of poor data quality without throwing away too much data. This section details how data quality issues are identified and how this information is propagated to gravitational-wave analyses.

Data Quality Impacts on Searches for Transient Gravitational-Waves
Transient sources of gravitational waves include signals from compact binary coalescences (CBCs) [3] and other high-energy astrophysical phenomena, such as supernovae [75]. Searches for these transient gravitational waves must separate bursts of excess power due to gravitational-wave signals from instrumental artifacts. A variety of pipelines have been developed and used in analyses by both the LVK [76][77][78][79][80][81] and other groups [82,83]. These searches primarily incorporate information about the data quality using "data quality vetoes", which identify time periods where the data have a high probability of corruption by glitches.
The process of developing data quality vetoes frequently begins much in the same way as general instrumental noise investigations: excess noise is identified through automated pipelines and then traced to find correlations with auxiliary monitors. However, in developing data quality vetoes, the primary emphasis is on improving search results rather than improving the detector, so rather than being concerned with any glitches or excess noise, the noise is viewed first through the lens of the search pipelines. For example, if search pipelines show an excessive number of outliers that are due to instrumental artifacts rather than astrophysical signals, there will be more motivation to develop a veto to eliminate the noise. After the initial identification of the problem, similar tools are used to provide a clear picture of when that type of problem occurs, for example using Omicron [35,36] to gather all the glitch times, and Hveto [50,51] or iDQ [53] to find correlations with auxiliary witnesses, often alongside the investigations into the sources of different types of noise discussed in Section 2.
Data quality vetoes are typically applied in separate categories at different points in the search pipeline, and are individualized based on the different types of searches. In recent observing runs, three categories of veto were used, referred to as "category 1", "category 2", and "category 3". In all searches, the first category of veto, category 1, includes data that is deemed to be completely unusable and unreliable, which is removed before any search pipeline is run. Beyond this most severe category of data quality vetoes, other data quality cuts are tuned carefully based on how the noise impacts the different search pipelines. Category 2 vetoes are developed to target small time periods around specific types of glitches, while category 3 vetoes are based on statistical correlations between auxiliary channels and the gravitational-wave strain data. The process for creating data quality vetoes and their impact on the total spacetime volume that searches for compact binary coalescences are sensitive to (referred to as the "sensitivity" of a search) is described in greater detail in [38,84].
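Applying category 1 vetoes amounts to simple segment arithmetic: vetoed intervals are removed from the analyzable science segments before any pipeline runs. A minimal sketch (the function name and segment values are illustrative):

```python
def subtract_segments(science, vetoes):
    """Remove veto (start, end) intervals from a list of science segments."""
    out = []
    for seg in science:
        pieces = [seg]
        for v0, v1 in vetoes:
            clipped = []
            for p0, p1 in pieces:
                if v1 <= p0 or v0 >= p1:       # veto does not touch this piece
                    clipped.append((p0, p1))
                else:                          # keep whatever falls outside the veto
                    if p0 < v0:
                        clipped.append((p0, v0))
                    if v1 < p1:
                        clipped.append((v1, p1))
            pieces = clipped
        out.extend(pieces)
    return out

# Usage: two science segments, one mid-segment veto, one veto overlapping an end.
science = [(0.0, 100.0), (150.0, 200.0)]
category1 = [(40.0, 50.0), (190.0, 260.0)]
analyzable = subtract_segments(science, category1)
```

The same arithmetic applies at each category; only the segment lists differ, with category 2 and 3 segments generated from glitch triggers and statistical correlations rather than by hand.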
Several common strategies exist for creating data quality vetoes. One strategy, which is especially employed in the creation of category 1 vetoes, is to determine segments manually, based on known severe or unusual environmental or instrumental disturbances at a very specific time. The times of these vetoes are frequently taken from the online detector logbook. Examples of this kind of manual determination from O3 are times when forklifts, cranes, or large trucks were operating near the detectors during observing mode [85].
Many types of noise, however, are recurrent and require vetoes to be more finely tuned and programmed in order to select all of the data that needs to be cut. This is how category 2 vetoes are generally developed. Ideally, the noise is traced to a correlated auxiliary witness signal, which can be used to predict any time the noise appears in the strain data. For some types of vetoes, this will involve creating a threshold on an auxiliary witness time series. An example of this type of threshold is shown in Figure 5. For other types of glitches, vetoed times are determined by auxiliary Omicron triggers, usually after a correlation was found by Hveto or a similar algorithm. Whether these vetoes are built using Omicron triggers, an auxiliary channel, or some processing of an auxiliary channel, the exact veto parameters are tuned carefully to maximize the ratio of efficiency (the percent of glitches removed) to deadtime (the percent of total time removed). This ensures that the veto is capturing the most noise while minimizing the risk of removing astrophysical signals.
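The efficiency/deadtime trade-off for a threshold veto can be computed as in this sketch (the witness data and glitch times here are hypothetical; real tunings use Omicron triggers and recorded segments):

```python
import numpy as np

def veto_performance(threshold, witness_t, witness_x, glitch_times, livetime):
    """Efficiency (fraction of glitches vetoed) and deadtime (fraction of
    livetime vetoed) for a simple threshold veto on a witness time series."""
    above = witness_x > threshold
    dt = witness_t[1] - witness_t[0]
    deadtime = above.sum() * dt / livetime
    idx = np.clip(np.searchsorted(witness_t, glitch_times), 0, len(above) - 1)
    efficiency = above[idx].mean()
    return efficiency, deadtime

# Usage: a witness channel with three loud excursions, each causing a glitch.
rng = np.random.default_rng(1)
witness_t = np.arange(0.0, 1000.0, 1.0)
witness_x = rng.uniform(0.0, 1.0, witness_t.size)   # quiet baseline
witness_x[[100, 400, 700]] = 10.0                   # excursions coupling to strain
glitch_times = np.array([100.0, 400.0, 700.0])
eff, dead = veto_performance(5.0, witness_t, witness_x, glitch_times, 1000.0)
```

Scanning the threshold and choosing the value that maximizes the efficiency-to-deadtime ratio mirrors the tuning described above: capture as many glitches as possible while vetoing as little time as possible.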
One important consideration in the development of data quality vetoes is to make the primary determination based on auxiliary data and the astrophysical search backgrounds, without using the strain data itself. The exception is cases where the noise present in the data is much louder (e.g., by an order of magnitude) than could reasonably be expected of an astrophysical signal. For the CBC searches, loud glitches are removed from the data before analysis using "gating", described in Section 5.
The channels used to produce vetoes must also be determined to be "safe" for vetoes, meaning that they are truly witnesses only of auxiliary information and will not be influenced by the strain data through control loop feedback. This is important to ensure that the witnesses will only veto noise, not astrophysical signals. The safety of channels is determined by injecting excess strain noise into the detectors by physically moving the test masses with a known frequency and amplitude [86]. Data from auxiliary channels are then searched for evidence that this injected noise propagated into other channels. If such evidence is found, these channels are not deemed safe.
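The logic of such a safety study can be sketched by comparing the fraction of hardware injections witnessed by a channel to the fraction expected by chance (a simplified stand-in for the statistical tests actually used; all names and numbers are illustrative).

```python
import numpy as np

def channel_safety(injection_times, aux_trigger_times, window, livetime):
    """Fraction of injections witnessed by an auxiliary channel, and the
    fraction expected by chance; witnessed >> chance flags the channel unsafe."""
    aux = np.sort(np.asarray(aux_trigger_times))
    hits = 0
    for t in injection_times:
        i = np.searchsorted(aux, t)
        nearby = [abs(aux[j] - t) for j in (i - 1, i) if 0 <= j < len(aux)]
        hits += bool(nearby) and min(nearby) <= window
    witnessed = hits / len(injection_times)
    chance = min(1.0, len(aux) * 2 * window / livetime)
    return witnessed, chance

# Usage: a channel inside the strain control loop versus an independent sensor.
rng = np.random.default_rng(7)
livetime = 1000.0
injections = np.arange(100.0, 1000.0, 100.0)   # times test masses were driven
unsafe_aux = injections + 0.01                 # responds to every injection
safe_aux = rng.uniform(0.0, livetime, 20)      # triggers unrelated to injections
w_unsafe, chance = channel_safety(injections, unsafe_aux, 0.1, livetime)
w_safe, _ = channel_safety(injections, safe_aux, 0.1, livetime)
```

A channel that witnesses essentially every injection would also have witnessed a real gravitational wave, so it must be excluded from veto production.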
Different search pipelines make additional data quality cuts, implement additional signal consistency checks, or incorporate the results of other algorithms, based on the needs determined by reviewing each search's background noise. Searches for generic transient gravitational-wave signals ("bursts"), for example, directly use the results of Hveto in their data quality cuts as a category 3 veto.
The amount of time vetoed from the analyses has varied slightly over the different observing runs of Advanced LIGO and different analyses, but typically the total time vetoed is on the order of ∼2% or less. In the first observing run, there were several periods of very poor data quality at the Hanford observatory traced to one particular noise source, which ultimately led to vetoing 4.6% of the coincident detector time as a category 1 veto [17]. However, other data quality vetoes and subsequent observing runs have removed less data.
In the second and third observing runs, the vetoes removed ∼1-2% of the data in total [5,7,18]. Of the confident events identified in the first three observing runs by LVK analyses [4][5][6][7], only 2 events overlapped data quality veto segments. Considering the amount of time removed by these vetoes and the total number of detected events, this number is consistent with the expected rate of chance coincidences. Furthermore, the operation of multiple detectors and the diversity of ways that data quality vetoes are used in analyses allowed these events to still be detected despite the overlap with a veto.

Figure 5. An example of the design and effectiveness of a data quality veto. This data quality veto was designed to remove time periods with glitches from fluctuations in the power of a laser used to control the alignment of a mirror (referred to as an "optical lever"). Top: The root-mean-square of the optical lever laser power after bandpassing between 10 and 50 Hz. The veto threshold is denoted with a black dashed line. Bottom: The time and frequency of glitches in the gravitational-wave strain data as identified by Omicron [35,36]. The color of dots indicates the SNR of the glitch. Times above the veto threshold are shaded in gray. These times contain all of the loud glitches in this time period. Figure reproduced from [18]. © IOP Publishing. Reproduced with permission. All rights reserved.
The benefits of data quality vetoes were reduced in recent observing runs due to improvements in how search pipelines address glitches in the data. In analyses of data from the first observing run, data quality vetoes increased the sensitivity of a search for CBC signals by up to ∼50% [38]. However, the sensitivity of the same search pipeline in the third observing run increased by only ∼5% [18]. As noted in the previous paragraph, the amount of time removed by data quality vetoes changed by only a few percent across these observing runs, suggesting that the difference in sensitivity increase is due to improvements to the search pipeline that were implemented between the two periods considered based on the understanding of detector data from the investigations discussed in this review. One such improvement is accounting for how glitches in the data are similar to only a small portion of the CBC parameter space [87,88]. This change meant that the presence of glitches only reduced the sensitivity of the search in a small portion of the parameter space, which in turn reduced the benefits of mitigating glitches with data quality vetoes. For this reason, it is expected that data quality vetoes will be needed less frequently in future CBC searches. Searches for gravitational-wave bursts have also made improvements that have reduced the benefits of data quality vetoes, but to a lesser extent. Data quality vetoes still significantly improve the sensitivity of burst searches as burst waveforms are more general than CBC waveforms and hence are more likely to be similar to detector artifacts.
In addition to data quality vetoes, another data quality product, the iDQ timeseries, was used in analyses of LIGO data in the third observing run. The iDQ algorithm uses machine learning techniques to analyze auxiliary channel information and determine the likelihood that a glitch is present in the strain data. These predictions are made for all times and stored as a time series with a sample rate of 128 Hz. Figure 6 shows an example of the iDQ timeseries near a glitch and near a gravitational-wave signal. At the time of the glitch, the likelihood of a glitch (as predicted by iDQ) is elevated. As expected, no such increase in likelihood is seen at the time of a gravitational-wave signal.
As of the end of the third observing run, only one search algorithm, GstLAL [77], has incorporated this iDQ time series. This CBC search used the value of the iDQ time series to downrank candidates that were predicted to be likely glitches [89]. This method contrasts with the approach used with data quality vetoes, where times that are likely to contain glitches are removed from the searches completely; downranking makes it possible to identify loud gravitational-wave candidates that happen to overlap glitches by chance. The use of the iDQ timeseries was found to increase the sensitivity of this CBC search by up to ∼5% [89], comparable with the increase when data quality vetoes were used.
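The downranking idea can be sketched with a toy example: look up a glitch-likelihood time series at the time of each candidate and subtract a penalty from its ranking statistic. This is an illustrative sketch only; the function names, the linear penalty, and the use of SNR as the ranking statistic are assumptions, not the GstLAL implementation.

```python
import numpy as np

SAMPLE_RATE = 128  # Hz, sample rate of the iDQ-style time series

def glitch_likelihood(series, t0, candidate_time):
    """Look up the glitch likelihood at the time of a candidate.
    `t0` is the start time of the series (illustrative convention)."""
    idx = int(round((candidate_time - t0) * SAMPLE_RATE))
    idx = min(max(idx, 0), len(series) - 1)
    return series[idx]

def downrank(candidates, series, t0, penalty_scale=1.0):
    """Subtract a penalty from each candidate's ranking statistic,
    proportional to the glitch likelihood at its time (hypothetical form)."""
    return [
        (t, snr - penalty_scale * glitch_likelihood(series, t0, t))
        for (t, snr) in candidates
    ]

# Toy example: a flat likelihood series with one "glitchy" second.
series = np.zeros(10 * SAMPLE_RATE)
series[3 * SAMPLE_RATE : 4 * SAMPLE_RATE] = 5.0  # elevated glitch likelihood
cands = [(1.5, 12.0), (3.5, 12.0)]  # (time in s, SNR)
ranked = downrank(cands, series, t0=0.0)
```

In this sketch, the candidate overlapping the glitchy second is downranked while the clean-time candidate is untouched, mirroring how downranking (unlike a veto) never fully removes a loud candidate from consideration.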

Data Quality Impacts on Searches for Persistent Gravitational Waves
Persistent gravitational waves include line-like sources of gravitational waves, such as from rotating neutron stars [92], and broadband sources, such as the background signal from weak, unresolved CBC signals [73]. A wide variety of searches have been used to look for persistent gravitational waves, by both the LVK [73,[92][93][94] and other groups [95][96][97]. Because searches for persistent gravitational waves look for signals present continuously throughout an observing run, they are generally insensitive to short-duration bursts of power. For this reason, most of the data quality products used for transient searches are not used by searches for persistent gravitational waves. The only data products that are used by all searches are category 1 vetoes, which flag egregious data quality periods. Persistent gravitational-wave searches instead rely on other data quality products and methods that are better suited to their needs.
The data quality product most useful for searches for continuous waves (CW) is the list of known instrumental lines. As mentioned in Section 2.3, lines are particularly harmful for these searches. Each line in the data impacts only a small portion of the frequency parameter space searched by CW searches, so it is not required that these lines be mitigated before searching the data. Instead, any potential candidate that is identified at a frequency of a known instrumental line is not considered astrophysical. This method, however, means that searches for gravitational waves are not able to detect sources with frequencies very near instrumental lines. These line lists are publicly distributed after the end of each observing run when the bulk data are released [98][99][100]. Additional details on this process can be found in Section 4.2.
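Filtering candidates against a line list can be sketched as follows; the line frequencies, the fixed tolerance, and the function name are illustrative (real LVK line lists specify per-line frequencies and widths):

```python
def veto_near_lines(candidate_freqs, line_list, tolerance=0.01):
    """Split CW candidate frequencies (Hz) into surviving and vetoed
    sets based on proximity to any known instrumental line."""
    surviving, vetoed = [], []
    for f in candidate_freqs:
        if any(abs(f - line) <= tolerance for line in line_list):
            vetoed.append(f)  # coincides with a known line: not astrophysical
        else:
            surviving.append(f)
    return surviving, vetoed

# Toy example: power mains harmonics and one other line (frequencies illustrative).
known_lines = [60.0, 120.0, 331.9]
candidates = [59.995, 108.857, 331.905, 500.0]
ok, cut = veto_near_lines(candidates, known_lines)
```

A side effect visible even in this toy is the one stated in the text: a real astrophysical source at 59.995 Hz would be vetoed along with the line, so frequencies very near known lines are effectively unsearchable.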
In contrast to CW searches, searches for stochastic gravitational waves look for broadband sources of gravitational waves. These searches can be biased by severely non-stationary data, including particularly loud glitches. For this reason, highly non-stationary data are excluded from the analysis, beyond what is removed by data quality vetoes. Stochastic searches typically remove large amounts of data (sometimes over 20% [73]), balancing the benefits of removing segments with poor data quality against the reduction in sensitivity from a shorter analysis duration. Monitors of the data quality are used throughout the observing run to identify times with poor data quality that could impact stochastic searches.
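One simple form of such a non-stationarity cut compares the noise amplitude of each analysis segment with that of its neighbors and discards outliers. The sketch below is illustrative only; the actual statistic and threshold used by LVK stochastic searches differ.

```python
def nonstationarity_cut(segment_stds, threshold=0.2):
    """Flag segments whose noise standard deviation differs fractionally
    from either neighbor by more than `threshold` (illustrative value)."""
    keep = []
    for i, s in enumerate(segment_stds):
        neighbors = [segment_stds[j] for j in (i - 1, i + 1)
                     if 0 <= j < len(segment_stds)]
        stable = all(abs(s - m) / m <= threshold for m in neighbors)
        keep.append(stable)
    return keep

# Toy example: a run of stable segments with one noisy outlier.
stds = [1.0, 1.02, 0.98, 3.0, 1.01, 0.99]
flags = nonstationarity_cut(stds)
```

Note that both the outlier segment and its immediate neighbors fail the comparison, illustrating why such cuts can remove substantially more data than the glitchy segments alone.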

Event Validation
An integral aspect of gravitational-wave analysis is investigation of the impact of instrumental artifacts on detected signals. As previously discussed, a large array of sensors is used at each observatory in order to monitor the performance of the detectors and their environment. As instrumental or environmental disturbances could mimic gravitational-wave signals, it is necessary to confirm that no such instrumental artifacts are present to increase confidence in the astrophysical origin of any gravitational-wave detection. Additionally, the presence of noise at the time of a true astrophysical signal could impact the analysis, so it is important to determine whether any additional processing of the data is needed. These investigations are collectively referred to as "event validation".
Event validation is especially important for the first detection of a new type of astrophysical source, particularly if it is known that instrumental artifacts could produce a similar signal to the newly discovered gravitational-wave event. To date, the only confident gravitational-wave detections that have been made are from compact binary coalescences, but even within this class there is a broad range of different timescales that must be considered. These validation procedures were a particularly important component of the first detection of a binary black hole merger (BBH), GW150914 [17], and the first detection of an intermediate mass black hole (IMBH), GW190521 [11]. Event validation will also be an essential element of confirming the observation of gravitational waves from non-CBC sources.
The methods used in event validation procedures rely on many of the same tools previously discussed in Sections 2 and 3. However, rather than analyzing the data or detector status over the entire observation time and all frequencies, event validation procedures focus on a specific part of the parameter space (typically a specific time or frequency). This allows deeper investigation of the state of the detector and data quality than would typically be possible. Furthermore, the properties of the identified gravitational-wave candidate allow additional tests to be used that further constrain how the detector and the data quality could have impacted the candidate in question.

Validation of Transient Sources of Gravitational Waves
The validation of transient gravitational-wave events in LVK analyses has been previously described in [5,17,18]. In this section, we will highlight the main procedures that were employed.
As transient sources of gravitational waves typically last less than a few minutes, the data surrounding the time of the signal can be more heavily scrutinized than the bulk data. There are three main types of evidence that are investigated as part of transient event validation: evidence of instrumental origin, evidence of transient noise overlapping the signal, and evidence of any detector operating in a non-standard manner at the time of the signal.
First, we investigate whether there is "evidence that supports an instrumental origin" for the candidate event. Two metrics were used to establish if a candidate has evidence of instrumental origin: (i) evidence that a known class of instrumental artifact is present at the time of the candidate and can cause similar non-astrophysical candidates in a search pipeline, and (ii) evidence that the observed excess power in at least one detector can be accounted for by instrumental sources. The former metric requires long-term studies of both the detector and the relevant search pipeline, while the latter can be established, in some cases, using only a small amount of data containing the candidate.
The main method of establishing evidence of instrumental origin is identifying an auxiliary sensor that can account for the majority of the power observed in the strain data. This is done by examining a wide array of witness sensors to understand if they are correlated with the observed strain power. Multiple methods are used to identify correlations, including manual inspection of visualizations of the data [41,42,[101][102][103], machine-learning interpretation of the strain data [47][48][49], tools that estimate statistical correlations between channels [50,52,53,104,105], and projections of the excess power in the observed strain data based on previous measurements of the coupling between each auxiliary channel and the strain data [25,60,61].
In addition to testing if the observed power can be accounted for by auxiliary witnesses, the long-term performance of the detectors is investigated to understand if changes to the detector state could create a false candidate. These investigations include tests of the stationarity of the data [106,107] and inspection of long-term trends in detector performance [108,109].
Even when there is no evidence of instrumental origin, it is quite common for excess power from instrumental artifacts to be present in the data near an astrophysical signal. This association is coincidental, and was found to occur at a rate expected by chance. Although excess power overlapping a signal is not sufficient cause to doubt the astrophysical origin of a gravitational-wave signal, the excess power should be mitigated before additional analysis of the event is completed. Furthermore, when an event is detected with multiple detectors, it is possible to identify any excess power that is not astrophysical by correlating the multiple data streams. Multiple detectors also allow confirmation that the identified gravitational-wave signal is unrelated to nearby glitches. An example of an event that overlaps excess power from glitches (GW190424_180648) is shown in Figure 7. In this case, two separate types of glitches overlapped the signal. Multiple glitches overlapping a signal is unlikely, but possible, given the recent rate of glitches in gravitational-wave detector data. Mitigation methods used to address these and other similar glitches are discussed in Section 5.
To identify this excess power, methods similar to those used to investigate evidence of instrumental origin are also used. One of the main methods used is visualizations of the data [41,42,[101][102][103]. As an astrophysical signal is coherent between detectors, any incoherent power present in the data must be instrumental in origin. Further simplifying this process is the well-established morphology of astrophysical signals from CBC. There is currently no evidence of measurable deviations from general relativity for these signals [10], meaning that any power not explained by general relativity is also likely to be noise. Hence it is generally much easier to establish evidence of excess power than evidence of instrumental origin. If excess power is identified, other previously mentioned tools are used to potentially identify the cause.
While often considered separately from the event validation procedures, additional, similar tests of signal morphology and parameters are performed as part of analyses of detected gravitational-wave events. These include tests such as the consistency of the expected signal in multiple detectors [110] or the generated skymap [111,112]. In the case of events with well-known morphologies, such as CBC signals that follow general relativity, additional signal consistency tests of the residual power can be used [113]. Correlations between the results of different analyses and specific glitch classes have also been investigated as methods of rejecting non-astrophysical signals [88,114].

Figure 7. An example of a gravitational-wave signal (GW190424_180648) with two different types of glitches also present nearby in the data. Event validation procedures were able to identify the source of both glitches as well as confirm that there was no evidence the gravitational-wave signal was an instrumental artifact. The source of one of the glitches was vibrations from an automated camera shutter that was inadvertently left running in one of the detector end stations [18], while the source of the second is light scattered off of the main laser beam path that reflects and then interferes with the main beam (referred to as "fast scattering") [46]. This example highlights that it is possible to detect and analyze a gravitational-wave signal despite the presence of nearby glitches. Data plotted are from LIGO Livingston and accessed via GWOSC [90].
The validation process of non-CBC sources of transient gravitational waves, i.e., bursts, is largely similar to that for CBC sources. However, the lack of a known waveform model prevents the use of many of the signal consistency tests available for CBC validation. A similar situation is true for gravitational-wave candidates that are not consistent with general relativity. In the case of a potential burst signal, the most important source of information is the auxiliary channels. Statistical correlations between the strain data and these additional data streams can demonstrate that the observed strain is not astrophysical.
To facilitate rapid follow-up of gravitational-wave events by electromagnetic and neutrino observatories, the data from gravitational-wave detectors are continuously analyzed to identify events as quickly as possible. However, these real-time searches are less able to account for the changing state of the detectors and hence are more likely to falsely claim a detection, requiring significant input from event validation investigations. In the third observing run, the retraction rate of candidates released in low latency was 43% [115]. This high retraction rate is expected of low-latency analyses, as quickly changing detector states cannot always be accounted for as part of the analysis. Despite these concerns, low-latency analyses are required for immediate follow-up of gravitational-wave signals with other observatories.
For catalogs of gravitational-wave candidates based on post-facto analyses of the data, typically only a small number of candidates are found to have evidence of instrumental origin. For most catalog events, no data quality issues were identified. The main impact of data quality issues is identification of transient noise overlapping the signal, rather than causing the signal. An example of an astrophysical event with transient noise present is shown in Figure 7. Of the 93 unique confident candidates in GWTC-1, -2, -2.1 and -3 [4][5][6][7], 18 were found to be coincident with transient noise, while none were found to have instrumental origin.
For low significance catalogs, namely those containing "marginal" triggers, the number of false alarms is much higher. Evidence of instrumental origin was found for 11 out of 25 marginal candidates in GWTC-1 [4], GWTC-3 [7], and the O3 IMBH Search [116]. In much deeper catalogs, such as GWTC-2.1 [6] or 3-OGC [83], many of the candidates likely also have evidence of instrumental origin. However, event validation of these large candidate lists has yet to be attempted.
At present, many of the event validation procedures described in this subsection can only be completed by members of the LIGO, Virgo, and KAGRA collaborations, as many analyses completed as part of event validation rely almost exclusively upon data from auxiliary channels. At the time of writing, these data are not available to those outside of the detector collaborations, except for short segments of data [117]. Hence validation completed by other groups [82,83] must rely only upon inspection of the available strain data and publicly available data quality metrics.

Validation of Persistent Sources of Gravitational Waves
Similar to transient event validation, validation of persistent gravitational-wave candidates relies upon two metrics: investigations of known noise sources and signal-consistency checks. In this subsection, we will briefly outline some of the additional signal consistency checks that are used to vet candidates. To date, no candidate has passed these checks with high enough significance to be claimed as a detection.
The primary method of validating line-like gravitational-wave candidates is to search for evidence of an instrumental line at the same frequency as the candidate. Using the line lists discussed in Section 2.3, the list of candidates is filtered. The large number of instrumental lines in the data means that this step vetoes a significant number of candidates. An example of the result of vetoing candidates is shown in Figure 8. All candidates removed due to an association with a known instrumental line are marked with gray circles.
For line-like sources, there is an expected signal morphology and amplitude in each detector that can be used to reject candidates [118][119][120]. A continuous signal from a gravitational-wave source will be roughly constant in amplitude but will slowly change in frequency due to the Doppler modulation from the Earth's movement around the Sun [121]. The rate of the frequency evolution from the Doppler modulation is sky-position dependent, requiring searches for continuous gravitational waves to analyze each sky position (or small patch of the sky) separately. While these features complicate the search for these signals, they can also be used to reject instrumental artifacts that do not share these properties. Tests of these properties that have typically been used include: checking that the signal amplitude in each detector is consistent with expectations; checking that the signal-to-noise ratio grows in a uniform manner over the course of an observing run; checking that the signal exhibits the expected Doppler modulation over time; and checking that the signal-to-noise ratio of the candidate drops as the assumed sky position is shifted. A candidate that does not behave as expected for an astrophysical signal in all characteristics is likely instrumental in origin. Figure 8 demonstrates how these additional checks can be used to veto many candidates that are not already removed due to their association with known lines. In the displayed example from [118], no candidates remain after applying both the known-line and signal consistency tests.
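The expected Doppler modulation can be illustrated with a simplified model that assumes a circular Earth orbit and a source in the ecliptic plane; real searches use full solar-system ephemerides, and the function and parameter names here are illustrative.

```python
import math

AU = 1.495978707e11       # m, astronomical unit
YEAR = 3.15576e7          # s, Julian year
C = 2.99792458e8          # m/s, speed of light
V_ORB = 2 * math.pi * AU / YEAR  # Earth's orbital speed, ~29.8 km/s

def doppler_shifted_frequency(f0, t, ecliptic_longitude):
    """Observed frequency of a monochromatic source with intrinsic
    frequency f0 (Hz) at time t (s from a reference epoch), for a
    circular orbit and a source in the ecliptic plane (simplification)."""
    phase = 2 * math.pi * t / YEAR
    v_los = V_ORB * math.cos(phase - ecliptic_longitude)  # line-of-sight velocity
    return f0 * (1 + v_los / C)

f0 = 100.0  # Hz
# Maximum annual modulation is ~1e-4 of f0, i.e. ~10 mHz at 100 Hz.
fmax = doppler_shifted_frequency(f0, 0.0, 0.0)
```

An instrumental line, by contrast, stays at a fixed detector-frame frequency; the absence of this ~10 mHz annual wander at 100 Hz is exactly the kind of signature the consistency checks above exploit.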
There has also been a proposal [122] for signal consistency tests of a potential broadband, stochastic signal candidate. This method relies upon the known frequency-dependent correlation expected for an astrophysical stochastic background. This correlation also depends on the locations and orientations of each gravitational-wave detector in the network, improving the ability of these methods to identify a non-astrophysical background.

Noise Mitigation
In addition to changes to the analysis methods, data quality issues can be mitigated by directly correcting the recorded gravitational-wave strain data. Depending on the frequency resolution, time duration, and the source of the data quality issues, a variety of mitigation methods are used. In the case of long duration noise sources that impact the measured power spectral density of the detector, the noise contributions from sources are modelled and subtracted. This process is typically referred to as "noise subtraction". Modelling and subtraction of short duration transients may use similar methods, but is referred to as "glitch subtraction". Further complicating the picture is the use of alternate glitch mitigation techniques that are agnostic to the glitch time series, and hence require no additional information beyond a time duration and bandwidth. Such techniques include "gating" and "inpainting". Each technique is chosen based on the available time and computational resources in addition to the specific data quality issue in question.
In this section, we outline current methods of noise mitigation that are used to analyze gravitational-wave strain data. Alongside each method, we explain the general uses of the method, and changes that were proposed for future analyses.

Noise Subtraction
The operational goal of a gravitational-wave interferometer is to have the lowest possible background noise. While numerous advances have recently been able to significantly improve the sensitivity of gravitational-wave interferometers, there remain known sources of noise that limit the sensitivity of the interferometer. Some of these known noise sources are theoretically predicted based on the design and location of the interferometer; these are typically referred to as "fundamental" noise sources. Other sources of noise are typically called "technical" noise sources, and stem from a variety of known and unknown causes. For many of these technical noise sources, the time series of the noise contribution and its coupling to the strain channel can be precisely measured, allowing for the possibility of subtracting the relevant noise contributions post-facto. This section explains the techniques used to perform this noise subtraction and the main sources of noise that are targeted.
Multiple proposals for noise subtraction have been explored since the early detector era [123][124][125], primarily focused on the use of Wiener filtering [126] to calculate the noise contributions from a variety of noise sources. Wiener filtering is particularly useful in cases where a linear coupling exists between the noise source and the gravitational-wave strain data. Additional subtraction procedures based on machine learning [127][128][129] are also being explored. These new methods have shown promise in subtracting sources of noise that exhibit non-linear couplings to the gravitational-wave strain.
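The core idea of Wiener-filter subtraction can be illustrated with a single witness channel and a zero-lag (single-coefficient) filter; real pipelines use multi-tap, frequency-dependent filters estimated from cross spectra, so this is a minimal sketch under strong simplifying assumptions.

```python
import numpy as np

# Synthetic data: a witness channel couples linearly into the "strain".
rng = np.random.default_rng(0)
n = 4096
witness = rng.normal(size=n)               # auxiliary witness channel
coupling = 0.7                             # linear coupling to strain (unknown to the filter)
other_noise = 0.1 * rng.normal(size=n)     # noise the witness cannot see
strain = coupling * witness + other_noise

# Optimal zero-lag Wiener coefficient: cross-correlation over autocorrelation.
a = np.dot(witness, strain) / np.dot(witness, witness)
cleaned = strain - a * witness             # subtract the predicted contribution

residual_rms = cleaned.std()
original_rms = strain.std()
```

The filter recovers the coupling from the data alone, and the residual is dominated by the noise the witness cannot account for, which is why subtraction quality is set by the witness sensor's coverage of the noise path.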
The first extensive use of noise subtracted data occurred in the second observing run of Advanced LIGO. During this observing run, vibrations from turbulent water flow that was used to cool the pre-stabilized laser system introduced significant beam jitter into the interferometer. This jitter added broadband noise from 10 to 2000 Hz at the LIGO Hanford detector. This noise source could not be addressed without an intrusive upgrade to the laser system, meaning that the noise source could not be mitigated until the next observing run.
Two different noise subtraction procedures [130,131] were applied to the O2 dataset. The first [131], based on Wiener filtering, was applied to small segments of data around significant events. The second [130] was based on a similar method of measuring noise contributions [132] and was applied to the entire O2 dataset. In addition to noise from jitter, these noise subtraction procedures removed noise contributions from calibration lines, power line noise, and alignment and length sensing control noise. After applying noise subtraction, the binary neutron star range improved by 25% at LIGO Hanford, which led to a 30% improvement in the overall sensitivity of the gravitational-wave network [130]. Figure 9 shows a comparison of the detector noise curve before and after noise subtraction from a representative time in O2. A broadband decrease in noise can be seen, in addition to the removal of multiple line features.

Figure 9. A comparison of the amplitude spectral density (ASD) of the strain data at LIGO Hanford (H1) before (green) and after (blue) subtraction of broadband noise in O2. By using instrumental witnesses to known sources of noise, broadband noise between 80 and 1000 Hz was significantly reduced, and many narrow-band spectral artifacts caused by calibration lines and other instrumental factors were eliminated. Figure reproduced from [130]. © IOP Publishing. Reproduced with permission. All rights reserved.
In the third observing run, there were no broadband sources of noise that were identified as potential noise subtraction targets at LIGO. For this reason, only noise from line-like sources was subtracted. A new procedure [133] was adopted that was better able to subtract line-like sources. This procedure was applied at the same time as the detector data was calibrated. In addition to this subtraction procedure, LIGO data from the third observing run included subtraction of noise from slow modulations of the 60 Hz power line. These modulations added extra noise as sidebands around the main power line. To subtract this noise, a machine-learning-based method [134] was applied.
Noise subtraction of broadband sources of noise was also used with Virgo data in the third observing run. Noise subtraction with Virgo data used similar procedures to those used with LIGO data [135,136]. The sources of the subtracted noise included laser frequency noise, scattered light noise, and amplitude noise from the laser modulation frequency. This noise subtraction improved the sensitivity by up to 7 Mpc [136].

Glitch Subtraction
Recent data from ground-based interferometric detectors contain approximately 1 glitch per minute [5,7], which makes it likely that a significant number of gravitational-wave candidates overlap glitches. One of the most prominent examples of such a case is GW170817, when a loud glitch was present within a second of the merger time [9]. In recent catalogs, 20% of candidates overlapped glitches [5]. Signals that are in the sensitive band of the detectors for longer periods, such as binary neutron star mergers, are much more likely to overlap glitches. The Bayesian analyses used to estimate the source properties of gravitational-wave candidates assume that the data are Gaussian and stationary; the presence of glitches severely violates these assumptions and hence may bias the results [137]. It is currently standard procedure to subtract any glitches present in the data with glitch subtraction algorithms prior to performing the final parameter estimation.
The most commonly used algorithm for glitch modelling and subtraction in LVK analyses is BayesWave [138][139][140]. The BayesWave algorithm is used to measure the power spectral density of data that is used in analyses of gravitational-wave source properties, in addition to modelling non-Gaussian glitches. BayesWave is a Bayesian analysis that uses sine-Gaussian wavelets to model excess power in gravitational-wave data. As simpler models are preferred by the Bayesian analysis, glitch models that use large numbers of wavelets are penalized, reducing the risk of overfitting. BayesWave analyses using data from only one detector were used in the second observing run to model glitches, most notably the glitch overlapping GW170817 [138,139]. An example of a case where BayesWave was used in the third observing run is shown in Figure 10. In this case, a blip glitch that overlapped a gravitational-wave signal was modeled and subtracted. This model requires there to be minimal overlap between any glitch and any potential signal.

Figure 10. A demonstration of mitigating a glitch by modeling and subtracting the glitch with BayesWave [138][139][140]. Left column: Spectrograms of whitened strain data from the LIGO Livingston detector before and after subtraction of the glitch. Right column: Time series of the whitened strain before and after subtraction is plotted in blue, while the difference between the data before and after glitch subtraction is plotted in orange. After modeling and subtracting the glitch, the data are consistent with Gaussian noise. Plotted data are from [141].
In addition to modelling excess power in one detector, BayesWave is able to model power coherent between multiple detectors. Recent updates to BayesWave have added support for a "signal plus glitch" configuration that models the coherent signal power at the same time as the incoherent glitch power [140]. This allows the BayesWave model to separate power from a glitch from power that is due to an astrophysical signal. Hence the signal plus glitch configuration is able to model glitches overlapping gravitational-wave signals. Analyses of simulated signals on top of glitches have shown that BayesWave is capable of correctly discriminating between the two sources of excess power [142,143].
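The basic building block of such wavelet-based glitch models, the sine-Gaussian, can be sketched as follows; parameter conventions and names are illustrative, not the BayesWave implementation itself.

```python
import numpy as np

def sine_gaussian(t, t0, f0, q, amplitude, phase=0.0):
    """Sine-Gaussian wavelet: a sinusoid at frequency f0 under a Gaussian
    envelope centered at t0, with width set by the quality factor q."""
    tau = q / (2 * np.pi * f0)  # envelope width from the quality factor
    envelope = amplitude * np.exp(-((t - t0) ** 2) / (2 * tau ** 2))
    return envelope * np.cos(2 * np.pi * f0 * (t - t0) + phase)

# A glitch model is a sum of a small number of wavelets; Bayesian model
# selection penalizes models that use more wavelets than the data require.
t = np.linspace(-0.1, 0.1, 2048)
glitch_model = sine_gaussian(t, 0.0, 60.0, 5.0, 1.0) + \
               sine_gaussian(t, 0.01, 120.0, 8.0, 0.3)
peak = np.max(np.abs(glitch_model))
```

Because each wavelet is localized in both time and frequency, a handful of them can represent a compact glitch, while a coherent astrophysical signal across detectors is captured by a separate coherent term in the signal-plus-glitch configuration.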
Other glitch models based on machine learning approaches [144,145], as well as class-specific glitch models [146], have also been proposed. A subset of these methods have also shown promise in accurate waveform extraction in the presence of non-Gaussian noise.

Gating and Inpainting
In several use cases, precise glitch modelling is not required to mitigate the impact of glitches on gravitational-wave analyses. Additional methods that are more computationally efficient were developed that remove the data surrounding glitches. While this does lead to some loss of information, the reduced complexity of these techniques allows them to be applied to datasets that contain large numbers of glitches. The two most commonly used techniques to remove segments of data in gravitational-wave analyses are referred to as "gating" and "inpainting". Gating can be completed before or after whitening of the data, depending on the specific analysis technique, while inpainting must be done before whitening the data. A comparison of data containing a glitch before and after gating or inpainting is shown in Figure 11.
When gating is completed pre-whitening, the data are typically windowed using an inverted Tukey window [147]. This window zeroes a short segment of data, with a taper at each end. The length of each taper is generally between 0.125 and 0.5 s, while the length of the zeroed data can vary based on the duration and strength of the glitch. The taper reduces the risk of artifacts introduced during the whitening process, particularly discontinuities that would severely corrupt the data. While the taper does not completely prevent artifacts (typically loud lines in the data "ring" during the gate), these artifacts do not limit the sensitivity of searches for compact binaries that use this gating method for loud glitches. An additional downside of this method is that the length of the taper is often much longer than the duration of the zeroed data, increasing the total duration of data that is disturbed by this method.
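A minimal sketch of pre-whitening gating with an inverted Tukey window follows; the gate and taper durations are illustrative, and the function name is our own.

```python
import numpy as np

def inverse_tukey_gate(data, sample_rate, gate_start, gate_end, taper=0.25):
    """Zero data between gate_start and gate_end (seconds), with cosine
    tapers of length `taper` seconds rolling off on either side."""
    t = np.arange(len(data)) / sample_rate
    window = np.ones_like(data)
    window[(t >= gate_start) & (t <= gate_end)] = 0.0
    # Cosine roll-off into the gate on each side.
    left = (t >= gate_start - taper) & (t < gate_start)
    right = (t > gate_end) & (t <= gate_end + taper)
    window[left] = 0.5 * (1 + np.cos(np.pi * (t[left] - (gate_start - taper)) / taper))
    window[right] = 0.5 * (1 - np.cos(np.pi * (t[right] - gate_end) / taper))
    return data * window

rate = 1024
data = np.ones(4 * rate)  # stand-in for strain data
gated = inverse_tukey_gate(data, rate, gate_start=1.8, gate_end=2.2)
```

With a 0.4 s gate and 0.25 s tapers, a total of 0.9 s of data is modified, illustrating how the tapers can disturb more data than the zeroed segment itself.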
When gating is applied post-whitening, the data are simply zeroed out over the region of interest. This method does not introduce artifacts from the gating itself. However, for sufficiently loud glitches, the whitening filter can cause the data surrounding the glitch to be corrupted. The approximate time segment that is corrupted is given by the duration of the impulse response of the whitening filter. Hence care must be taken to ensure that the duration of zeroed data is long enough to remove the impact of the whitening filter.
The identification of times to gate is also an important analysis choice. A common choice is to gate a small segment of time around any data points that are above a fixed threshold in the whitened time series. A typical threshold is 25-100 times the standard deviation of the whitened strain data. Around each data point above threshold, a fixed amount of time is added to ensure that the glitch is fully excised. Adaptive methods have also been developed [148], which allow the use of variable gating thresholds and time durations. Additional metrics based on the BLRMS of the strain data [149] or the BNS range [78] have also been used. Gating times have also been chosen based on predictions from auxiliary sensor information, which is able to predict the presence of a loud transient in the gravitational-wave strain data [18].

Figure 11. A comparison of the use of gating and inpainting to mitigate a loud noise artifact in the data. Left column: Spectrograms of whitened strain data from the LIGO Livingston detector before mitigation, after mitigation with pre-whitening gating, and after mitigation with inpainting. Right column: Whitened strain time series for the same three cases. Both gating and inpainting methods mitigate the excess power from the instrumental artifact, but inpainting introduces fewer artifacts and disturbs less of the time series. This improved mitigation comes at the cost of additional complexity in the method and the requirement that a specific PSD is used to analyze the data after inpainting. Plotted data are accessed from GWOSC [90].
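The threshold-and-pad gate selection described above can be sketched as follows; the threshold and padding values are illustrative, and the merging of overlapping gates is one simple design choice.

```python
import numpy as np

def find_gate_segments(whitened, sample_rate, threshold=50.0, pad=0.25):
    """Return merged (start, end) times in seconds to gate: samples whose
    magnitude exceeds threshold * (standard deviation), padded by `pad`."""
    sigma = np.std(whitened)
    above = np.flatnonzero(np.abs(whitened) > threshold * sigma)
    segments = []
    for idx in above:
        start = idx / sample_rate - pad
        end = idx / sample_rate + pad
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # merge overlapping gates
        else:
            segments.append((start, end))
    return segments

rng = np.random.default_rng(1)
rate = 1024
data = rng.normal(size=8 * rate)  # stand-in for whitened Gaussian noise
data[3 * rate] += 500.0           # a loud glitch at t = 3 s
gates = find_gate_segments(data, rate)
```

Note that a very loud glitch inflates the measured standard deviation itself; production implementations typically use a robust or glitch-excluded noise estimate instead of a plain standard deviation.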
Gating is used extensively in LVK analyses. Pre-whitening gating is currently used by several CBC search algorithms (PyCBC [76], MBTA [78], PyGRB [150]) as well as a search for generic transients (X-pipeline [151]). This method has been used in low-latency analyses [152] to manually mitigate loud glitches before localizing the source of gravitational-wave signals. Post-whitening gating is used by two other CBC search algorithms (Gstlal [77] and SPIIR [79]) and a search for generic transient gravitational waves (cWB [80]). In O3, a gated strain dataset was generated for use with searches for persistent gravitational waves [18,149]. Two CBC searches (MBTA and SPIIR) also use pre-whitening gating to remove time periods marked by data quality flags, in addition to gating based on a threshold on the whitened strain data.
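As an illustrative sketch of the threshold-and-pad procedure described above, the following Python snippet identifies samples above a fixed multiple of the standard deviation, merges the padded intervals, and builds a gating mask. The threshold, padding, and taper durations are placeholder values, not those of any LVK pipeline, and a production implementation would use a more robust noise estimate than the sample standard deviation:

```python
import numpy as np

def find_gates(whitened, fs, threshold=50.0, pad=0.25):
    """Merged (start, stop) sample intervals around threshold crossings.

    Note: the sample standard deviation used here is inflated by the
    glitch itself; real pipelines use more robust noise estimates.
    """
    npad = int(pad * fs)
    hits = np.flatnonzero(np.abs(whitened) > threshold * np.std(whitened))
    intervals = []
    for i in hits:
        a, b = max(i - npad, 0), min(i + npad, len(whitened))
        if intervals and a <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], b)  # merge overlapping gates
        else:
            intervals.append((a, b))
    return intervals

def gate_mask(n, intervals, ntaper):
    """0/1 mask zeroing each interval, with half-Hann ramps at the edges.

    Post-whitening gating can zero samples directly; pre-whitening
    gating needs the smooth roll-off so the hard edges of the gate do
    not themselves inject broadband power when the data are filtered.
    """
    mask = np.ones(n)
    ramp = 0.5 * (1.0 + np.cos(np.pi * np.arange(ntaper) / ntaper))  # 1 -> ~0
    for a, b in intervals:
        mask[a:b] = 0.0
        lo = max(a - ntaper, 0)
        if a > lo:                      # roll down into the gate
            mask[lo:a] = np.minimum(mask[lo:a], ramp[-(a - lo):])
        hi = min(b + ntaper, n)
        if hi > b:                      # roll back up after the gate
            mask[b:hi] = np.minimum(mask[b:hi], ramp[::-1][:hi - b])
    return mask

# Example: a loud single-sample glitch in otherwise Gaussian noise
rng = np.random.default_rng(1)
fs = 1024
strain = rng.standard_normal(8 * fs)
strain[4000] += 200.0
gates = find_gates(strain, fs)
gated = strain * gate_mask(len(strain), gates, ntaper=int(0.1 * fs))
```

The half-Hann ramps here play the role of the inverse-Tukey window used in pre-whitening gating; for post-whitening gating, the taper could be dropped entirely.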
Inpainting [153] is a procedure similar to gating, but it explicitly addresses the impact of whitening on the modified time series. To construct the inpainted time series, the pre-whitened time series is first zeroed over the region of interest. The time series is then whitened, and the projection of the zeroed data into the whitened data space is calculated. This calculation is the solution to a Toeplitz system [154] and can be performed efficiently. The projection is then subtracted from the original time series so that the whitened time series is approximately zero over the region of interest. This method has the advantage of introducing minimal artifacts and removing the need to taper the data. For analyses that use a low rate of gates (less than one every few minutes, typical of LVK analyses), these differences may not have measurable effects. However, the benefits of inpainting allow this method to be applied to the data at a much higher density without corrupting the data stream. This in turn allows a larger number of non-Gaussian artifacts to be addressed with this method in a single analysis [82].
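A minimal sketch of this idea is given below. It assumes stationary noise and applies the inverse covariance as an FFT division by the PSD (a circulant approximation), so it is not the implementation of [153]; the PSD convention (values sampled on the real-FFT frequency bins) is likewise an assumption of the sketch:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def inpaint(data, psd, hole):
    """Fill a contiguous hole so the whitened series is ~zero across it.

    data : real time series of length n
    psd  : noise power spectrum sampled on the np.fft.rfftfreq(n) bins
    hole : slice marking the contiguous samples to inpaint

    The inverse covariance C^{-1} is applied as an FFT division by the
    PSD; its restriction to the hole is a symmetric Toeplitz matrix
    built from the inverse-PSD kernel, which solve_toeplitz handles.
    """
    n = len(data)
    d = data.copy()
    d[hole] = 0.0
    cinv_d = np.fft.irfft(np.fft.rfft(d) / psd, n)  # C^{-1} of zeroed data
    kernel = np.fft.irfft(1.0 / psd, n)             # first column of C^{-1}
    m = hole.stop - hole.start
    # Solve M x = -(C^{-1} d)|hole so that C^{-1} of the filled series
    # vanishes on the hole; M is positive definite, so this is well posed.
    d[hole] = solve_toeplitz(kernel[:m], -cinv_d[hole])
    return d

# Example: the "whitened data are zero in the hole" property holds for
# any input data, so white noise plus a toy glitch suffices to check it.
rng = np.random.default_rng(2)
n = 2048
data = rng.standard_normal(n)
data[1000] += 100.0                                  # loud glitch
f = np.fft.rfftfreq(n, d=1 / 256)
psd = 1.0 + 50.0 / (1.0 + (f / 10.0) ** 2)           # arbitrary smooth PSD
filled = inpaint(data, psd, slice(990, 1010))
whitened = np.fft.irfft(np.fft.rfft(filled) / psd, n)
```

After inpainting, `whitened` is numerically zero across the hole while the rest of the series is untouched, which is why downstream analyses must use the same PSD that was used to inpaint.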
While glitches are typically addressed by removing data in the time domain, it is possible that only a subset of frequencies is impacted. In such cases, the data quality issue can be mitigated in the frequency domain by using a reduced frequency bandwidth in the analysis, filtering the data to suppress impacted frequencies, or modifying the PSD used in analyses to suppress impacted frequencies [5].
As these methods remove the excess power from any gravitational-wave signal that is present during the gated or inpainted time, they introduce a bias in the recovered amplitude of any gravitational-wave signal. This bias can be analytically calculated based on the gate configuration [155] or inpainting filter [153] that is used. Additional methods [156] based on machine learning [157] have also been proposed to address this bias.
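A back-of-the-envelope version of this calculation follows from the fact that, for whitened data, the matched-filter SNR recovered after gating is reduced by the fraction of the whitened template's power that falls inside the gate. The sine-Gaussian "template" and gate placement below are purely illustrative:

```python
import numpy as np

fs = 1024
t = np.arange(-2, 2, 1 / fs)

# Illustrative whitened "template": a 100 Hz sine-Gaussian
template = np.exp(-t**2 / (2 * 0.2**2)) * np.sin(2 * np.pi * 100 * t)

# Gate zeroing 0.1 s of data centred on t = 0.1 s
mask = np.ones_like(template)
mask[(t > 0.05) & (t < 0.15)] = 0.0

# With whitened data, the recovered SNR is proportional to the overlap
# <h, mask * h> / |h|, so the expected fractional SNR loss equals the
# fraction of whitened-template power falling inside the gate.
loss = 1.0 - np.sum((template * mask) ** 2) / np.sum(template ** 2)
print(f"fractional SNR lost to the gate: {loss:.3f}")
```

The full calculations in [155] and [153] generalize this by accounting for the taper shape and the inpainting filter, respectively.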

Spectral Estimation Methods in the Presence of Non-Gaussian Noise
An essential element of gravitational-wave analyses is the estimation of the power spectral density of the data. Most analyses implicitly or explicitly assume that gravitational-wave data are stationary and Gaussian, which implies that a single true power spectral density defines the properties of the noise. However, it is well known that this is not the case, although it is approximately true over short timescales. As gravitational-wave analyses have evolved, the assumptions of stationarity and Gaussianity have gradually been relaxed as new methods were developed to account for the non-ideal nature of the data. This sub-section outlines current techniques to account for variations in the properties of the noise over multiple timescales.
Several straightforward techniques have been used to calculate the power spectral density of gravitational-wave data. Perhaps the simplest is Welch's method [158], in which the noise spectrum is measured over multiple short segments and the measurements are averaged. If the noise were stationary and Gaussian, this method would be sufficient. However, the averaging employed in this method is easily impacted by the presence of loud glitches. To address this concern, a different method [159], referred to as the "median method", is often used. This method is identical to Welch's method but uses the median instead of the mean. It is less susceptible to short, loud transients, but introduces a known bias into the spectral measurement, even in the idealized case [159]. This bias is typically accounted for by normalizing the measured PSD by the expected bias. In cases where the data are not stationary, neither method is able to account for this variation. Both methods are also known to have biases for spectral measurements of non-stationary, non-Gaussian noise [160,161]. A comparison between the PSD of data containing a glitch estimated using the median method and Welch's method is shown in Figure 12. The PSD estimated using Welch's method shows large deviations from the true PSD due to the presence of the glitch, while the PSD estimated using the median method shows fewer deviations.
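The difference between the two averaging choices can be reproduced with scipy, whose `welch` function exposes both through its `average` keyword (scipy also applies the expected-bias correction for the median). The glitch amplitude and segment length below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
fs = 1024
noise = rng.standard_normal(128 * fs)     # white noise: one-sided PSD = 2 / fs
noise[64 * fs] += 500.0                   # one loud "glitch" sample

# `average` selects plain Welch (mean) or bias-corrected median averaging
# over the per-segment periodograms
f, p_mean = welch(noise, fs=fs, nperseg=4 * fs, average='mean')
f, p_med = welch(noise, fs=fs, nperseg=4 * fs, average='median')

true_psd = 2.0 / fs
print('mean-average bias:  ', np.median(p_mean) / true_psd)   # pulled up by the glitch
print('median-average bias:', np.median(p_med) / true_psd)    # close to 1
```

Because the glitch only contaminates the few segments that contain it, the per-segment median discards its contribution, while the mean spreads it across the whole spectrum.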
To account for these limitations, LVK analyses have typically used the BayesLine algorithm [138,140] for spectral measurements. BayesLine models the noise over fixed timescales as the combination of a stationary and Gaussian noise background with potentially non-Gaussian transients present. The noise curve is fit using a spline and Lorentzians. This model has been shown to describe the instantaneous noise of the detectors better than the methods previously described [162]. However, it does not account for potential non-stationarities. For analyses that use a few seconds of data, this is generally not a concern, but it may become important as detectors become more sensitive and the duration of astrophysical signals increases. A PSD estimated using BayesLine is also shown in Figure 12. Compared to the other two PSD estimation methods, features in the PSD with a small bandwidth are not as well captured by the BayesLine PSD. However, over the small timescales considered in analyses of transient gravitational-wave events, these deviations are not measurable.

Figure 12. Left: The calculated PSD for simulated data using the median method [159], Welch's method [158], and BayesWave [138,140]. Right: The ratio of the calculated PSD to the known PSD using each method. The PSD calculated with Welch's method is severely biased due to the non-Gaussian artifact in the data. The BayesWave PSD does not capture all of the features in the known PSD due to fitting the PSD with a spline, but has the benefit of using significantly less data than the median method and hence is less impacted by non-stationary noise. Plotted data are simulated using GWpy [163] and Bilby [164].
Additional Bayesian techniques have also been proposed, including marginalizing over the uncertainty in the power spectral density [165] and using different likelihood models that are better suited to non-stationary noise [160,161].
For analyses that use long stretches of data, it is not possible to assume that the noise is stationary over the full timescale considered. Multiple methods are currently used to address this challenge. The simplest solution is to reduce the timescale over which the noise is measured [76]. The measured noise properties are allowed to change between segments, and the noise spectrum measured for each segment is only used for the analysis of that segment of data. This approach is sufficient as long as the timescale over which the PSD is re-measured is shorter than the timescale over which the PSD varies; in practice, an accurate PSD measurement requires enough data that the measurement timescale cannot be made arbitrarily short. An alternative method is to allow the measured noise properties to drift with respect to time [166]. The use of a rolling, geometric mean of the noise spectrum can better account for shorter-timescale changes than simply analyzing short segments of data.
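A rolling geometric mean can be sketched as an exponentially weighted average of the log-periodogram. The smoothing parameter `alpha` and segment lengths below are illustrative choices, not the recipe of [166]:

```python
import numpy as np
from scipy.signal import welch

def drifting_psd(x, fs, seglen, alpha=0.3):
    """Track a slowly varying PSD with an exponential geometric mean.

    Each new segment's Welch estimate updates the running spectrum in
    log space: log S <- (1 - alpha) log S + alpha log P_segment, i.e.,
    an exponentially weighted geometric mean over past segments.
    """
    log_s, history = None, []
    for start in range(0, len(x) - seglen + 1, seglen):
        f, p = welch(x[start:start + seglen], fs=fs, nperseg=seglen // 8)
        log_p = np.log(p)
        log_s = log_p if log_s is None else (1 - alpha) * log_s + alpha * log_p
        history.append(np.exp(log_s))
    return f, np.array(history)

# Example: the noise amplitude doubles halfway through, so the true PSD
# quadruples; the rolling estimate follows the change within a few segments.
rng = np.random.default_rng(3)
fs = 512
x = rng.standard_normal(200 * fs)
x[100 * fs:] *= 2.0
f, hist = drifting_psd(x, fs, seglen=2 * fs)
```

Averaging in log space makes the estimate a geometric rather than arithmetic mean, which reduces the influence of occasional loud segments relative to a plain moving average.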
To account for even shorter-timescale fluctuations in the noise properties, it is possible to develop specialized methods that account for the impact of such fluctuations on a specific analysis. In the case of matched filtering, the effect of a mis-measured PSD on the variance of the matched filter can be analytically computed [106,153]. This statistic can be measured over short timescales, and the relevant correction can be applied. The same approach can be used for the specific PSD used in parameter estimation analyses [167]. While this method does allow short-duration variations in the spectrum to be tracked, it is not generalizable to all use cases.

Ongoing Developments
The characterization of gravitational-wave detectors and the mitigation of identified noise artifacts are essential elements of gravitational-wave data analysis. The non-idealized nature of ground-based gravitational-wave detectors ensures that careful studies of these detectors and their data will continue to increase their overall sensitivity. Even in the best-case scenarios, the high rate of instrumental artifacts in the data means that noise mitigation procedures must be used to confidently detect gravitational waves and make statements about their source properties.
Alongside characterization of currently operating ground-based detectors, research is actively underway to understand how these methods will be a part of observations with next-generation detectors. Metrics that are being used to evaluate potential locations for these instruments include the impact of environmental sources on their overall sensitivity [168][169][170]. Noise sources that are currently sub-dominant, such as laser frequency noise [171] and correlated magnetic noise [172], are expected to become major limitations for future analyses. The vastly increased sensitivity of these detectors means that common assumptions used in detector characterization analyses, such as the lack of correlated noise between detectors, will no longer be valid. These differences mean that careful monitoring and subtraction of correlated [173] and uncorrelated noise [170,[174][175][176] will play an important role in next-generation analyses.
Detector characterization and noise mitigation efforts are expected to also be a component of analysis of data from space-based detectors. Methods to flag non-stationary noise in LISA [177], separate instrumental artifacts from astrophysical signals [178], and use gap-filling to address missing data [179,180] have already been proposed.
One of the most fundamental changes expected in future detector characterization and noise mitigation developments is an increased reliance on automation and machine learning [18]. The high rate of detections with future detectors will preclude the current level of human involvement for the vast majority of detected gravitational-wave events. Furthermore, the significantly increased size of next-generation ground-based detectors will make site-based investigations more challenging, while such investigations may be impossible for space-based detectors. However, the methods discussed in this work will lay the groundwork for how detector characterization and noise mitigation are completed for decades to come.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Publicly available datasets were analyzed in this study. All gravitational-wave strain data used for visualizations in this work are from the Gravitational-wave Open Science Center (GWOSC) [23].

Abbreviations
The following abbreviations are used in this manuscript: