A Deep Temporal Convolutional Neural Network for Regional and Teleseismic Detection

The detection of seismic events at regional and teleseismic distances is critical to Nuclear Treaty Monitoring. Traditionally, detecting regional and teleseismic events has required the use of an expensive multi-instrument seismic array; however, in this work, we present DeepPick, a novel seismic detection algorithm capable of array-like performance from a single trace. We achieve this directly, by training our single-trace detector against labeled events from an array catalog, and by utilizing a deep temporal convolutional neural network. The training data consists of all arrivals in the International Seismological Centre Catalog for seven seismic arrays over a five-year window from 1 Jan 2010 to 1 Jan 2015, yielding a total training set of 608,362 detections. The test set consists of the same seven arrays over a one-year window from 1 Jan 2015 to 1 Jan 2016. We report our results by training the algorithm on six of the arrays and testing it on the seventh, so as to demonstrate the transportability and generalization of the technique to new stations. Detection performance against this test set is outstanding. Fixing a type-I error rate of 1%, the algorithm achieves an overall recall rate of 73% on the 141,095 array beam picks in the test set, yielding 102,394 correct detections. This is more than 4 times the 23,259 detections found in the analyst-reviewed single-trace catalogs over the same period, and represents an 8 dB improvement in detector sensitivity over current methods. These results demonstrate the potential of our algorithm to significantly enhance the effectiveness of the global treaty monitoring network.


Introduction
Adherence to the Comprehensive Nuclear-Test-Ban Treaty is currently verified by the detection, location and identification of seismic events, often at regional (>500 km) and teleseismic (>1000 km) distances. Seismic detection is the critical first step in this process, and it is imperative that events be detected by multiple stations, as this increases the overall accuracy of the final location estimate. As such, maintaining a large network of highly sensitive seismic detectors is key to the treaty monitoring community [1] [2].
Traditionally, sensitive teleseismic detection has required the use of a multi-instrument seismic array, a strategy which dates back to the Geneva Conference of Experts in 1958 [3]. The sensitivity is achieved through beamforming [4], a spatial filtering technique that relies on a tuned network of interconnected seismometers which together form a single station. This technique is extremely effective; however, it is quite expensive to implement due to the additional sensors and processing required, and beamforming is inapplicable to single-instrument stations. As such, the vast majority of seismic stations around the globe are simply unable to detect weak regional and teleseismic events.
In this work, we seek to remedy this situation by creating a detector with array-like performance from a single trace. Building on several recent efforts which apply the power of deep neural networks to the detection of local events [5] [6] [7], we seek to apply similar techniques to the detection of regional and teleseismic events, traditionally only detectable from a seismic array. Specifically, we seek to answer the following research question: using the analyst-reviewed catalog of events from an array beam as ground truth, what is the maximum recall we can achieve from a single-trace detector with an alpha of 0.01?
To answer this question, we present DeepPick, a single-trace detection algorithm capable of detecting 73% of the events in an array beam catalog. The algorithm is based on a deep Temporal Convolutional Neural Network (TCN), and it is trained against more than five billion raw seismic samples and 608,362 labeled seismic arrivals from seven array beam catalogs in the International Monitoring System (IMS) network: TXAR, PDAR, ILAR, BURAR, ABKAR, MKAR and ASAR, located in Lajitas, Texas; Pinedale, Wyoming; Eielson, Alaska; Bucovina, Romania; Akbulak, Kazakhstan; Makanchi, Kazakhstan; and Alice Springs, Australia, respectively. Performance is reported by training the algorithm against five years of data from six of the arrays and testing it against a full year of data from the seventh, remaining array. All seven arrays are tested in this manner, resulting in an overall recall of 72.6% at an alpha of 0.01. This represents a marked improvement over the 16.5% detection rate found in the traditional single-trace catalogs over the same time period.
Within this work, we present three major contributions to the literature:
• We present our unique high-fidelity dataset, which combines single-trace waveforms with array catalog labels to create a seismic detection training set suitable for deep learning
• We present exponential sequence tagging, the novel labeling schema we use to offset the extreme class imbalance inherent in the teleseismic detection task
• We present DeepPick, a single-trace detection algorithm capable of achieving array-level performance from a single sensor
In the remainder of this work, we explore these contributions in detail by first reviewing the related literature, then outlining our methodology, and finally detailing and discussing our results.

The most common seismic signal detector is the short-term average, long-term average (STA/LTA) detector [8], first described by Allen in [9]. This detector is a binary classifier, best suited for local events. The basic operation of this detector is detailed in Figure 1. This simple technique enjoys widespread use due to its extreme computational advantage; however, its performance is reduced for weaker regional and teleseismic events [10].
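The STA/LTA operation diagrammed in Figure 1 can be sketched in a few lines; the following is a minimal illustration, and the 1 s / 30 s window lengths and the trigger threshold are our own assumptions rather than values from the paper.

```python
import numpy as np

def sta_lta(x, fs, sta_win=1.0, lta_win=30.0):
    """Classic STA/LTA characteristic function: the ratio of a
    short-term average of signal power to a long-term average.
    A detection is declared when the ratio crosses a threshold."""
    power = x.astype(float) ** 2
    ns, nl = int(sta_win * fs), int(lta_win * fs)
    # Trailing moving averages computed via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(power)))
    sta = (csum[ns:] - csum[:-ns]) / ns   # short-term average power
    lta = (csum[nl:] - csum[:-nl]) / nl   # long-term average power
    n = min(len(sta), len(lta))
    # Align both averages so their windows end on the same samples.
    return sta[-n:] / np.maximum(lta[-n:], 1e-12)

# Synthetic trace: background noise with a burst of energy at t = 60 s.
rng = np.random.default_rng(0)
fs = 40
trace = rng.normal(0, 1, 120 * fs)
trace[60 * fs : 62 * fs] += rng.normal(0, 8, 2 * fs)
ratio = sta_lta(trace, fs)
print(ratio.max() > 5.0)  # the burst trips a typical trigger threshold
```

For a weak teleseismic arrival, the burst's power barely exceeds the background, so this ratio stays near 1 and the trigger never fires, which is exactly the limitation discussed above.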

Related Work
To date, one of the most successful techniques for regional and teleseismic signal detection is beamforming [1] [11], introduced in 1988 [4]. Beamforming gains its effectiveness by linearly combining signals from multiple sensors according to the estimated arrival direction, also known as the back-azimuth. Another outstanding technique for the detection of weak teleseismic events is the correlation detector, first introduced by [12] and [13] in the early 1990s. Correlation detectors are a type of empirical signal detector that work by comparing incoming seismic waveforms to canonical examples in the extant seismic record [14] [15]. This technique is particularly effective for the detection of highly correlated repeating events, even at very weak magnitudes [16]. Unfortunately, to date, this technique is not generally applicable, as only 18% of all global events possess sufficient similarity to be detected in this way [17].
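To make the empirical-detector idea concrete, the sketch below slides a stored template over an incoming trace and flags windows whose normalized cross-correlation exceeds a threshold. The template, stream, and threshold here are all hypothetical, purely for illustration.

```python
import numpy as np

def correlation_detect(stream, template, threshold=0.8):
    """Toy correlation detector: flag offsets where the normalized
    cross-correlation with a canonical waveform exceeds a threshold."""
    m = len(template)
    t = template - template.mean()
    t /= np.linalg.norm(t)                  # unit-energy template
    hits = []
    for i in range(len(stream) - m + 1):
        w = stream[i:i + m] - stream[i:i + m].mean()
        nw = np.linalg.norm(w)
        if nw == 0:
            continue
        cc = float(np.dot(w / nw, t))       # correlation in [-1, 1]
        if cc >= threshold:
            hits.append((i, cc))
    return hits

# Bury a repeat of the template in noise and recover it.
rng = np.random.default_rng(1)
template = np.sin(2 * np.pi * np.arange(80) / 16) * np.hanning(80)
stream = rng.normal(0, 0.2, 1000)
stream[400:480] += template
hits = correlation_detect(stream, template)
print(any(abs(i - 400) < 5 for i, _ in hits))
```

The strength and the weakness of the method are both visible here: detection works even for weak repeats, but only when a sufficiently similar prior waveform exists in the record.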
In [18], the authors demonstrate the power of a richly-featured machine-learning-based detector.
Training a Support Vector Machine against a series of 30 features in the time-frequency plane, they achieved a recall of 97.7% at a type-I error rate of less than 1.3%, for an overall accuracy of 98.2%. In [6], the researchers also utilize a deep CNN to perform seismic signal detection on local events.
Their dataset consisted of 4.5 million 4-second windows of waveform data recorded and classified by the Southern California Seismic Network. Their task was formulated as a classification problem, assigning one of three classes to each window: P-wave, S-wave and noise. This resulted in 1.5 million windows containing a P-wave arrival, 1.5 million windows containing an S-wave arrival and 1.5 million windows containing no arrival. Their validation set consisted of a randomly sampled 25% of the overall data, resulting in 1.1 million seismograms evenly split between the three classes. On the validation set, they report a recall of 96% at a type-I error rate of less than 1%. These results are very impressive, and show that the convolutional neural network is capable of achieving state-of-the-art performance on the seismic signal detection task. Once again, a limitation of this work is that it is applicable only to local signals, and the researchers limited their scope to signals originating within 100 km of the recording station. Additionally, because only a quarter of a million events were considered while 1.5 million records were used, it is unclear whether there was some leakage from the training set into the validation set.
In [19], the same research team considers arrival time estimation. Here they formulate the task as a regression problem, and consider only 4-second windows of data centered around an arrival, with up to half a second of variance in the arrival time from the center of the window. For this task, they report a mean absolute error of less than 0.02 seconds from the analyst-recorded picks. Once again, these signals are limited to local events.
Seismic signal detection is an active area of research, with new, improved algorithms capable of achieving near-perfect accuracy for local events. Despite this, little effort has been made to extend detection to regional and teleseismic events without the use of a seismic array. This is exactly the research objective our work shall address.

Materials and Methods
Our stated objective is to build a single-trace detection algorithm capable of detecting weak regional and teleseismic signals with array-like performance. We know that such detections are possible using a full seismic array, and we have seen the potential for achieving such detections using a deep neural network. With this knowledge as our guide, our approach is to employ a deep TCN model, feed it a single-trace input sequence, and train it to produce an output sequence based on an array beam catalog. In this section, we explore this approach in detail, first defining our dataset and then describing our modeling strategy.

High Fidelity Arrival Catalog
At first glance, obtaining a dataset for training a seismic detector would appear to be trivial, as analyst-reviewed arrival catalogs are freely available for millions of seismic events. Unfortunately, despite the rigorous review process and the extensive cross-referencing, each single-trace arrival catalog only contains picks for signals with sufficient strength to be conventionally detectable from within that trace. This is a significant limitation when the goal is to train a detector more sensitive than the conventional one. Fortunately, there are certain sensors for which we do have accurate cataloged arrival times for regional and teleseismic signals below the noise floor; namely, the nominal element (usually a broadband 3-channel instrument) of any regional seismic array. Using conventional methods, the nominal element alone is unable to make accurate detections for sub-noise-floor events; however, the array beam as a whole can make these detections very accurately [11], and the beam arrivals are conveniently aligned to the nominal sensor element of the array. Thus, by obtaining our single-trace input data from the nominal sensor, and by obtaining our labeled arrival times from the array beam, we can create a labeled single-trace dataset with signals buried below the noise floor. As an example, Figure 3 demonstrates the significant improvement in detector threshold provided by the Makanchi Array beam in eastern Kazakhstan.

The traditional formulation of detection as binary classification over fixed-length windows is convenient, as the classes can easily be balanced at training time, and it is the common method employed in most recent works in the literature [6], [5], and [18]. However, this method is not well adapted to the detection of regional and teleseismic signals. Teleseismic signals are characterized by long-period features with frequency components as low as 0.01 Hz [20], and the detection of these features necessitates windows that are several minutes in length; unfortunately, this resolution is far too coarse for
classification, and often covers multiple arrivals in a single window. As such, there are two conflicting requirements for creating binary classification windows in a teleseismic detection dataset:
• Input windows must contain many samples, to capture long-period teleseismic features
• Output labels must cover few samples, to allow meaningful temporal resolution for the detection windows
To resolve this conflict, we reformulate the task. Instead of performing binary classification on each window, we perform regression on each sample, which is known as sequence-to-sequence modeling [21]. Quite simply, the training windows are no longer labeled with a single output Boolean, but instead with an entire output sequence of real-valued numbers; each sample in the input sequence is assigned a corresponding label in the output sequence. But what labels should we assign? A naive formulation is to simply assign a 'one' at each cataloged arrival time and a 'zero' everywhere else. This formulation is called sequence tagging [22], and it works well for relatively balanced classes [23]. Unfortunately, binary sequence tagging does not work well for teleseismic detection, as it results in an extreme class imbalance of several orders of magnitude, which hinders learning. For this work, we instead present a novel formulation which we call exponential sequence tagging. This formulation simply builds output sequences that consist of an exponential function applied at each cataloged arrival time, as shown in Figure 4(b). To be precise, the labels in the output sequence are nominally zero up until a cataloged arrival time t_a, at which point they increase and then decrease exponentially, according to the mirrored exponential decay function given in Eq. (1), where λ is the decay rate:

y(t) = exp(−λ |t − t_a|)    (1)
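Generating such label sequences is straightforward; a minimal sketch, assuming the decay rate of 0.02 chosen later in the paper and illustrative arrival indices:

```python
import numpy as np

def exponential_tags(n_samples, arrival_indices, decay=0.02):
    """Exponential sequence tagging: a mirrored exponential decay
    exp(-decay * |i - i_a|) centered on each cataloged arrival index
    i_a, effectively zero far from any arrival. Where arrivals
    overlap, the larger label is kept."""
    idx = np.arange(n_samples)
    labels = np.zeros(n_samples)
    for ia in arrival_indices:
        labels = np.maximum(labels, np.exp(-decay * np.abs(idx - ia)))
    return labels

# One 6-minute window at 40 Hz (14,400 samples) with two arrivals.
labels = exponential_tags(14_400, [3000, 9000])
print(labels[3000], labels[9000])  # labels peak at 1.0 on each arrival
```

Compared with binary tags, a far larger fraction of samples now carries a meaningful non-zero target, which is how this scheme offsets the class imbalance.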
Because each leg of the mirrored exponential decay function is both monotonic and deterministic, the value at each non-zero label can be used to directly infer the precise arrival time. And because the algorithm learns to match these labels with its output, every non-zero sample in the output is effectively an arrival time estimate. With this in mind, we assign one additional computation to our algorithm at run-time: a cross-correlation of the predicted output sequence with the original exponential decay function. This filters the output and effectively aggregates the arrival time estimates into an even more precise arrival time pick. Figure 4(c) and (d) shows an example of the predicted output, both before and after this cross-correlation is applied.
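The run-time refinement step described above might be sketched as follows; the kernel half-width and detection threshold here are our own assumptions, not values from the paper.

```python
import numpy as np

def refine_picks(pred, decay=0.02, half_width=200, threshold=0.4):
    """Cross-correlate the predicted output sequence with the mirrored
    exponential kernel, then take local maxima of the filtered trace
    above a threshold as arrival-time picks."""
    k = np.exp(-decay * np.abs(np.arange(-half_width, half_width + 1)))
    filt = np.correlate(pred, k, mode="same") / k.sum()
    picks = [i for i in range(1, len(filt) - 1)
             if filt[i] >= threshold
             and filt[i] >= filt[i - 1] and filt[i] > filt[i + 1]]
    return picks, filt

# Simulate a noisy model output with a true arrival at sample 500.
rng = np.random.default_rng(2)
true = np.exp(-0.02 * np.abs(np.arange(1000) - 500))
pred = np.clip(true + rng.normal(0, 0.1, 1000), 0, None)
picks, _ = refine_picks(pred)
print(any(abs(p - 500) < 20 for p in picks))
```

The cross-correlation acts as a matched filter: jittery per-sample estimates are pooled into one smooth peak whose maximum is the refined pick.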

Training, Validation and Test Sets
Using this approach to build our training dataset, we obtained a catalog of all local, regional and near-teleseismic arrivals for the seven array beams during a five-year period from 1 Jan 2010 to 1 Jan 2015. We generated this catalog through a web query of the International Seismological Centre (ISC) Bulletin for seismic arrivals, which can be accessed here: http://www.isc.ac.uk/iscbulletin/search/arrivals/. The corresponding waveforms were then windowed around each arrival (the windows were 6 minutes in total length, sampled at 40 Hz, for a total of 14,400 samples per window), and the raw traces were pulled from the Incorporated Research Institutions for Seismology (IRIS) database, for the vertical channel of the nominal seismometer of each array (PD31_BHZ, TX31_BHZ, IL31_BHZ, MK31_BHZ, ABK31_BHZ, BUR31_BHZ and AS31_BHZ). This was accomplished via a custom Python script based on ObsPy 1.1.0, and yielded a dataset of 608,362 picks and a total training size of more than five billion samples. The only pre-processing applied to the raw data was normalization, detrending and bandpass filtering between 0.02 Hz and 10 Hz.
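The stated pre-processing chain could look roughly like the following; the Butterworth filter order, zero-phase filtering, and peak normalization are our assumptions, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import detrend, butter, sosfiltfilt

def preprocess(trace, fs=40.0, fmin=0.02, fmax=10.0):
    """Detrend, bandpass between 0.02 and 10 Hz, and normalize a raw
    waveform window, mirroring the pre-processing described above."""
    x = detrend(trace.astype(float))
    # Second-order-sections form keeps the very low corner stable.
    sos = butter(4, [fmin, fmax], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)                  # zero-phase filtering
    return x / (np.abs(x).max() + 1e-12)     # peak-normalize to [-1, 1]

# A 6-minute window at 40 Hz is 14,400 samples, as in the dataset.
rng = np.random.default_rng(3)
window = rng.normal(0, 1, 14_400) + np.linspace(0, 5, 14_400)  # drifting trace
clean = preprocess(window)
print(len(clean), float(np.abs(clean).max()))
```

In practice the raw windows themselves would be fetched from IRIS (the paper uses an ObsPy-based script for this); the function above only covers the local processing applied to each window once downloaded.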
From this training dataset, we selected one month of data from each array (1 Jan 2010 to 1 Feb 2010), as a validation set.This validation set was used to tune the models, with final model selection based on validation set performance.
To build our testing dataset, we also obtained a catalog of all local, regional and near-teleseismic arrivals for the seven array beams, in this case during a one year period from 1 Jan 2015 to 1 Jan 2016.
This test set is inclusive of 141,095 arrivals in the seven array beam catalogs and 23,259 arrivals in the seven single-trace catalogs. The test set data was not used to train or tune the models, only to report performance against each array. Additionally, to ensure that our reported performance figures are indicative of the expected performance against novel stations, we trained seven separate models, each on a different partition of six arrays, and tested each against the seventh array. In this way, performance for every array is reported using a model that did not have access to any training data from that array, demonstrating the transportability of our algorithm.

Modeling
Now that we have defined our dataset, we turn to a precise description of our modeling methodology, detailing the model architecture, hyper-parameter search vectors, and evaluation metrics.
• Dilated convolutions allow precise control over the receptive field
The receptive field is of primary importance for time-series modeling, as it explicitly limits the learnable feature periodicity at a given layer. As such, one of our key design parameters was to ensure an adequate receptive field for our algorithm. The receptive field r_l of a given convolutional layer l, with kernel size k and dilation rate d_l, is given by the recurrence in (2):

r_l = r_{l−1} + (k − 1) · d_l,  with r_0 = 1    (2)

Using this equation, we designed our network to have a receptive field of roughly 100 seconds, allowing it to learn long-period features down to 0.01 Hz. We achieved this in just 4 layers, as shown in Table 1. Another key design parameter was to ensure that the dilation rate in each layer remained less than the receptive field of the previous layer, thereby avoiding any gaps in coverage. Notice that this constraint is maintained even for our final layer with a dilation rate of 256, as the previous layer had a receptive field of 331. Our final model architecture is shown in Figure 5. This basic structure was presented with good results in [24], and it proved a good fit for the picking task as well. As such, this basic structure was maintained throughout our formal hyper-parameter search. Fixing this basic architecture, we engaged in a limited hyper-parameter search over two general vectors: the optimal shape for the exponential function, and the optimal capacity for the neural network.
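The receptive-field bookkeeping can be sketched with the standard recurrence for stacked dilated convolutions, r_l = r_{l−1} + (k − 1)·d_l. The kernel size and dilation schedule below are illustrative only and are not the paper's Table 1 values.

```python
def receptive_field(kernel_size, dilations):
    """Per-layer receptive fields (in samples) for a stack of dilated
    1-D convolutions, starting from a single-sample field r_0 = 1."""
    r = 1
    fields = []
    for d in dilations:
        r += (kernel_size - 1) * d
        fields.append(r)
    return fields

# Hypothetical 4-layer design: kernel size 16, growing dilations.
fields = receptive_field(16, [1, 4, 16, 256])
print(fields)             # receptive field after each layer, in samples
print(fields[-1] / 40.0)  # final field in seconds at 40 Hz
```

With these illustrative choices the final field works out to roughly 100 seconds of 40 Hz data, and each dilation stays below the previous layer's receptive field, the no-gaps constraint described above.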
The decay rate of the exponential was varied across three choices, {0.015, 0.02, 0.04}, selected based on visual inspection. Model capacity was optimized across two parameters: the number of stacks and the number of filters. Each was varied across four choices, {2, 5, 9, 12} and {5, 10, 15, 20} respectively, ranging from a minimal-capacity network (2 stacks with 5 filters and only 3,517 parameters) to a high-capacity network (12 stacks with 20 filters and 328,681 parameters).
Because these two parameters are highly interrelated, the search was conducted exhaustively, for a total of 16 models.The final hyper-parameter selections were based on validation loss curves.

Evaluation Criteria
Our stated research objective is to determine the maximum achievable recall of our single-trace detection algorithm against the array beam catalogs. Because recall is a classification metric, and because we have formulated our task as a regression problem, we must carefully define our methodology for calculating recall. First, we define our detection window to be 4 seconds, which is identical to the window length used in [19]. Using this, we define the number of Total Positives to be the number of labeled arrivals in the dataset, and the number of Total Negatives to be the length of the dataset in seconds divided by 4, minus the number of Total Positives, which is a conservative estimate. We next define a predicted arrival to be any peak in the output sequence above a certain threshold. Using this definition, we further define a True Positive to be any predicted arrival within 2 seconds (plus or minus) of a labeled arrival, and a False Positive to be any predicted arrival not within 2 seconds of a labeled arrival.
Likewise, we define a False Negative to be any labeled arrival not within 2 seconds of a predicted arrival, and the True Negatives to be the Total Negatives minus the False Negatives. From these definitions, the standard equations (3) are used to calculate recall and alpha:

recall = TP / (TP + FN),  alpha = FP / (FP + TN)    (3)

Using these definitions, and treating the analyst-reviewed array beam catalogs as ground truth, we report performance in terms of both receiver operating characteristic (ROC) curves and recall.
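These definitions can be made concrete in a short scoring sketch; the pick times below are hypothetical and the matching uses the stated 2-second tolerance and 4-second window.

```python
def score_detections(predicted, labeled, duration_s,
                     tol_s=2.0, window_s=4.0):
    """Compute recall and alpha from pick lists (times in seconds):
    a prediction within +/- tol_s of a labeled arrival is a true
    positive, and total negatives are duration/window minus labels."""
    tp = sum(any(abs(p - a) <= tol_s for p in predicted) for a in labeled)
    fp = sum(not any(abs(p - a) <= tol_s for a in labeled)
             for p in predicted)
    fn = len(labeled) - tp
    total_neg = duration_s / window_s - len(labeled)
    tn = total_neg - fn
    recall = tp / (tp + fn) if labeled else 0.0
    alpha = fp / (fp + tn) if total_neg else 0.0
    return recall, alpha

# One hour of data, three labeled arrivals, three predictions.
recall, alpha = score_detections(
    predicted=[10.1, 55.0, 119.5],
    labeled=[10.0, 120.0, 300.0],
    duration_s=3600.0)
print(recall, alpha)
```

Here two of the three labels are matched within tolerance (recall 2/3), and the one unmatched prediction yields a single false positive against 897 conservative negatives.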
When reporting recall, we use an alpha of 1%, as this is consistent with the results reported in [5], [6] and [18]. Because our primary interest is in weak-signal detections, we also report recall as a function of signal-to-noise ratio (SNR). To do so, we define SNR to be the log ratio between the short-term and long-term average power, as given in Eq. (4), with a short-term window consisting of the 5 seconds after the arrival, a long-term window consisting of the 40 seconds before the arrival, and a bandpass filter applied from 1.8 to 4.2 Hz:

SNR = 10 · log10(P_STA / P_LTA)    (4)
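Assuming the SNR is expressed in decibels of the power ratio, the definition can be sketched as follows; the 1.8 to 4.2 Hz bandpass the paper applies is omitted here for brevity.

```python
import numpy as np

def arrival_snr(trace, arrival_idx, fs=40, sta_s=5.0, lta_s=40.0):
    """SNR per the definition above: the dB ratio of average power in
    the 5 s after the arrival to average power in the 40 s before."""
    ns, nl = int(sta_s * fs), int(lta_s * fs)
    after = trace[arrival_idx : arrival_idx + ns].astype(float)
    before = trace[arrival_idx - nl : arrival_idx].astype(float)
    return 10.0 * np.log10(np.mean(after**2) / np.mean(before**2))

# Synthetic check: quadruple the amplitude of the 5 s after "arrival".
rng = np.random.default_rng(4)
x = rng.normal(0, 1, 4000)
x[2000:2200] *= 4.0
print(round(arrival_snr(x, 2000), 1))  # near 10*log10(16), about 12 dB
```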
Additionally, in order to assess the value of our algorithm over existing single-trace methods, we compare our performance directly against the analyst-reviewed single-trace catalogs, noting particularly the increase in detector sensitivity in terms of SNR. Finally, we report our performance on the arrival time estimation task, detailing our mean absolute error across all detected arrivals.

Results
In order to define a final model, we explored two hyper-parameter search vectors: exponential decay and model capacity. We varied the decay rate between 0.015 and 0.040, and the results are given in Table 2, which shows 0.020 to be the optimal rate, with the best recall on the validation set. Evaluating our final model against the hold-out test set, we report our results in Table 3. The elbows of the ROC curves are quite tight, with most curves flattening out at an alpha of only 0.3%. In Appendix A, we further explore the performance of our algorithm by plotting several example waveforms for both correct detections and missed detections. Finally, we report the algorithm's performance on the arrival time estimation task¹. Here, the algorithm achieves a mean absolute error of 0.61 seconds from the analyst-picked arrival times, with a distribution detailed in Figure 9. This plot shows that while the most common histogram bin corresponds to an absolute error of less than 0.025 seconds, the weakest signals are frequently missed by more than a second. This error is high when compared to the accuracy of a dedicated arrival time estimation algorithm; however, it should be noted that these estimates are obtained directly from the output of our detection algorithm. As such, the 0.61 seconds is excellent when compared to the multi-second classification windows employed by most detectors [5] [6], and is well within the tolerance of a dedicated arrival time algorithm such as that given in [19].

¹ We report arrival time error only against true positives, as arrival estimation is distinct from detection for most seismic picking algorithms.

Discussion
The results in Table 3 demonstrate that the DeepPick algorithm is capable of achieving a recall of between 64% and 92% against the analyst-reviewed picks from the seven array-beam catalogs. The low end of this range, at 64%, still represents a significant improvement over the performance of existing single-trace algorithms. However, the spread in our results is quite large, and we now examine the underlying causes of this performance variance.
The two stations with the worst performance are ILAR and ASAR. Interestingly, these two stations also utilize a different sensor, the Guralp CMG-3TB, from the other five stations, which all use the Geotech KS54000. This suggests the importance of training the algorithm on stations with the same instrument type as those against which it is intended to be deployed operationally.
The two stations with the best results are ABKAR and BURAR. Interestingly, due to higher noise levels at these sites, the array catalogs for these two stations contain relatively fewer events with relatively larger magnitudes. This makes the detection of these events a simpler proposition, and the recall rates of 90% and 88% reflect this fact. The final three stations are PDAR, TXAR, and MKAR. These stations utilize common instrumentation, share similar geology and have similar noise levels; as expected, they also share similar recall rates of 73%, 78% and 77%, respectively.
These results show that the primary determinant of algorithm success lies in the degree of similarity between the training stations and the testing station.As such, when deploying this algorithm for operational use it is important to find suitable arrays to train on in order to maximize performance.
In any case, the algorithm shows decent performance even when trained across different geographical areas and sensor types.

Conclusion
Weak teleseismic event detection is normally only possible using an array of seismic instruments and sophisticated processing techniques. Even recent works in the literature make little attempt to extend single-trace detection algorithms beyond local events. This is primarily due to the lack of available training data, an issue which we address by mining the seismic catalogs in a unique way: building our catalog from an array beam while taking our event waveforms from a single array element.
With this training data at our disposal, we find that the combination of temporal convolutions and our unique exponential sequence tagging function forms a powerful tool for weak-signal teleseismic detection. In fact, the DeepPick algorithm is able to accurately detect more than four times as many events as appear in the single-trace catalogs for our hold-out test set, at an alpha of just 1%.
The findings in this work represent an important step forward in the field of teleseismic detection, and demonstrate that accurate teleseismic event detection is possible from a single seismic instrument.

Figure 1. Top: Example seismic waveform, annotated to show the STA and LTA windows. Bottom: Diagram detailing the operation of the STA/LTA algorithm.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 29 November 2018 | doi:10.20944/preprints201811.0612.v1

Alignment along the back-azimuth allows beamforming to pick out signals beneath the noise floor of a single sensor. Unfortunately, beamforming is also quite expensive, requiring an interconnected array of seismometers spread out across a large geographical area. An example array layout is detailed in Figure 2, along with a demonstration of the beamforming technique.

Figure 2. Top: Layout of the 10-element Makanchi Seismic Array, MKAR, in eastern Kazakhstan. The dashed lines illustrate an incoming teleseismic wave with calculated back-azimuth, θ. Bottom: Seismic waveforms from an arriving teleseismic event. Beamforming aligns these waveforms via the back-azimuth and wavefront velocity, and then linearly combines them to yield a higher SNR, improving the detection threshold significantly.

Data Collection

The success of any deep neural network algorithm lies largely in the careful collection and construction of the training data. In this subsection, we present a dataset suitable for training a deep seismic detection algorithm. In particular, we detail two of our major contributions: first, we describe a novel method for obtaining a high-fidelity dataset of single-trace waveforms with labeled arrival times below the noise floor; second, we present exponential sequence tagging, the unique sequence-to-sequence modeling schema we used to offset the extreme class imbalance inherent in the teleseismic detection task. We conclude this subsection with the details of our finalized training, validation and test datasets.

Figure 3. Normalized histograms showing the SNR distributions of detected signals from two seismic arrival catalogs. Both catalogs contain detections for the exact same location, MK31, which is the nominal element of the MKAR seismic array. The MK31 catalog is based on a single-trace detection algorithm applied to the MK31 instrument alone, while the MKAR catalog is based on beam-formed picks from the entire 10-instrument array. The mean SNR detected by the array beam is 8 dB lower than that of the single trace. This lower detection threshold results in nearly an order of magnitude more detections in the MKAR catalog compared to the MK31 catalog.

Figure 4. (a): Input sequence containing two arrivals. (b): Labeled output sequence using the exponential function. (c): Predicted output sequence from the model. (d): Cross-correlation of the predicted output sequence with the exponential function.

Model Architecture

Our model architecture is based on the Temporal Convolutional Network. TCNs are deep convolutional architectures characterized by layered stacks of dilated causal convolutions and residual connections [24]. These characteristics offer several distinct advantages for a seismic detection algorithm, which we briefly summarize:
• Residual connections allow the model to have high capacity and stable training
• Causal convolutions allow the model to make predictions on continuous streaming trace data
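To make the causal-convolution property concrete, here is a plain NumPy sketch of a single dilated causal convolution; a real TCN learns the filter weights, which are fixed here purely for illustration.

```python
import numpy as np

def dilated_causal_conv(x, weights, dilation):
    """One dilated *causal* 1-D convolution: output[t] depends only on
    x[t], x[t-d], x[t-2d], ..., never on future samples, which is what
    lets the model run on a continuous streaming trace."""
    k = len(weights)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for j in range(k):
            tap = t - j * dilation      # only past (or current) samples
            if tap >= 0:
                y[t] += weights[j] * x[tap]
    return y

x = np.arange(8, dtype=float)
y = dilated_causal_conv(x, weights=[1.0, -1.0], dilation=2)
# y[t] = x[t] - x[t-2]: a causal difference across a dilated gap.
print(y)
```

Stacking such layers with growing dilations, plus residual connections around each stack, yields the TCN structure used here; each added layer widens the receptive field without any look-ahead.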

Figure 5. One stack of our chosen TCN architecture.

Fixing the decay rate at 0.020, we next varied the overall capacity of the model by increasing both the number of residual stacks, s, and the number of 1D convolutional filters, f. The resultant training curves are given in Figure 6, which shows that model capacity is optimized with 12 stacks and 15 filters, as increasing capacity beyond this point appears to have marginal value. This yields a final model with 12 residual stacks, as shown in Figure 5, with 15 filters on each 1D convolution, for a total of 185,311 fully convolutional parameters.

Figure 6. Validation loss curves during training. Each curve is labeled according to two hyper-parameters, s: number of residual stacks, and f: number of filters. The number of training epochs for each model was based on early stopping with a patience of 10. Total training time was approximately 200 hrs on an Nvidia GTX 1080 Ti.
The results of our algorithm are ground-breaking. Across the seven arrays, the detector is able to correctly classify 72.57% of the 141,095 array beam picks, yielding 102,394 correct detections. This is more than 4 times the 23,259 detections found in the analyst-reviewed single-trace catalogs for the same period. The ROC curves shown in Figure 7 further illustrate the success of the algorithm.

Figure 7. Receiver Operating Characteristic curves for each of the seven arrays in the hold-out test set. A dashed line is shown in grey, indicating an alpha of 1%.

Figure 8.

Figure 9. Residual analysis of the errors for the arrival time estimation task. Left: Histogram showing the distribution of arrival time errors made by the algorithm against the test set, with a bin width of 0.025 seconds. Right: Scatter plot showing the distribution of errors with respect to SNR.

Figure A.1. Three events missed by the DeepPick algorithm; all three events were included in both the single-trace and array-beam catalogs. (a) For this event, DeepPick did make a detection; however, DeepPick's estimated arrival time was just outside the 2-second margin used by our classifier. (b) and (c) DeepPick's output was just below the detection threshold for an alpha of 1%.

Figure A.2. Three events detected by the DeepPick algorithm; all three events were missing from the single-trace catalog, but included in the array-beam catalog. These examples represent the type of detections previously achievable only with a seismic array, but now possible using a deep single-trace algorithm.

Figure A.3.

Exponential Sequence Tagging

Now that we have established high-fidelity sources for both our waveforms and arrival times, we must formulate them into input/output pairs for training our seismic detector. Typically, seismic detection is formulated as a binary classification task: the input data is partitioned into fixed-length windows, each paired with a single Boolean class label. Positive class labels are assigned to windows where a signal is present, and negative class labels are assigned to windows where signal is absent.

Table 1. Layer parameters for our TCN architecture.

Table 3. Algorithm performance by station.