1. Introduction
The Next-Generation Radar (NEXRAD) network consists of 160+ S-band polarimetric Doppler weather radars (WSR-88D) deployed across the continental US, Alaska, Hawaii, and Puerto Rico. Each WSR-88D measures six variables: three single polarization variables and three dual polarization variables. The single polarization variables are the radar reflectivity factor ($Z$), which is proportional to the power of the received signal; Doppler velocity ($v_r$), which is determined from the power-weighted mean Doppler frequency shift of targets within the radar sampling volume; and spectrum width ($\sigma_v$), which measures the variability of Doppler velocities within the sampling volume [1,2,3]. The dual polarization variables are differential reflectivity ($Z_{DR}$), the logarithmic ratio of the reflectivity factors from the horizontal (H) and vertical (V) polarizations; differential propagation phase shift ($\Phi_{DP}$), the difference in phase shift between the H and V polarizations; and cross correlation coefficient ($\rho_{HV}$), which measures the diversity in type, shape, and/or orientation of scatterers in the sampling volume [1].
In addition to weather echoes, the WSR-88D can detect biological scatterers such as birds, bats, and insects [4], opening potential applications for broad-scale studies of their behavior. For example, large-scale radar monitoring can improve our understanding of the spread of avian diseases by allowing a detailed mapping of migratory flyways [5,6]. Additionally, bird strikes are a serious aviation hazard for low-level flights [6]. NEXRAD could be used to detect and avoid migrating birds, thereby improving aviation safety [6,7]. Another application is the identification of wind tracers. Insects have been found to be mostly driven by the wind during flight, while birds are active fliers and can contaminate the wind derived from radar. Previous research has been dedicated to retrieving velocities contaminated by birds, using the features of reflectivity and Doppler velocity fields [8,9,10]. Correctly separating insect echoes from bird echoes can improve the quality of radar wind products.
Many advances have been made in characterizing hydrometeor types [11,12,13]. However, the classification of biological echoes is still an active research field [7,14,15,16]. A major obstacle to classifying biological echoes is that the shapes of birds and insects are strongly non-spherical [4]. Moreover, polarimetric measurements have a strong dependence on their size, shape, and orientation [4,17]. Thus, even in single-species ensembles, polarimetric quantities could have high variance depending on the azimuthal orientation [18]. This sometimes leads to similar measurements for bird and insect echoes, making it difficult to differentiate them. For example, the $Z_{DR}$ of Purple Martin colonies has been found to range from −4 to 6 dB [19], while insects have been found to have $Z_{DR}$ between 2 and 9 dB [20].
Various methods have been explored to detect biological echoes with radars [14,15,16,21,22,23,24,25], though much less work has been done on distinguishing bird and insect radar echoes. Nonpolarimetric radar was used in [26] to discriminate these echoes by measuring their radar cross sections at close range from the radar. However, only two cases were examined with this approach. A fuzzy logic algorithm was also developed for separating bird and insect echoes in [7]. However, the use of $Z$ as an input complicates the resolution of densely aggregated insects and sparse groups of large birds.
Machine learning models have been trained for detecting roosting birds, focused on identifying their distinct toroidal shape. Convolutional neural networks were used in [
27] to detect whether an individual radar image contained at least one Purple Martin or Tree Swallow roost, with correct predictions made about
of the time. Another machine learning system was developed in [
28] that locates roosts within images and tracks them across frames. Although these methods are useful, they are designed to detect one orientation of birds while using the entire radar image as an input. They cannot be applied to a single range gate and cannot be used in situations where birds are not roosting.
We propose a machine learning model that can classify diverse orientations of bird and insect echoes by operating on individual radar range gates. Two supervised machine learning methods are investigated: ridge classifier and decision tree. Dual polarization radar scans containing separate large-scale bird and insect migration were collected (Section 2). Next, the migrating bird (insect) echoes are segmented using blob coloring and their textures are computed (Section 3). Velocity azimuth display (VAD) is applied to change the measurement coordinates from being relative to the radar to being relative to the target, and multiple bird (insect) dominated scans are averaged to reduce contamination by other echoes (Section 3). The averaged scans are used to derive training inputs for the classifiers. The next sections summarize the machine learning methods used (Section 4) and the metrics for evaluation (Section 5). Both machine learning methods are trained, first on only dual polarization variables and then on different combinations of the remaining features (Section 6). Their performance is evaluated using metrics computed on test data (Section 7). Further case studies (Section 8) are conducted to analyze performance on new scenarios from different WSR-88D radars. Finally, our conclusions and recommendations are presented in Section 9.
3. Feature Processing to Prepare Inputs
In this section, further processing is performed on the collected bird (insect) scans to prepare inputs for training and evaluating the classifiers. All scans in the data set are for highly aligned migration cases. First, the texture of each dual-pol variable is computed for each scan. Next, blob coloring and minor region removal are used to extract only range gates containing migrating birds (or insects), followed by VAD analysis to find their heading. Ideally, a bird migration scan would consist purely of bird echoes. However, such scans usually also contain a few insect echoes, and the same is true of insect-dominated scans. We propose a way of coherently averaging multiple scans along the target’s aspect to reduce this contamination.
3.1. Texture
Many image operations are performed on a local section defined by a window. Such windows are usually described with respect to a reference pixel, where the result of any computation is output. In our case the reference pixel is the middle one. Texture is the result of one such operation; it characterizes the spatial variation of radar variables in the two-dimensional field, i.e., along the azimuth and range directions [13,29]. We calculated texture using an 8-connected window, which is a 3 × 3 grid of pixels with the reference at the center. Mathematically, at a given radar gate with range $r$ and azimuth angle $\phi$, the mean absolute deviation of a variable $X$ from its neighbor gates is calculated as

$$T_X(r,\phi) = \frac{1}{N^2}\sum_{i=-(N-1)/2}^{(N-1)/2}\;\sum_{j=-(N-1)/2}^{(N-1)/2}\left|X(r,\phi) - X(r+i,\phi+j)\right|$$

where $i$ is the range gate offset, $j$ is the azimuth offset, and $N$ is the window size. Calculations were performed only when all the surrounding gates contained echoes.
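The texture computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `texture`, `field`, and `n` are hypothetical, gates without echoes are assumed to be marked with `None`, the sum of absolute deviations is assumed to be normalized by the squared window size, and the azimuth axis is assumed to wrap around at 360°.

```python
def texture(field, n=3):
    """Mean absolute deviation of each gate from its n x n neighborhood.

    field: list of rows (range gates), each a list over azimuth;
    None marks a gate with no echo (assumption of this sketch).
    """
    half = n // 2
    n_rng, n_az = len(field), len(field[0])
    out = [[None] * n_az for _ in range(n_rng)]
    for r in range(half, n_rng - half):          # range does not wrap
        for a in range(n_az):
            center = field[r][a]
            if center is None:
                continue
            # Azimuth index wraps around the full circle (assumption).
            window = [field[r + i][(a + j) % n_az]
                      for i in range(-half, half + 1)
                      for j in range(-half, half + 1)]
            if any(v is None for v in window):
                continue  # compute only when every surrounding gate has an echo
            out[r][a] = sum(abs(center - v) for v in window) / (n * n)
    return out
```

Gates on the first and last range rows are left as `None` because their windows are incomplete.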
3.2. Blob Coloring and Minor Region Removal to Extract Migration Echoes
Blob coloring is an image processing method used to identify connected groups of pixels with the same value [30,31]. It is applied here to detect regions composed of migrating echoes. We first define some relevant terms before describing the algorithm. All definitions are with respect to a binary image where a pixel either contains a target (pixel value 1) or background (pixel value 0). A region (or blob) is a group of contiguous pixels with the same value. Two types of windows were applied in this study. The first is the previously described 8-connected window. The second is the 4-connected window, which consists of a reference pixel and the neighbors directly above, below, left, and right. Another operation performed is dilation, which involves iterating a window over an image and setting the reference pixel at each step to the OR of all pixels within the window. The result is an expanded region of target pixels.
Data for bird and insect migration were collected on clear air days, which are characterized by a large area of biological echoes around the radar. An example of $Z$ for bird migration in clear air is shown in Figure 1a within a maximum range of approximately 150 km. $Z$ is chosen only for demonstration; it is not used at any other point of this study. The data matrix for this scan can be considered as an image $I$ where rows correspond to ranges and columns correspond to azimuth angles. The blob coloring with minor region removal algorithm is implemented as follows. First, the radar image $I$ is binarized by setting all gates containing echoes to 1 while the remaining gates are set to 0. The second step involves dilating the binarized image $I_b$ twice to connect regions with nearby isolated echoes. The dilated image $I_d$ is given as

$$I_d = (I_b \oplus B) \oplus B$$

where $\oplus$ is the dilation operator and $B$ is the 8-connected window. In the third step, a region labelling algorithm [30] is applied to identify the different target regions in $I_d$. Next, minor region removal [30] is applied, where the largest target region is retained and the remaining target regions are set to background pixels. Often, this major region contains a few isolated holes of background pixels. These holes are plugged by complementing the image, repeating the blob coloring with minor region removal algorithm, and re-complementing the image [30]. The resulting mask $M$ is a binary image with one solid target blob (shown in Figure 1b) indicating the region containing migration echoes. The final image $I_f$ (Figure 1c) is obtained by the element-wise multiplication of the mask $M$ and the original image $I$. This is expressed as

$$I_f = M \odot I$$

where $\odot$ represents element-wise multiplication. This image contains the migrating birds. The same procedure is repeated for insect cases. Figure 1d shows $Z$ for insects with a minor precipitation region west of the radar. The generated mask excludes this minor region (Figure 1e). The final extracted echoes contain insects (Figure 1f).
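The steps of blob coloring with minor region removal (binarize, dilate twice, keep the largest region, plug holes, mask the original scan) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, gates without echoes are marked with `None`, region labelling is done with a breadth-first search using 8-connectivity for both targets and background, and the largest background region is assumed to be the area outside the blob.

```python
from collections import deque

# Offsets of the eight neighbors in a 3 x 3 (8-connected) window.
N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def dilate(img):
    """One dilation: each pixel becomes the OR of its 3 x 3 window."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if any(0 <= r + dr < rows and 0 <= c + dc < cols
                   and img[r + dr][c + dc] for dr, dc in N8):
                out[r][c] = 1
    return out

def keep_largest(img):
    """Label connected regions of 1s and retain only the largest one."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                region, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in N8:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(region) > len(best):
                    best = region
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out

def migration_mask(field):
    """Binarize, dilate twice, keep the largest blob, then plug its holes."""
    binary = [[0 if v is None else 1 for v in row] for row in field]
    blob = keep_largest(dilate(dilate(binary)))
    complement = [[1 - v for v in row] for row in blob]
    background = keep_largest(complement)   # holes are the minor regions here
    return [[1 - v for v in row] for row in background]

def extract(field):
    """Keep only the gates of the original scan that fall inside the mask."""
    mask = migration_mask(field)
    return [[v if m else None for v, m in zip(frow, mrow)]
            for frow, mrow in zip(field, mask)]
```

Because the mask comes from a dilated image, it is slightly larger than the echo region; gates inside the mask that had no echo remain `None` after extraction.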
3.3. Reference with Respect to the Target’s Azimuth
Due to the non-spherical shape of biological targets, their radar returns depend on the angle from which they are observed, hereafter referred to as their aspect. As such, methods for identifying biological echoes have to account for this dependence. Cases of widespread alignment can leverage traditional VAD [2,32,33] or azimuthal patterns in the correlation coefficient [18] to recover aspect information. We used VAD to rotate the variables so that they become a function of their aspect azimuth ($\phi_a$). First, a sinusoid model is fit to $v_r$ at every range,

$$v_f(\phi) = A \sin(2\pi f \phi + \theta)$$

where $v_f$ is the fitted radial velocity, $\phi$ is the radar’s azimuth, $A$ is the magnitude of velocity along the migration direction, $f$ is frequency, and $\theta$ is a phase offset. It is assumed that the wind field is uniform at every height so $f = 1/360$ cycles/degree. The migration direction is defined as the direction toward which the targets are heading. It is obtained as the radar azimuth that maximizes $v_f$,

$$\phi_m = \arg\max_{\phi} v_f(\phi)$$

This direction captures measurements from the tail aspect. Scattering from other azimuthal aspects can be deduced by the lag from $\phi_m$ as

$$\phi_a = \phi - \phi_m$$

such that a $\phi_a$ of $0°$ represents the tail region of biota, $90°$ represents the left-wing region, $180°$ represents the head region, and so on.
An example for this procedure is shown in
Figure 2.
Figure 2a shows the VAD at range 70 km for one of the scans in the training set. The blue line is the filtered velocity obtained by applying a 10th order one dimensional median filter on
. The green line is the fitted
. Migration was found to be toward
. The radial velocity with respect to the target
, shown in
Figure 2b, is obtained by shifting
to the left by
. Migration would be toward
of
. This process is applied at every range ring to find the migration direction and rotate all dual polarization measurements and their textures. All measurements are now relative to the aspects of the targets.
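The per-ring VAD fit and rotation can be illustrated with a short sketch. With one cycle per 360°, a sinusoid with amplitude and phase offset can be rewritten as a·sin(azimuth) + b·cos(azimuth), which can be fit by ordinary least squares. The function names below are hypothetical, the median filtering step is omitted, and azimuth is assumed to be measured clockwise in degrees with positive velocities directed away from the radar.

```python
import math

def fit_vad(azimuths_deg, velocities):
    """Least-squares fit of v = a*sin(az) + b*cos(az) for one range ring.

    Returns (amplitude, direction_deg), where direction_deg is the azimuth
    at which the fitted velocity is largest.
    """
    s = [math.sin(math.radians(az)) for az in azimuths_deg]
    c = [math.cos(math.radians(az)) for az in azimuths_deg]
    ss = sum(x * x for x in s)
    cc = sum(x * x for x in c)
    sc = sum(x * y for x, y in zip(s, c))
    sv = sum(x * v for x, v in zip(s, velocities))
    cv = sum(x * v for x, v in zip(c, velocities))
    # Solve the 2 x 2 normal equations for a and b.
    det = ss * cc - sc * sc
    a = (sv * cc - cv * sc) / det
    b = (cv * ss - sv * sc) / det
    amplitude = math.hypot(a, b)
    # a*sin(az) + b*cos(az) = amplitude * cos(az - direction)
    direction = math.degrees(math.atan2(a, b)) % 360.0
    return amplitude, direction

def aspect_azimuth(az_deg, direction_deg):
    """Lag of a gate's azimuth from the migration direction (0 = tail-on view)."""
    return (az_deg - direction_deg) % 360.0
```

For a ring whose fitted migration direction is 40°, a gate at radar azimuth 130° would be assigned an aspect azimuth of 90°.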
3.4. Averaging Bird and Insect Cases
To reduce the contamination of our bird migration cases by insect echoes and vice versa, multiple scans are averaged. Following blob coloring and rotation of the collected scans and their textures, they are grouped into three batches. Let us call them batches A, B, and C. Each batch contains 15 randomly selected scans per class (a total of 30 scans per batch). The following discussion will be focused on A though all steps equally apply to B. Each scan will have different azimuths, so we created a new range and aspect azimuth grid both starting at 0 and with resolutions of m and respectively. All scans were interpolated to this common grid. The new 15 scans for birds (insects) are then averaged. In the last step, all range gates in the resulting averaged scans from A and B are combined to form the training set, containing 1,711,624 samples: bird and insect cases. Batch C is used as the test set. It is not averaged so that it represents the kind of measurements we expect from the WSR-88Ds. The test set contained 9,402,821 range gates with bird and insect cases.
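The averaging step can be illustrated for a single range ring. The sketch below interpolates each scan's samples onto a common aspect-azimuth grid and then averages the scans gate by gate. It is only a sketch: the function names and the grid step `step_deg` are hypothetical placeholders rather than the values used in the study, interpolation is linear and circular in aspect only, and interpolation along range is omitted.

```python
def to_common_grid(aspects_deg, values, step_deg=1.0):
    """Circular linear interpolation of one range ring onto a regular aspect grid."""
    pts = sorted(zip((a % 360.0 for a in aspects_deg), values))
    # Pad with wrapped copies of the last and first samples so both ends interpolate.
    pts = [(pts[-1][0] - 360.0, pts[-1][1])] + pts + [(pts[0][0] + 360.0, pts[0][1])]
    out, k = [], 0
    for g in range(int(round(360.0 / step_deg))):
        x = g * step_deg
        while pts[k + 1][0] < x:        # advance to the segment containing x
            k += 1
        (x0, y0), (x1, y1) = pts[k], pts[k + 1]
        w = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        out.append(y0 + w * (y1 - y0))
    return out

def average_rings(rings):
    """Gate-by-gate mean of rings that already share the same aspect grid."""
    return [sum(col) / len(col) for col in zip(*rings)]
```

Linear interpolation is a simplification; a phase-like variable that wraps around would need additional care.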
Some visualizations of the averaged training cases are shown in Figure 3. The blue curve is for birds and the red for insects. Each plot shows a dual polarization variable against the target aspect at a specific range. From top to bottom, rows correspond to measurements 15, 30, and 45 km from the radar. From left to right, columns correspond to $Z_{DR}$, $\Phi_{DP}$, and $\rho_{HV}$, respectively. The averaging procedure shows that dual-pol variables have a strong dependence on target aspect and exposes clear delineations between birds and insects. The results are also consistent with previous literature. Analysis in [19] found that echoes attributed to birds (Purple Martins) had $Z_{DR}$ between −4 and 6 dB. In our case, the averaged $Z_{DR}$ (shown in
Figure 4a) for birds is generally low, between −2 and 4 dB. The highest value is around
(between the head and right wing) and the lowest around
(between the tail and left wing). Insects were found to have high $Z_{DR}$ (up to 10 dB) in [20]. Our averaged insect $Z_{DR}$ is also generally higher, with most gates between 3 and 6 dB. Interestingly, the values dip below the bird $Z_{DR}$ values between aspect azimuths of
and
.
$\Phi_{DP}$ (shown in
Figure 3b) for birds is generally higher than insects, with peaks around
and
. This is consistent with the observed symmetry of $\Phi_{DP}$ about the direction of migration [20]. Insects have lower $\Phi_{DP}$ values. $\rho_{HV}$ (Figure 3c) for bird migration has been observed to have low values corresponding to tail-on viewing angles and high values for head-on angles [18,19]. This can be seen in the sinusoid-like pattern in
Figure 3c, with high values (around 0.7) between
and
and low values (around 0.4) otherwise. Insects generally have higher $\rho_{HV}$ than birds, with a mean value around 0.7.
After the averaging procedure, both the training and test data sets are normalized. The mean and standard deviation of each variable were computed from the 60 scans in batches A and B. They are used to normalize each variable by mean centering and scaling by its standard deviation. This ensures that all variables are on the same scale. The same procedure was applied to normalize their textures.
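The normalization can be sketched as a standard z-score in which the statistics come from the training batches only and are then reused for the test data. The function names and the dictionary layout are hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics

def fit_scaler(train_columns):
    """Per-variable (mean, standard deviation) computed from training data only."""
    return {name: (statistics.fmean(vals), statistics.pstdev(vals))
            for name, vals in train_columns.items()}

def normalize(columns, scaler):
    """Mean-center each variable and scale by its training standard deviation."""
    return {name: [(v - scaler[name][0]) / scaler[name][1] for v in vals]
            for name, vals in columns.items()}
```

Reusing the training statistics for the test set keeps both sets on the same scale without letting test data influence the scaling.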
3.5. Input Features for Classifiers
The normalized dual polarization variables and their textures are used as input features for the classifiers. Additionally, inspection of the data revealed that the variables varied gradually with range and aspect azimuth. Thus, two new discrete features were created to capture this variation. The first is range interval, which refers to 10 km bins. The second is sector, which refers to bins of the aspect azimuth. All echoes collected in this study were from 10 to 230 km, so the range interval feature contains 22 elements. The sector feature contains 18 elements, which corresponds to 20° sectors over the full 360°.
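The two discrete features can be illustrated as simple bin indices. In this sketch the function names are hypothetical, bins are assumed to be half-open (closed at the lower edge), and the 20° sector width is inferred from 18 sectors covering 360°.

```python
def range_interval(range_km, start_km=10.0, width_km=10.0):
    """Index of the 10 km bin containing a gate: 0 for 10-20 km, ..., 21 for 220-230 km."""
    return int((range_km - start_km) // width_km)

def sector(aspect_deg, width_deg=20.0):
    """Index of the aspect-azimuth sector: 0 to 17 for 20 degree sectors."""
    return int((aspect_deg % 360.0) // width_deg)
```

With these defaults, gates between 10 and 230 km map to 22 range intervals and aspect azimuths map to 18 sectors.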
4. Machine Learning Methods
Our goal was to train an algorithm for distinguishing bird from insect echoes that could be implemented operationally on NEXRAD. Traditionally, fuzzy logic has been used for classifying weather radar echoes. However, we opted for a supervised machine learning (ML) approach, mainly because such models predict a probability for each range gate in addition to an output class. They can also be easily updated as new data become available, since they learn a model that minimizes prediction errors on the training data.
More complex neural networks have been successfully applied to detect [
27] and track roosting birds [
28]; however, they were not designed to classify a single radar range gate, and instead use a rendered image of a full radar scan as input. They are also trained specifically to detect birds emerging from their roosting sites. As such, these networks cannot be generalized to other patterns of bird activity or types of biological echoes. In this study, we investigate a supervised ML approach for distinguishing birds and insects that can use inputs from a single range gate, is able to provide a probability that a range gate contains birds (or insects), and is easy to retrain as more data are collected. We explored using both the ridge classifier and decision tree.
The ridge classifier learns a linear combination of input variables that achieves the best separation between classes in the output. We used the SGDClassifier [34] in scikit-learn. For a single range gate, the function is given as

$$f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + b$$

where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input vector, and $b$ is a bias term. The goal is to find parameters that minimize the log loss error given by

$$L(y, f(\mathbf{x})) = \log\left(1 + e^{-y f(\mathbf{x})}\right)$$

where $y \in \{-1, +1\}$ is the label for each training example. A scaled L2-norm of the weights is also added to the above loss to stabilize learning by penalizing any explosion of the weights. The final loss function is given by

$$E(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(\mathbf{x}_i)) + \frac{\alpha}{2}\lVert\mathbf{w}\rVert_2^2$$

where $n$ is the number of training examples and $\alpha$ controls the effect of the weight penalty. The learning process uses stochastic gradient descent [35] on $\mathbf{w}$ and $b$, and a search on $\alpha$ to find values that minimize $E$.
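To make the learning procedure concrete, the following sketch minimizes a log loss with an L2 weight penalty using plain stochastic gradient descent in pure Python. It is not the scikit-learn implementation: the function names, the constant learning rate `lr`, the number of epochs, and the default `alpha` are hypothetical choices, and labels are assumed to be encoded as −1 (insects) and +1 (birds).

```python
import math
import random

def train_sgd_logistic(samples, labels, alpha=1e-4, lr=0.1, epochs=200, seed=0):
    """Minimize mean log loss + (alpha/2)*||w||^2 with plain SGD; labels are -1 or +1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0]), 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)                      # visit samples in random order
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Derivative of log(1 + exp(-y f)) with respect to f is -y / (1 + exp(y f)).
            g = -y / (1.0 + math.exp(min(margin, 700.0)))
            w = [wj - lr * (g * xj + alpha * wj) for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b

def predict_proba(w, b, x):
    """Probability of the positive class (birds) from the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))
```

In scikit-learn, a comparable setup would use `SGDClassifier` with `loss="log_loss"` (named `"log"` in older releases) and `penalty="l2"`; whether these exact settings match the study is an assumption.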
Our second technique, decision trees, learns rules to recursively partition data so that samples with the same labels are grouped together. We used the DecisionTreeClassifier [34] from scikit-learn. Within the context of decision trees, an attribute $A$ is a question asked about the data, e.g., whether an input variable exceeds some threshold. Answers to this question, like True or False, are called values $v$ and are used to partition the data set. There are also two classes, containing $p$ positive examples (birds) and $n$ negative examples (insects). The entropy of an attribute measures its homogeneity. It is defined as

$$H(q_1, \ldots, q_K) = -\sum_{k=1}^{K} q_k \log_2 q_k$$

where $q_k$ is the proportion of the $k$th class. High entropy indicates a uniform distribution over classes while low entropy indicates the dominance of some classes. Information gain measures the reduction in entropy for a given split. It is defined as

$$IG(A) = H\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \sum_{i} \frac{p_i + n_i}{p + n}\, H\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$$

where $p_i$ and $n_i$ are the number of positive and negative examples, respectively, in the $i$th split. In other words, $IG$ is the difference between the entropy before a split and the mean entropy after the split. Decision trees learn by finding attributes that maximize $IG$.
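Entropy and information gain can be computed directly from class counts. In this sketch the function names are hypothetical, counts are given as (birds, insects) tuples, and entropy is measured in bits (base-2 logarithm), which is an assumption.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a tuple of class counts, e.g. (birds, insects)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, splits):
    """Entropy of the parent node minus the size-weighted entropy of its splits."""
    total = sum(parent)
    remainder = sum(sum(s) / total * entropy(s) for s in splits)
    return entropy(parent) - remainder
```

For example, a node with 5 birds and 5 insects has an entropy of 1 bit; a split that separates the two classes perfectly gains the full bit, while a split that leaves both children evenly mixed gains nothing.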
5. Metrics
We used four metrics to assess our classifiers. They are accuracy (ACC), true positive rate (TPR), true negative rate (TNR), and area under curve (AUC).
Table 1 below shows the confusion matrix for our classification problem. Birds are used as the positive class, so true positives (TP) are birds that are correctly classified as birds, false positives (FP) are insects classified as birds, false negatives (FN) are birds classified as insects and true negatives (TN) are correctly classified insects. Each instance corresponds to a range gate.
Accuracy is the proportion of the whole data set that is correctly predicted. TPR is the proportion of correct predictions only on bird cases. Similarly, TNR is the proportion of correct predictions for the insect cases. They are calculated as shown in the following equations:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

$$\mathrm{TNR} = \frac{TN}{TN + FP}$$

Binary classifiers usually predict a probability (or score) for the positive class and then a threshold is applied to obtain the final class. The receiver operating characteristic (ROC) curve plots $\mathrm{TPR}$ against the false positive rate, $\mathrm{FPR}$ ($\mathrm{FPR}$ is $1 - \mathrm{TNR}$), for varying probability thresholds [36]. The goal of the ROC curve is to find an intermediate threshold that maximizes TPR and minimizes FPR. The area under curve (AUC) metric summarizes the area under the ROC curve [36]. Good classifiers should have an AUC close to $1$.
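The four metrics can be computed from labels and predicted scores as sketched below. The function names are hypothetical, birds are encoded as label 1 and insects as label 0, and the AUC is computed through its rank interpretation (the probability that a randomly chosen bird gate scores higher than a randomly chosen insect gate, with ties counted as one half), which equals the area under the ROC curve.

```python
def rates(labels, scores, threshold=0.5):
    """Return (ACC, TPR, TNR) with birds (label 1) as the positive class."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return acc, tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of gates, so it is suitable for illustration rather than for millions of range gates.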
6. Model Training and Validation
Classifiers are sensitive to the class distribution of the training set. Thus, we applied class weights [34] to balance the effect of each sample on the loss function. For each machine learning method, eight models are trained using different combinations of inputs. First, a base model is trained on only dual polarization variables, and then different combinations of the remaining features are added to investigate their effect on performance. It should also be noted that not all input features can always be obtained from the radar scan. For example, sector is calculated using a sinusoid fit to the velocity of migration echoes. These echoes are mostly composed of a single species moving in a particular direction. In cases containing diverse species without a common heading, the sinusoid fit will not be possible, and sector will be unrecoverable. Velocity aliasing could also prevent the recovery of sector. In these cases, however, the base model can always be used.
K-fold cross validation [37] was used to tune the model hyperparameters. In this method, the data set is divided into K folds. In general, model training is performed on K−2 folds, validation on one fold, and testing on the last fold. Since we already held out a test set (batch C), training was performed on K−1 folds and validation on one fold. The whole process is repeated K times so that each fold is used for validation once. A total of five folds were used. After cross validation, the hyperparameters that have the best performance are chosen for each model. Final training is performed using the selected hyperparameters and the full training set. An ROC curve is then generated and a critical threshold is found such that it maximizes TPR and TNR. This threshold is used to convert predictions into classes. The training process is stochastic, so each run produces slightly different results. To have a robust assessment of performance, 30 independent training runs are repeated for each model. All the trained models are then evaluated on the test data. Confidence intervals for each metric are calculated using the bootstrapping percentile method, where each metric is computed from an iteratively chosen random sample of the test data [27,38]. We computed each metric for 100 iterations based on 1000 randomly chosen samples. The 100 metrics for 30 repeated runs form a distribution with 3000 estimates. The confidence interval is found as the
and the
points of the distribution [
27,
38].
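The bootstrapping percentile method can be sketched for a single trained model as follows. The function name and arguments are hypothetical, sampling is done with replacement, and the default percentile bounds of 2.5 and 97.5 (a 95% interval) are an assumption of this sketch. Pooling the estimates from repeated training runs before taking percentiles is not shown.

```python
import random

def bootstrap_interval(metric, labels, scores, n_iter=100, n_samples=1000,
                       lower=2.5, upper=97.5, seed=0):
    """Percentile bootstrap interval of metric(labels, scores) on resampled test data."""
    rng = random.Random(seed)
    indices = range(len(labels))
    estimates = []
    for _ in range(n_iter):
        pick = rng.choices(indices, k=n_samples)   # sample gates with replacement
        estimates.append(metric([labels[i] for i in pick],
                                [scores[i] for i in pick]))
    estimates.sort()

    def percentile(q):
        # Nearest-rank style lookup into the sorted estimates.
        return estimates[min(len(estimates) - 1, int(q / 100.0 * len(estimates)))]

    return percentile(lower), percentile(upper)
```

Any function that maps labels and scores to a number, such as an accuracy at a fixed threshold, can be passed as `metric`.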
7. Performance
The
confidence interval for the model metrics are shown in
Table 2. All the ridge classifiers are predictive with
,
,
, and
. Sector was expected to greatly improve results; however, its addition to the ridge classifiers causes only marginal changes in performance. It slightly improves TNR, slightly reduces TPR, and does not seem to have a noticeable effect on AUC. This could be because the training data were already averaged along the aspect angles, creating a clearer delineation between both classes, so that classification can be performed effectively without sector. (Recall that sectors are 20° bins.) The addition of range interval also changes performance only marginally, improving ACC, TPR, and AUC and reducing TNR. The addition of texture generally improves the model metrics.
The decision tree models are also predictive for TPR, TNR, and AUC but perform significantly worse on TPR, with some models having values around 0.5. This seems to coincide with models using range interval and/or sector. A possible cause could be that the binary decision making of a tree tends to classify whole range intervals (or sectors) as one class, in contrast to ridge classifiers, which only learn a probability adjustment. Using smaller range intervals and sectors might mitigate this problem. As with the ridge classifiers, incorporation of texture generally improves the model metrics. However, this might not generalize to non-migration cases. Recall that labels were provided based on the dominant migrating taxa. Textures have the effect of averaging measurements over a 3 × 3 neighborhood, so they emphasize the dominant class, leading to better metrics for migration cases. However, for non-migration cases with a heterogeneous mix of scatterers, texture could lead to misclassifications.
Both methods were compared using an independent two-sample t-test with a significance level of 0.05. The null hypothesis was that both metric distributions have the same mean. The ridge classifiers proved to be the better performing method, with higher means on at least three metrics. Based on these results, the ridge classifier was selected as the better method. All further discussion focuses on this classifier. The best ridge classifier uses dual polarization variables, texture, sector, and range interval as inputs with . It is possible, though, that the improvement caused by the added features reflects the classifiers over-fitting to migration cases. Thus, additional studies on a diverse variety of cases (presented in the next section) are required to understand the effect of these features on performance.
9. Conclusions
NEXRAD’s detection of birds and insects offers much promise for a variety of applications. In this work, we developed a classifier for distinguishing bird and insect radar echoes based on dual polarization variables. Unique challenges were faced during data collection due to complex scattering off their non-spherical bodies. This was addressed by leveraging cases of large-scale single-species migration with a common heading to change measurement coordinates from being relative to the radar to being relative to the body aspect of biota. The mean flight direction, which corresponds to scattering off the tail, was found by VAD analysis, and measurements from other aspects were then deduced by the lag from this direction. Another issue is the difficulty in labelling training data sets because of the frequent collocation of birds, insects, and non-biological echoes in the radar sampling volume. We addressed this by averaging 15 alignment-calibrated bird (insect) migration scans to reduce the effect of the less dominant class.
The data preparation pipeline is summarized in the following steps. First, 45 scans containing mass migration in clear air were collected for each class. Blob coloring with minor region removal was applied to segment regions of migration echoes, and their textures were computed. Extracted migration echoes were then rotated to become relative to the target’s aspect. The rotated scans were grouped into three batches, each containing 15 scans per class. All 15 scans in each of two batches were averaged to reduce contamination. This was done for both classes. Gates from the four resulting averaged scans were used as training samples. The last, unaveraged batch was used as the test set. All samples in both sets were grouped into 10 km range intervals and sectors of the target-relative azimuth. The final candidate feature set was made up of the dual-pol variables, their textures, and the range interval and sector bins.
Two machine learning methods were explored: ridge classifier and decision tree. Eight models were trained for each method, starting with a base model using only dual polarization variables and then adding other input features. Four metrics were used for evaluating the classifiers on the test data set: accuracy (ACC), true positive rate (TPR), true negative rate (TNR), and area under curve (AUC). A comparison of the metrics from both methods showed that the ridge classifiers performed better than decision trees in at least three out of four metrics. Based on this, the ridge classifiers were selected for classifying bird and insect radar echoes. All the ridge classifiers are predictive with , , and . The addition of other features improves these metrics by up to ; however, later evidence suggests that this is probably due to over-fitting to cases of large-scale migration.
Further qualitative case studies were conducted to assess the effect of using different inputs to the classifier on a bird and insect migration scan from the test set. The ridge classifiers detected birds in
of range gates for the bird scan and insects in
of gates for the insect scan, consistent with our hypothesis of the source of these echoes. The addition of the remaining features to the base model has little effect on performance, with an increase of at most
in the proportion of birds detected and
for insect detections. This suggests that the additional input features might be superfluous. The classifiers were also evaluated on diverse cases of biological activity across NEXRAD. The training data were collected from KTLX, so the ability to detect biological patterns from other WSR-88Ds would provide strong evidence that the classifier can be applied across the network. Furthermore, the training data did not contain bird roosts. Thus, the ability to detect roosting birds would be evidence in favor of the generality of the classifier. The next case explored bird roosts observed by KHTX (located in Huntsville, AL, USA). Previous studies [29] provided ground truth for bird roosts, insects, and weather echoes for this scan. Sector could not be recovered here because of the heterogeneous mix of scatterers. The detections of the base classifier and the one with range interval match the provided ground truth. The addition of texture seems to degrade performance on the roosts, probably because it is less suited for capturing finer features. The classifier was also evaluated on a similar case from KLTX containing four bird roosts identified by observing the expanding ring over time. Again, the base classifier and the one with range interval detect all the roosts as birds, while the addition of texture degraded performance. Overall, the tests conducted show no evidence of improvement from adding features to the base classifier. For the sake of simplicity, the base ridge classifier is selected as the final model for our classification task.
Sometimes biological activity is a cue to underlying seismic events. In the next case, the base ridge classifier is tested on a splash of birds (observed by KTLX) fleeing their nests in response to an earthquake in Oklahoma. The classifier detects 86.8% of echoes to be from birds. This demonstrates the potential of using this classifier to study natural events of common interest to humans and aerial animals. For the fourth case study, the classifier was tested on a bird roost from KMOB where ground truth labels are known from previous research [
27,
39]. The base classifier detects the roost as predominantly birds. The final case study demonstrates the use of the classifier for large-scale surveillance on NEXRAD. Here, swarms of insects were observed across the southern United States just before local sunset using six NEXRAD radars. The insects were identified in previous literature by their characteristic dumb-bell pattern in Z and ZDR, and low mean airspeeds in the lowest 1 km of airspace. The classifier detects these swarms as insects.
In our test cases, the base ridge classifier has been demonstrated to correctly classify different orientations of biological echoes across NEXRAD. As such, we recommend that this classifier be implemented on the network as a sub-classifier on the biological class of the hydrometeor classification algorithm (HCA). The biggest challenge in developing biological classifiers is obtaining the ground truth. In future research, we hope to conduct more experiments to validate the source of these echoes. We also hope this research encourages other data collection and verification efforts.