An Asynchronous Real-Time Corner Extraction and Tracking Algorithm for Event Camera

Event cameras have many advantages over conventional frame-based cameras, such as high temporal resolution, low latency and high dynamic range. However, state-of-the-art event-based algorithms either require too much computation time or have poor accuracy. In this paper, we propose an asynchronous real-time corner extraction and tracking algorithm for an event camera. Our primary motivation is to enhance the accuracy of corner detection and tracking while ensuring computational efficiency. Firstly, according to the polarities of the events, a simple yet effective filter is applied to construct two restrictive Surfaces of Active Events (SAEs), named RSAE+ and RSAE−, which accurately represent high-contrast patterns while filtering out noise and redundant events. Afterwards, a new coarse-to-fine corner extractor is proposed to extract corner events efficiently and accurately. Finally, a space, time and velocity-direction constrained data association method is presented to realize corner event tracking: a newly arriving corner event is associated with the latest active corner that satisfies the velocity direction constraint in its neighborhood. The experiments are run on a standard event camera dataset, and the results indicate that our method achieves excellent corner detection and tracking performance. Moreover, the proposed method can process more than 4.5 million events per second, showing promising potential in real-time computer vision applications.


Introduction
In recent years, owing to the development of computer vision technology and the enhancement of computer information processing capability, conventional frame-based cameras have shown important application value in fields such as unmanned robotic systems, intelligent security and virtual reality. While widely adopted, frame-based cameras are not always optimal. For instance, when confronted with high-velocity motion (which easily causes blur) or high dynamic range scenes, frame-based cameras can hardly perform robustly. Furthermore, during a period without motion, the images captured by frame-based cameras contain the same redundant information, which leads to a huge waste of computing resources.
As a new type of vision sensor, event cameras [1,2] are inspired by biology and operate in a very different way from frame-based cameras. Instead of capturing images at a fixed rate, event cameras respond to pixel brightness changes, called "events", as they occur and output the time, locations and signs of the events asynchronously. Compared with frame-based cameras, event cameras show outstanding properties: very high temporal resolution (not suffering from motion blur) and low latency (both in the order of microseconds), very high dynamic range (140 dB vs. 60 dB of frame-based cameras) and low power consumption. In addition, event cameras only consider the information from "events", thereby eliminating information redundancy and reducing computation. Event cameras thus show great potential to overcome the challenges of high speed and high dynamic range faced by frame-based cameras.
The main contributions of this paper are as follows:
• According to the polarities of the events, a simple yet effective filter is applied to construct two restrictive Surfaces of Active Events, RSAE+ and RSAE−, which accurately represent high-contrast patterns while filtering out noise and redundant events.
• Corner events are extracted by a new coarse-to-fine corner extractor. The coarse extractor with high calculation efficiency is used to extract corner candidates, and the fine extractor based on a box filter only processes corner candidates. This significantly decreases the amount of calculation and improves the efficiency without reducing the accuracy.
• A space, time and velocity-direction constrained data association method is presented to realize corner event tracking: we associate a newly arriving corner event with the latest active corner that satisfies the velocity direction constraint in its neighborhood.
The remainder of this paper is organized as follows. The proposed method is detailed in Section 2. In Section 3, we evaluate our method on a publicly available event camera dataset and present the experimental results. Finally, we draw conclusions and shed light on future work in Section 4.

Materials and Methods
In this paper, we propose an asynchronous corner extraction and tracking algorithm for an event camera. The main work of this paper is as follows. Firstly, a simple yet effective filter is applied to filter out noise and redundant events. Afterwards, we extract corner events with a new coarse-to-fine corner extractor. Finally, a space, time and velocity-direction constrained data association method is presented to realize corner event tracking. The proposed method is detailed in the following subsections.

Filter of SAE
Event cameras operate very differently from frame-based cameras; they only respond to pixel brightness changes, called "events", and output the time, locations and signs of the events asynchronously. As an implementation of the send-on-delta transmission scheme [29], an event e = (x, y, t, pol) is generated at the pixel location (x, y) at time t if the absolute difference of the logarithmic brightness I(x, y, t) reaches a threshold K. Formally,

$$I(x, y, t) - I(x, y, t - \Delta t) = pol \cdot K$$

where t − ∆t denotes the time when the last event was generated at the pixel location (x, y), and pol denotes the polarity of the event (the sign of the logarithmic brightness change, either 1 or −1). Normally, when an event is generated, it is appended asynchronously to the event stream. Since there is no concept of image frames for event cameras, the notion of the SAE was proposed in [30]; it stores the timestamp of the most recent event at each pixel. When an event e = (x, y, t, pol) arrives, the value of the SAE at (x, y) is updated as SAE(x, y) ← t. Because of noise and hardware limitations, a sudden brightness change generates several events at the same pixel almost instantly, so the latest timestamps stored in the SAE are not the exact times when the stimulus signals were generated. This reduces the accuracy of corner event extraction and tracking. To solve this problem, a simple yet effective filter [26] is applied to construct a more restrictive SAE, named the RSAE. In the RSAE, events generated within a small time window k (typically 50 ms) at the same pixel are ignored and not used for updates. More precisely, when an event e = (x, y, t, pol) arrives, the value of the RSAE at (x, y) is updated as RSAE(x, y) ← t only if t − SAE(x, y) > k or if the polarity of the latest event at (x, y) differs from that of e. Owing to this special operating principle, different camera motions in the same scene might generate different event streams. As shown in Figure 2, the brightness of the edge pixels changes as the camera moves, and different moving directions might generate different events, reflected especially in the polarities of the events. For this reason, according to the polarities of the events, we construct two more precise RSAEs, named RSAE+ (see Figure 3a) and RSAE− (see Figure 3b), to replace the original RSAE. These operations have two main benefits. Firstly, high-contrast patterns (that is, edges) are accurately represented in both time and space when the brightness changes. Secondly, noise and redundant events are effectively filtered out, which saves considerable calculation time.
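To make the update rule concrete, the following minimal C++ sketch implements the per-event filtering described above. The sensor resolution matches the DAVIS240C used in our experiments; the type and member names are illustrative rather than taken from the authors' implementation.

```cpp
#include <array>

// Minimal sketch of the polarity-aware filter, assuming a DAVIS240C-sized
// sensor (240 x 180). Names are illustrative.
constexpr int WIDTH = 240, HEIGHT = 180;
constexpr double K_WINDOW = 0.05; // time window k (50 ms), as in the text

struct Event { int x, y; double t; int pol; }; // pol is +1 or -1

struct Surface {
    std::array<double, WIDTH * HEIGHT> t{};
    double& at(int x, int y) { return t[y * WIDTH + x]; }
};

struct PolarityFilter {
    Surface sae;                 // unfiltered SAE timestamps
    Surface rsaePos, rsaeNeg;    // restrictive surfaces RSAE+ and RSAE-
    std::array<int, WIDTH * HEIGHT> lastPol{};

    // Returns true if the event survives the filter; only then is the
    // corresponding restrictive surface updated with RSAE*(x, y) <- t.
    bool process(const Event& e) {
        const int idx = e.y * WIDTH + e.x;
        const bool pass = (e.t - sae.t[idx] > K_WINDOW) || (e.pol != lastPol[idx]);
        sae.t[idx] = e.t;   // the raw SAE always records the newest timestamp
        lastPol[idx] = e.pol;
        if (pass) (e.pol > 0 ? rsaePos : rsaeNeg).at(e.x, e.y) = e.t;
        return pass;
    }
};
```

Only events that pass the filter update RSAE+ or RSAE−, so all subsequent extraction and tracking steps operate on the filtered surfaces.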

Corner Event Extraction
Among the existing corner event extractors, eHarris [23] provides satisfactory results but has poor computational efficiency due to the use of convolutions. Conversely, eFAST [24] has high computational efficiency, but the performance is not as effective as eHarris. FA-Harris [28] adopts a coarse-to-fine extraction strategy; in this algorithm, corner candidates are first selected by an improved eFAST detector and then refined by an improved eHarris detector. Although the accuracy has been effectively improved, this algorithm consumes a large amount of computation time and hardly runs in real-time due to the tedious eHarris-based method. For this reason, in this paper, a new coarse-to-fine corner event extraction method is proposed. We first adopt Arc* [26] to extract corner candidates from event streams and then develop a box filter-based extractor to refine the corner candidates. Compared with the FA-Harris detector, our method significantly improves the efficiency without reducing the accuracy.

Coarse Corner Extraction
When an event e = (x, y, t, pol) arrives, we first update RSAE* (RSAE+ or RSAE−) according to the polarity of e, using the method in Section 2.1. Then Arc* [26] is adapted to extract corner candidates on the corresponding RSAE*. As shown in Figure 4, a moving corner pattern generates a local RSAE* with two markedly distinct regions, and two different moving directions create entirely different local RSAEs for the same corner pattern. Therefore, corner candidates are extracted by searching for a continuous region of the local RSAE* with higher values than all other elements. More specifically, a 9 × 9 pixel-sized patch around e is selected on the RSAE*. For convenience, we only consider the pixels on two centered concentric circles with radii 3 and 4. For each circle, we search for a continuous arc with higher timestamps than all other pixels on the circle. On the inner circle (blue), the arc length l_inner should be within the interval [3, 6], and on the outer circle (yellow), the arc length l_outer should be within the interval [4, 8] (see Figure 5a). Alternatively, the arc lengths on the inner and outer circles should be within the intervals [10, 13] and [12, 16], respectively (see Figure 5b). In either case, if such an arc can be found on both circles, the event is considered a corner candidate.
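As an illustration of the arc search, the following C++ sketch checks one circle of timestamps for a contiguous arc of newest events whose length falls in the admitted intervals. It is a simplified reading of the Arc* test, not the exact algorithm of [26], and the function name is ours.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns true if, for some arc length L in [lo1, hi1] or [lo2, hi2], the L
// newest pixels on the circle form one contiguous circular arc, i.e., their
// timestamps are all higher than those of the remaining pixels.
bool hasCornerArc(const std::vector<double>& ts,
                  int lo1, int hi1, int lo2, int hi2) {
    const int n = static_cast<int>(ts.size());
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Order circle positions from newest to oldest timestamp.
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return ts[a] > ts[b]; });
    auto contiguous = [&](int L) {
        std::vector<char> in(n, 0);
        for (int k = 0; k < L; ++k) in[order[k]] = 1;
        // Count 0 -> 1 transitions around the circle; exactly one run of 1s
        // means the L newest pixels form a single contiguous arc.
        int runs = 0;
        for (int i = 0; i < n; ++i)
            if (in[i] && !in[(i + n - 1) % n]) ++runs;
        return runs == 1;
    };
    for (int L = lo1; L <= hi1; ++L) if (contiguous(L)) return true;
    for (int L = lo2; L <= hi2; ++L) if (contiguous(L)) return true;
    return false;
}
```

Calling this with the intervals [3, 6] and [10, 13] for the 16-pixel inner circle and with [4, 8] and [12, 16] for the 20-pixel outer circle, an event is kept as a candidate only if both calls succeed.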

Fine Corner Extraction
The method in Section 2.2.1 can efficiently extract corner candidates and remove invalid events. To further enhance the accuracy, a box filter-based method is developed to refine the corner candidates. For a corner candidate, the corresponding local RSAE* is used to construct a 9 × 9 pixel-sized local binary patch T. We search for the largest n values (the latest n events) in the local RSAE* and set the corresponding elements in patch T to 1; the remaining elements of T are set to 0. As shown in Figure 5, the total number of elements in the 9 × 9 pixel-sized local patch is 81, and the total number of elements on the inner circle (blue) is 16. Therefore, in our work we set

$$n = \mathrm{round}\left(\frac{81 \, l_{inner}}{16}\right)$$

where l_inner is the arc length on the inner circle, which we have detailed in Section 2.2.1, and round(·) denotes rounding to the nearest integer. Similar to eHarris [23], we construct a 2 × 2 Hessian matrix H as follows:

$$H = \begin{bmatrix} L_{xx} \otimes T & L_{xy} \otimes T \\ L_{xy} \otimes T & L_{yy} \otimes T \end{bmatrix}$$

where T(x, y) is the element value of the patch T at the pixel location (x, y); T_x(x, y) and T_y(x, y) are the gradients of T(x, y) in the x and y directions, respectively; w(x, y) is a Gaussian convolution kernel with a standard deviation of 1.2, from which the second-order differential templates are derived; L_xx(x, y), L_yy(x, y) and L_xy(x, y) are Gaussian second-order differential templates in the x-x, y-y and x-y directions, respectively; and ⊗ denotes the convolution operation. For each corner candidate, a score R can be calculated as

$$R = \det(H) = (L_{xx} \otimes T)(L_{yy} \otimes T) - (L_{xy} \otimes T)^2$$

and we then compare the score R with a threshold to determine whether this candidate is a corner event.
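The construction of the binary patch T can be sketched as follows. The formula for n encodes the 81-versus-16 proportionality argument above and should be read as our interpretation; names are illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <numeric>

// Builds the 9x9 binary patch T around a corner candidate: the n newest
// timestamps in the local RSAE* become 1, all other elements 0.
std::array<int, 81> binaryPatch(const std::array<double, 81>& localRsae,
                                int lInner) {
    const int n = static_cast<int>(std::lround(81.0 * lInner / 16.0));
    std::array<int, 81> idx{};
    std::iota(idx.begin(), idx.end(), 0);
    // Partially sort so that the indices of the n newest elements come first.
    std::nth_element(idx.begin(), idx.begin() + n, idx.end(),
                     [&](int a, int b) { return localRsae[a] > localRsae[b]; });
    std::array<int, 81> T{};
    for (int k = 0; k < n; ++k) T[idx[k]] = 1;
    return T;
}
```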
In practical applications, calculating the Hessian matrix H through the above formulas is a complicated and time-consuming process. To improve the efficiency of the algorithm, we introduce box filter templates [31] to approximate the Gaussian second-order differential templates. The box filter templates and the corresponding Gaussian second-order differential templates are shown in Figure 6. The elements in the Gaussian second-order differential templates all have different values; the box filter templates, however, are composed of several rectangular regions, each filled with the same value. Therefore, the elements of the Hessian matrix H can be approximated as

$$L_{xx} \otimes T \approx D_{xx} \otimes T, \quad L_{yy} \otimes T \approx D_{yy} \otimes T, \quad L_{xy} \otimes T \approx D_{xy} \otimes T$$

where D_xx(x, y), D_xy(x, y) and D_yy(x, y) are the box filter templates shown in Figure 6d–f, respectively. The box filter templates transform a convolution operation into simple additions of values between different rectangular regions. Compared with the FA-Harris detector, our method greatly reduces the computation amount and improves computational efficiency.

Figure 6. The 9 × 9 pixel-sized Gaussian second-order differential templates and the corresponding box filter templates. (a–c) are Gaussian second-order differential templates in the x-x, y-y and x-y directions, respectively, and (d–f) are the box filter templates corresponding to (a–c), respectively. In these templates, the elements are within the interval [−2, 1], and dark colors correspond to small values, whereas light colors correspond to large values. The elements in the Gaussian second-order differential templates have different values; for the sake of convenience, we have not labeled the exact values. The box filter templates are composed of several rectangular regions, each filled with the same value, and we have labeled the exact value for each rectangular region.
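The standard way to evaluate such box filters in constant time is an integral image, as the following C++ sketch shows. The rectangle positions and the +1/−2/+1 weights in the example response are illustrative stand-ins for the exact templates of Figure 6d–f, which we paraphrase here rather than reproduce.

```cpp
#include <array>

// Integral image over the 9x9 binary patch T, so that any rectangular box
// sum costs four lookups.
struct Integral9 {
    // ii[y][x] = sum of T over rows [0, y) and columns [0, x); zero border.
    std::array<std::array<int, 10>, 10> ii{};
    explicit Integral9(const std::array<int, 81>& T) {
        for (int y = 0; y < 9; ++y)
            for (int x = 0; x < 9; ++x)
                ii[y + 1][x + 1] = T[y * 9 + x] + ii[y][x + 1]
                                 + ii[y + 1][x] - ii[y][x];
    }
    // Sum of T over the half-open rectangle [x0, x1) x [y0, y1).
    int boxSum(int x0, int y0, int x1, int y1) const {
        return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
    }
};

// Example D_xx-style response: three vertical bands weighted +1, -2, +1,
// mimicking the second derivative of a Gaussian in the x direction.
int dxxResponse(const Integral9& I) {
    return I.boxSum(0, 2, 3, 7) - 2 * I.boxSum(3, 2, 6, 7) + I.boxSum(6, 2, 9, 7);
}
```

With the D_xx, D_yy and D_xy responses computed this way, the score reduces to R ≈ D_xx·D_yy − D_xy², a handful of additions and two multiplications per candidate.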


Corner Event Tracking
When corner events are extracted asynchronously, establishing data associations between these corners is a big challenge. Alzugaray and Chli [26] associate the current corner with the latest active corner in its neighborhood and construct a tree-like structure to represent the trajectory of the same corner in spatio-temporal space. However, the tracking performance of this method is not ideal due to a lack of effective restrictions. To enhance the tracking accuracy, we improve on Alzugaray's method and propose a space, time and velocity-direction constrained corner event tracking method in this paper.
When the above event e = (x, y, t, pol) is determined to be a corner, as shown in Figure 7, we define the corresponding local RSAE* as a function S_e(·) which projects the position p = (x, y)^T to time t:

$$S_e(\mathbf{p}) = t$$

where time t is an increasing quantity. The first partial derivatives of S_e(·) can be written as

$$\frac{\partial S_e}{\partial x}(x, y_0) = \frac{d S_e|_{y_0}}{dx}(x) = \frac{1}{v_x(x, y_0)}, \qquad \frac{\partial S_e}{\partial y}(x_0, y) = \frac{d S_e|_{x_0}}{dy}(y) = \frac{1}{v_y(x_0, y)}$$

where S_e|_{x_0} and S_e|_{y_0} denote S_e(·) restricted to x = x_0 and y = y_0, respectively. The gradient of S_e(·) can then be written as

$$\nabla S_e = \left(\frac{\partial S_e}{\partial x}, \frac{\partial S_e}{\partial y}\right)^T = \left(\frac{1}{v_x}, \frac{1}{v_y}\right)^T$$

The gradient vector ∇S_e measures the change of time versus position, and it is also the element-wise inverse of the velocity vector. In order to estimate the gradient vector ∇S_e robustly, we assume that the local velocity vector is constant over a very small time on the local RSAE*. This is equivalent to assuming that the elements in the local RSAE* with higher timestamps (yellow in Figure 7) form a local plane, because the first partial derivatives of S_e(·) are the inverses of the velocity components, and a constant local velocity vector produces a constant change in S_e(·). Assuming the corner event e = (x, y, t, pol) belongs to a local plane with parameters N = (a, b, c, d)^T, it satisfies

$$ax + by + ct + d = 0 \tag{15}$$

The elements in the local RSAE* with higher timestamps also satisfy Equation (15), and the parameters N = (a, b, c, d)^T can be solved by a nonlinear least squares problem:

$$\hat{N} = \underset{\|N\|=1}{\arg\min} \sum_i (a x_i + b y_i + c t_i + d)^2$$

where (x_i, y_i, t_i) are the positions and timestamps of these elements. The velocity vector can then be calculated as

$$\mathbf{v} = (v_x, v_y)^T = \left(-\frac{c}{a}, -\frac{c}{b}\right)^T$$

As shown in Figure 8, we associate a new corner event e = (x, y, t, pol) with the latest active corner that satisfies the velocity direction constraint in its neighborhood. The neighborhood is defined as a region with a maximum range of d_con pixels centered around e; in our work we set d_con = 5. We also introduce a time constraint, so that corners more than t_con seconds earlier than e in the neighborhood are not considered, which effectively prevents data association with stale corners. In our work we set t_con = 0.1. In addition to the time and space constraints, we add a velocity direction constraint for accurate tracking. More precisely, for a newly arriving corner event e, we first search for the corners that satisfy the above time and space constraints in its neighborhood, then sort them in chronological order and store them in a container C_neigh. For each corner event e′ = (x′, y′, t′, pol′) in C_neigh, the connection vector c′ between e and e′ is calculated as c′ = (x − x′, y − y′)^T. Denoting the velocity vector of e′ as v′, the angle θ between v′ and c′ is calculated as

$$\theta = \arccos\frac{\mathbf{v}' \cdot \mathbf{c}'}{|\mathbf{v}'||\mathbf{c}'|} \tag{18}$$

In our work, e′ is considered to satisfy the velocity direction constraint if θ < 5 degrees. We then associate e with the latest active corner event that satisfies the above constraints in C_neigh, realizing accurate tracking of corner events.
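A compact C++ sketch of the constrained association is given below. The values d_con = 5 px, t_con = 0.1 s and the 5-degree bound follow the text; the container and field names are ours, each active corner's velocity is assumed to have been estimated by the plane fit above, and the rejection of corners newer than e is an assumption we add for clarity.

```cpp
#include <algorithm>
#include <cmath>
#include <deque>

struct TrackedCorner { double x, y, t, vx, vy; };

// Sketch of the space / time / velocity-direction constrained association.
const TrackedCorner* associate(const TrackedCorner& e,
                               const std::deque<TrackedCorner>& active) {
    const double dCon = 5.0, tCon = 0.1;
    const double thetaMax = 5.0 * 3.141592653589793 / 180.0;
    const TrackedCorner* best = nullptr;
    for (const TrackedCorner& c : active) {
        // Space constraint: within the d_con-pixel neighborhood of e.
        if (std::fabs(c.x - e.x) > dCon || std::fabs(c.y - e.y) > dCon) continue;
        // Time constraint: ignore corners more than t_con older (or newer) than e.
        if (c.t > e.t || e.t - c.t > tCon) continue;
        // Velocity direction constraint: the connection vector c' = e - c
        // must align with the candidate's velocity vector v'.
        const double cx = e.x - c.x, cy = e.y - c.y;
        const double nc = std::hypot(cx, cy), nv = std::hypot(c.vx, c.vy);
        if (nc == 0.0 || nv == 0.0) continue;
        const double cosTheta =
            std::clamp((c.vx * cx + c.vy * cy) / (nv * nc), -1.0, 1.0);
        if (std::acos(cosTheta) >= thetaMax) continue;
        // Keep the latest corner that satisfies all three constraints.
        if (!best || c.t > best->t) best = &c;
    }
    return best; // nullptr: e starts a new track
}
```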

Results
The proposed algorithm is evaluated on the publicly available event camera dataset [32]. This dataset is recorded by a DAVIS240C, which captures both events and intensity images with a resolution of 240 × 180 pixels. Our algorithm only processes the event streams, and the intensity images are used to collect ground truth. Because of space limitations, we select four representative scenes (shapes, dynamic, poster and boxes) with increasing complexity and event rate. Note that, to ensure the fairness of the experiments, the selected scenes are the same as those in Alzugaray and Chli [26]. We compare our algorithm with other advanced algorithms in terms of extraction accuracy, tracking accuracy and computational efficiency, and we analyze the experimental results. The following subsections detail the ground truth collection method and the experimental results.

Ground Truth
Since there is no data directly reflecting the actual behavior of the DAVIS240C in this dataset, collecting ground truth for evaluating the performance is one of the difficulties in our work. In references [21,23], corner events are manually determined and labelled in event streams; however, these methods are only applicable to clear corners in simple scenes due to the tedious manual work. In Mueggler et al. [24], an automatic determination and labelling method is proposed to calculate the spatiotemporal coordinates of the corner events; however, this method suffers from tremendous missed detections. Similar to the method in Alzugaray and Chli [26], the available intensity images are used to collect ground truth in this paper: we use the original Harris detector [22] to extract corners and track them by KLT [33]. To match the temporal resolution of the corner events, we adopt a cubic spline method to interpolate the corresponding coordinates in the image plane. In our accuracy metrics, we only consider the events which are not filtered out in Section 2.1. As shown in Figure 9, only corners generated within a 5-pixel neighborhood of the KLT-based tracking are considered in our metrics. Since the selected dataset is recorded with increasing velocity, we only use the data of the first 10 s, which reduces the influence of high-speed motion blur on the ground truth.
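For reference, the intensity-based part of this pipeline can be sketched with OpenCV as follows. The parameter values are illustrative, not those of our implementation, and the cubic-spline interpolation of the resulting tracks to event timestamps is omitted.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Harris corners on the first frame, tracked frame-to-frame by KLT.
void trackGroundTruth(const cv::Mat& prevGray, const cv::Mat& currGray,
                      std::vector<cv::Point2f>& pts,
                      std::vector<cv::Point2f>& nextPts) {
    if (pts.empty()) {
        cv::goodFeaturesToTrack(prevGray, pts, /*maxCorners=*/200,
                                /*qualityLevel=*/0.01, /*minDistance=*/7,
                                cv::noArray(), /*blockSize=*/3,
                                /*useHarrisDetector=*/true, /*k=*/0.04);
    }
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, pts, nextPts, status, err);
    // Keep only the corners that KLT tracked successfully.
    size_t j = 0;
    for (size_t i = 0; i < pts.size(); ++i)
        if (status[i]) { pts[j] = pts[i]; nextPts[j] = nextPts[i]; ++j; }
    pts.resize(j);
    nextPts.resize(j);
}
```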
Figure 9. Schematic diagram of the ground truth collection method. The green corners are extracted by the original Harris method and tracked by KLT on intensity images. The solid green line is the trajectory of one of the corners. The 5-pixel neighborhood of the trajectory is enlarged and represented as an oblique cylinder. In our metrics, we only consider the events within the 5-pixel neighborhood of the KLT-based tracking (events in the oblique cylinder).

Corner Extraction Performance
Illustrative examples of corner event extraction by our method are shown in Figure 10. For each scene, we synthesize all corner events within 100 milliseconds and display them on an intensity image. In Figure 10, different colors represent different polarities of the corner events: red and blue represent negative and positive, respectively. Similar to [26], we adopt the True Positive Rate (TPR) and False Positive Rate (FPR), a standard framework for binary classification, to evaluate the performance of the different extractors. A corner event is marked as True Positive (TP) if it is within 3.5 pixels of the KLT-based tracking, or False Positive (FP) if it is within the interval [3.5, 5] pixels. Conversely, an event which is not considered a corner but is within 3.5 pixels of the KLT-based tracking is marked as False Negative (FN), or True Negative (TN) if it is within the interval [3.5, 5] pixels. TPR is calculated as TPR = TP/(TP + FN), and FPR is calculated as FPR = FP/(FP + TN). The TPR and FPR of the different corner event extractors for the different scenes are shown in Tables 1 and 2, respectively. Table 3 reports the Corner Event Rate (CER) of the different extractors in the four scenes. CER is defined as the ratio of the total number of corner events to the total number of events.
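The counting rules above reduce to a few comparisons per event, as in this minimal C++ sketch; the helper supplying each event's distance to the nearest KLT-based track is hypothetical.

```cpp
// Accumulates the classification counts behind Tables 1 and 2. 'dist' is the
// event's distance to the nearest KLT-based track; events farther than
// 5 pixels are excluded from the metrics.
struct ExtractionMetrics {
    long tp = 0, fp = 0, fn = 0, tn = 0;
    void add(bool detectedAsCorner, double dist) {
        if (dist > 5.0) return;               // outside the evaluated band
        if (detectedAsCorner) (dist <= 3.5 ? tp : fp)++;
        else                  (dist <= 3.5 ? fn : tn)++;
    }
    double tpr() const { return double(tp) / double(tp + fn); }
    double fpr() const { return double(fp) / double(fp + tn); }
};
```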
The statistical results show that in the simple scene (shapes) all five methods have higher CER, TPR and FPR scores than in the complex scenes (dynamic, poster and boxes). This is because there are more events in complex scenes than in the simple scene, and numerous events are considered non-corner events in complex scenes. It can be seen from Table 3 that Arc* has higher CER scores than the other extractors, that is to say, Arc* considers more events to be corners. Additionally, Arc* has lower TPR and higher FPR scores, so it can be inferred that Arc* slightly enhances the CER at a great sacrifice of accuracy; the corner events extracted by Arc* therefore contain more incorrect detections. eFAST has the lowest CER scores because it only considers the condition in Figure 5a and ignores the condition in Figure 5b; therefore, many corners are not detected by eFAST. Owing to the coarse-to-fine corner extraction strategy, the CER scores of our method are comparable to those of the FA-Harris method and slightly lower than those of the eHarris method; note that our method runs about 20× faster than eHarris. Tables 1 and 2 show that the TPR scores of our method are comparable to those of eHarris and FA-Harris, and clearly better than those of the other two algorithms. Moreover, our method and the FA-Harris method have lower FPR scores than the other three extractors. That is to say, our method and the FA-Harris method have better precision than the other three extractors, which is largely due to the coarse-to-fine corner extraction strategy. Because we develop a box filter-based method to replace the tedious eHarris-based method in the fine corner extraction process, our method runs about 3× faster than the FA-Harris method. Compared with FA-Harris, our method significantly decreases the amount of calculation and improves the efficiency without reducing the accuracy.

Corner Tracking Performance
Illustrative examples of corner event tracking by our method are shown in Figure 11. Similar to [26], we adopt the Mean Absolute Error (MAE), Valid Track Rate (VTR) and Mean Track Lifetime (MTL) to evaluate the performance and summarize the results in Table 4. MAE is defined as the mean absolute distance between the event-based tracking and the KLT-based tracking. We consider an event-based track valid if its MAE is within 5 pixels; otherwise we consider it invalid. For each scene, VTR is defined as the ratio of the number of valid tracks to the total number of corner event tracks. Our tracking algorithm associates a newly arriving corner event with the latest active corner that satisfies the velocity direction constraint in its neighborhood, so it is based on the assumption that corner events are continuously and steadily detected. If this assumption is broken, the newly arriving corner event will be considered a new corner or may even generate an incorrect data association. MTL is defined as the mean duration of the corner event tracks that validly match the same intensity-based track. As shown in Table 4, in the four scenes, our method achieves a lower MAE and a higher VTR than the algorithm in [26]. This is because our method adds a velocity direction constraint to Alzugaray's method, which effectively eliminates incorrect data associations. Owing to this strict constraint, the MTL of our method is slightly lower than that in [26]. Note that our method effectively enhances the tracking accuracy at a slight sacrifice of MTL.

Computational Performance
All the above algorithms are implemented as single-threaded C++ programs and run on an Intel(R) Core(TM) i7 CPU at 2.80 GHz with 16 GB of RAM. Table 5 presents the average time consumption per event and the maximum processing rate in millions of events per second (Mev/s) for each algorithm. Our method performs well in terms of computational cost. On average, our proposed extractor runs about 20× faster than the eHarris method, but about 1.5× slower than Arc*. This improvement comes from the coarse-to-fine corner extraction method: the coarse extraction with high calculation efficiency is used to extract corner candidates, and the fine extraction only processes the corner candidates, not all events. Although our extractor is more computationally expensive than Arc*, its accuracy is significantly better. Our proposed extractor also runs about 3× faster than the FA-Harris method, because we develop a box filter-based method to replace the tedious eHarris-based method in the fine corner extraction process. Compared with FA-Harris, our method significantly decreases the amount of calculation and improves the efficiency without reducing the accuracy. Table 5 also reports the average time consumption of corner event tracking. Superficially, the proposed tracking algorithm takes much more time than extraction; note, however, that only corner events, not all events, are processed in the tracking stage, so the tracking process in fact accounts for less than 45% of the total computation time. Furthermore, although tracking a single corner takes longer in our method than in [26], the fine extraction process means our algorithm extracts fewer but more accurate corner events than Arc*, which saves considerable computing time in the whole tracking pipeline. Therefore, our work has great potential in real-time computer vision applications.

Conclusions
We propose an asynchronous real-time corner extraction and tracking algorithm for event cameras and show its excellent accuracy and good computational efficiency. In our algorithm, corner events are asynchronously extracted by a coarse-to-fine extractor and associated with the latest active corners that satisfy the velocity direction constraints in their neighborhoods. Compared with the method in [26], our method effectively enhances the accuracy at a slight sacrifice of computational efficiency. Experimental results also indicate that the proposed method can process more than 4.5 million events per second, showing great potential in real-time computer vision applications. Our further interest lies in applying our work to SLAM in challenging environments (high-velocity movements or high dynamic range scenes) where existing frame-based SLAM algorithms have limitations.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: http://rpg.ifi.uzh.ch/davis_data.html.