Multiple Human Tracking Using Binary Infrared Sensors

To create a context-aware environment, human locations and movement paths must be considered. In this paper, we propose an algorithm that tracks human movement paths using only binary sensed data obtained by infrared (IR) sensors attached to the ceiling of a room. Our algorithm can estimate multiple human movement paths without a priori knowledge of the number of humans in the room. By repeatedly predicting and estimating human positions at each time period, and linking the previous human positions to the estimated ones, human movement paths can be traced. Simulation-based evaluation results show that our algorithm can dynamically trace human movement paths.


Introduction
Human tracking technologies have attracted considerable attention over the years. For example, with real-time human tracking technology, air conditioners and lights can be controlled smartly by considering the requirements of each human. In addition, elderly people or children can be monitored for safety reasons [1,2]. Consequently, many human tracking systems and algorithms have been proposed [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. One popular approach involves camera-based systems [3][4][5][6][7][8][9][10]. However, they are often not acceptable for monitoring elderly people or for similar applications whose objective is to observe humans without invading their privacy. This is because people may be uncomfortable with monitoring using video cameras, even if the original image data are modified and not used directly to perform human tracking. In addition, the size of video data is large compared to other sensor data; therefore, camera-based systems are not suitable for long-term observations. Other popular methods use IC tags or radio frequency identification (RFID) devices [11,12]. A person wears an RFID tag, which is then detected by nearby RFID readers located on the floor or ceiling of the monitored area. Thus, using data stored in the RFID readers, the person can be tracked. Although RFID-based systems can be considered to enhance privacy, they force people to carry the RFID devices.
For these reasons, some human tracking methods use infrared (IR) sensors [13][14][15][16][17][18][19][20]. An IR sensor produces a "1" if it detects a human, and a "0" otherwise [21]. Thus, using IR sensors, the movement paths of subjects can be estimated without sacrificing their privacy. In addition, IR sensors are low-cost, and their installation is relatively simple. However, IR sensors have a deficiency: they react in the same way regardless of whether they detect one or multiple persons. Therefore, almost all human tracking methods using IR sensors assume that the number of humans in the room is known beforehand. From the perspective of practical application development, however, the number of humans in a room is often unknown.
We previously proposed an algorithm that can simultaneously estimate both the number of humans and their movement paths [22,23]; however, that method still had some drawbacks. First, the algorithm is invoked only after all sensed data are collected, making it difficult to estimate human movement in real time. In addition, the estimated human position must be either the position of a fired IR sensor or the midpoint of the overlapped detection areas of multiple fired IR sensors; thus, the method lacks flexibility. To address these issues, we now propose a novel method that overcomes the aforementioned problems [24]. The method dynamically estimates human positions using the weighted centers of grouped fired IR sensors, instead of simply using the position of a fired IR sensor or the midpoint of the overlapped detection areas of multiple fired IR sensors as in the previous methods. Thus, using the new method, we can estimate multiple human movement paths in a timely manner, and the estimated human positions are not restricted to the locations of the IR sensors, which improves the human tracking accuracy.
The remainder of this paper is organized as follows: in Section 2, a model of the assumed IR sensor system is described. Next, the details of the proposed algorithm are explained in Section 3. Then, evaluation results are shown in Section 4. Finally, our conclusions and future studies are described in Section 5.

Model of IR Sensor System
In this paper, we consider a system constructed using IR sensors and a PC. The IR sensors are randomly installed on the ceiling of a room, but their locations are known. In addition, each IR sensor has a non-directional, circular detection range, the radius of which is r. Here, r is typically several meters for commercially available IR sensors [21]; we assume r = 2.0 m in this paper. Each sensor outputs binary data, i.e., it produces a "1" if it detects one or more humans and a "0" otherwise. Moreover, all the IR sensors are connected to the PC using a wired or wireless communication network, so the data obtained from the IR sensors are collected in the PC. The data sampling rate is 6 Hz, and the sensed data are saved on the PC with a time stamp. Furthermore, the detection ranges of the IR sensors should cover the entire area to detect human movements. Note that some IR sensor arrays have been proposed recently [25]; they can detect human movements, but their detection range is typically only several meters. Thus, even using an IR sensor array, it is difficult to cover a large area. Furthermore, the method proposed in this paper can naturally handle an IR sensor array by treating it as a set of densely deployed IR sensors, provided that the sensor array produces a set of {0, 1}-data depending on human detection.
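The binary sensing model above can be sketched as follows; the function and variable names are illustrative, not taken from the paper's implementation:

```python
import math

def sense(sensors, humans, r=2.0):
    """Binary IR sensor model: each sensor fires ("1") if at least one
    human is inside its circular detection range of radius r meters.
    sensors and humans are lists of (x, y) coordinates."""
    return [
        1 if any(math.hypot(hx - sx, hy - sy) <= r for hx, hy in humans) else 0
        for sx, sy in sensors
    ]
```

For example, with a sensor at the origin and a human at (1, 1), the sensor fires, while a sensor at (5, 5) does not.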

Proposed Algorithm
We can trace a human movement path by connecting the positions of the fired sensors step by step, if the position coordinates of each sensor are known. However, this simple method presents a problem: if some of the IR sensors are close together, they often detect the same event and output "1" at the same time, which can make it difficult to track human movements. This is especially true when there is more than one person in the room. To solve this problem, we propose a heuristic algorithm that can estimate multiple human movement paths using only the binary sensed data.

Notations
First, to explain the procedures of our algorithm, we define the variables and parameters as follows:
· |*|: The number of elements in list *.
· CC(t): A set of the weighted center coordinates of the clusters at time t. CC(t) = {cci(t) | i = 1, …, Cmax(t)}, where cci(t) = (x, y); the components are referred to as cci(t).x and cci(t).y, respectively.
· td.Path: A list of the estimated route coordinates of target human "td.ID." By connecting all elements in this list from the first to the last, the estimated movement path of human "td.ID" is obtained.
· td.TTL: A lifetime of target human "td.ID." This is an integer value. The initial value of td.TTL is "1," and its maximum value is TTLMAX.
· TDL: A list of the target humans who currently exist in the room.
· TTLMAX: A constant integer value; the maximum value of TTL (time to live). TTLMAX > 0.
· WL(t): A weight set of the sensors at time t. WL(t) = {ws(t) | s = 1, …, S}.
· ws(t): A weight of sensor s at time t, i.e., the sum of the raw data of sensor s over the most recent WS1 sampling periods: ws(t) = ds(t − WS1 + 1) + … + ds(t), where ds(i) is the raw datum of sensor s at time i. This is used to calculate the weighted center coordinate of the cluster to which sensor s belongs. The coordinate calculation method will be defined later.
· WS1: Window size, i.e., the number of sensed raw datasets to which the logical-OR operation is applied. This is a given integer value. WS1 > 0.
· WS2: Window size, i.e., the number of elements (coordinates) that construct movement vector mvt(td.ID). This is a given integer value. WS2 > 1.
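The per-target bookkeeping described by these notations could be represented by a small record type; the class and field names below are hypothetical renderings of td.ID, td.Path, td.TTL, and the predicted coordinate td.PC used later:

```python
from dataclasses import dataclass, field

TTL_MAX = 18  # TTLMAX; 18 is the value used in the paper's evaluation


@dataclass
class TargetData:
    """One tracked human ("td" in the paper's notation)."""
    ID: int
    Path: list = field(default_factory=list)  # td.Path: estimated route coordinates
    TTL: int = 1                              # td.TTL: lifetime, from 1 up to TTL_MAX
    PC: tuple = (0.0, 0.0)                    # td.PC: predicted next coordinate
```

A list of such records then plays the role of TDL.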

The main procedure of the proposed algorithm is shown in Figure 1. It mainly consists of two processing steps: the location estimation step and the path estimation step. The former is described as a part of the main procedure, whereas the latter step will be shown later. At each time t, the two steps are invoked sequentially, and a link from the current to the next human position is estimated. First, in the location estimation step, the candidates for human positions are listed using a clustering technique. Then, in the path estimation step, the links are determined by selecting reasonable positions among the candidates obtained in the location estimation step. At time t, the position of each human is estimated using the candidate sets, i.e., the sets of the weighted centers of clusters, obtained from t − WS2 + 1 to t. Here, WS2 is relatively small. Thus, our algorithm can trace the human movement paths almost in real time, without knowing the number of humans in the room; our previous algorithm could not do this [22,23]. In addition, the framework of the proposed algorithm is the same as that of the algorithm introduced in [24], except that both the location estimation and path estimation steps are modified to improve the human tracking accuracy. In the following subsections, the detailed procedure of each step is explained.

Location Estimation Step
In the location estimation step, a logical-OR operation and a summation are applied to RDL, the list of the most recent WS1 sensed raw datasets, to obtain ORD(t) and WL(t); then Clustering(ORD(t)), shown in Figure 3, is called. Figure 2 shows an example with six sensors: ORD(t) becomes {1,1,0,0,1,1} after applying the logical-OR operation to RDL, and WL(t) becomes {3,5,0,0,6,1} by summing up the corresponding sensed raw data in RDL. Note that in the case of S = 1, we cannot actually estimate human movements; however, we can detect whether humans exist in the detection area of the single IR sensor, which is often called "proximity" position detection. To simplify the discussion, we treat the S = 1 case as a human-trackable case in this paper.
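This pre-processing of the raw data window can be sketched as follows, assuming RDL holds the most recent binary readings, one list (of length S) per sampling period:

```python
def preprocess(RDL):
    """Element-wise logical OR and summation over the raw data window RDL.
    Returns ORD(t), the fired/not-fired vector, and WL(t), the per-sensor
    firing counts used later as weights."""
    ORD = [1 if any(col) else 0 for col in zip(*RDL)]
    WL = [sum(col) for col in zip(*RDL)]
    return ORD, WL
```

For instance, a window of four samples over three sensors yields one OR bit and one count per sensor.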

Figure 2. Example of sensed raw data processing.
Next, using ORD(t) and WL(t), the sensors fired during the time period [t − WS1 + 1, t] are grouped using a clustering method. Finally, the weighted centers of the clusters are calculated, and they are used in the path estimation step as the candidates for the next human positions. The clustering algorithm is shown in Figure 3. The algorithm is based on the Ward method [26]; however, the termination condition of our algorithm differs from that of the original method. Whereas the original Ward method merges the nearest pair of clusters until all initial clusters are merged into one cluster, our algorithm stops merging when Equation (3), i.e., the termination condition, is no longer satisfied. As a result, the geometrical distance between the centers of any two clusters in the finally obtained cluster set is longer than r, the radius of the detection range of an IR sensor.
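A minimal sketch of this greedy merging with the distance-based termination condition might look as follows; unweighted centroids are used here for brevity, and the helper names are assumptions rather than the paper's own:

```python
import math

def cluster(points, r=2.0):
    """Greedily merge the two nearest clusters (by centroid distance)
    until every pair of centroids is more than r apart -- the paper's
    termination condition -- and return the resulting clusters."""
    clusters = [[p] for p in points]

    def centroid(c):
        return (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))

    while len(clusters) > 1:
        # Find the nearest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > r:  # termination: all centers are farther apart than r
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Two sensors half a meter apart are merged into one cluster, while a sensor ten meters away remains its own cluster.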
Here, we assume that a person exists in each obtained cluster, and his/her position is the weighted center of the cluster. The weighted center coordinate cci(t) = (xi(t), yi(t)) of cluster ci(t) at time t is calculated using Equation (4).
Note that each element can belong to more than one cluster in our new clustering method, contrary to the method used in our previous algorithm [24]. Thus, we can manage the case in which multiple humans exist in the detection range of the same IR sensor, which could not be managed by the algorithm introduced in [24]. In Equation (4), (xs, ys) is the position coordinate of sensor s, ws(t) is its weight, and |ci(t)| is the number of elements in cluster ci(t); the weighted center is given by xi(t) = ∑s∈ci(t) ws(t)·xs / ∑s∈ci(t) ws(t), and yi(t) is calculated similarly. The weighted centers of all clusters are set to CC(t); they become the candidates for human positions at time t. Figure 4 shows an example. In the example, cluster c1(t) consists of sensors s = 1, s = 2, and s = 3. If the position coordinates of the sensors are (10, 10), (9, 5), and (14, 8), and their weights are w1(t) = 4, w2(t) = 1, and w3(t) = 2, the center coordinate cc1(t) will be (11.0, 8.7), i.e., x1(t) = (4 × 10 + 1 × 9 + 2 × 14)/(4 + 1 + 2) = 11.0 and y1(t) = (4 × 10 + 1 × 5 + 2 × 8)/(4 + 1 + 2) ≈ 8.7. Furthermore, sensor s = 2 also belongs to another cluster, c3(t); thus, the coordinate of s = 2 is also used to calculate the weighted center cc3(t) of cluster c3(t). If a cluster contains only one sensor, like cluster c2(t) in Figure 4, the center position coordinate is simply the position of that sensor. After all weighted center coordinates of the clusters have been calculated and set to CC(t), the path estimation step is initiated.
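The weighted-center computation of Equation (4) reduces to a weighted average of the sensor coordinates; the following sketch reproduces the worked example above:

```python
def weighted_center(coords, weights):
    """Weighted center of a cluster (Equation (4)): each sensor
    coordinate is weighted by w_s(t), the number of times the sensor
    fired in the current window."""
    total = sum(weights)
    x = sum(w * cx for w, (cx, _) in zip(weights, coords)) / total
    y = sum(w * cy for w, (_, cy) in zip(weights, coords)) / total
    return (x, y)
```

With coordinates (10, 10), (9, 5), (14, 8) and weights 4, 1, 2, this returns approximately (11.0, 8.7), matching the example for cc1(t).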

Path Estimation Step
The procedure of this step is described in detail in Figures 5 and 6. In the main procedure, PathEstimation(), if TDL is empty, NewTarget() is called to create new target humans; their starting points are simply the weighted center coordinates in CC. Here, TDL represents the humans currently existing in the room. If some humans exist, i.e., TDL is not empty, TrackTarget() in Figure 6 is invoked to obtain the next position coordinate of each human. TrackTarget(), which differs from the version introduced in [24], contains two sub-processes: prediction and modification. In the prediction sub-process, a predicted coordinate, td.PC, is provided for each human as the next position coordinate at time t (see "Block 2" in Figure 6). First, a movement angle α of each human at time t is calculated using the movement vector MV(td.ID). Here, the movement vector represents the expected direction of the corresponding human and is calculated using the WS2 most recently estimated position coordinates of the human, i.e., mvt(td.ID). If mcWS2(td.ID) and mc1(td.ID) are equal, the target human did not move; therefore, the predicted coordinate td.PC is simply set to mc1(td.ID). Otherwise, the prediction is calculated. Figure 7 shows an example. First, the average movement distance mdAVE(td.ID) is calculated using mvt(td.ID), defined as mdAVE(td.ID) = (1/(WS2 − 1)) ∑i=1..WS2−1 √((x'i − x'i+1)² + (y'i − y'i+1)²), where x'i = mci(td.ID).x and y'i = mci(td.ID).y. Using the movement angle α and mdAVE(td.ID), the predicted coordinate td.PC is calculated (see also the procedure for TrackTarget() shown in Figure 6). Next, in the modification sub-process, all pairs of the weighted center coordinates in CC and target humans in TDL are examined, and the best pair is selected in consideration of the distance d, the angle θ, and the time to live (TTL) of "td." Here, "d" is the distance between td.PC and cc, and "θ" is the relative angle between "mc1(td.ID) → td.PC" and "mc1(td.ID) → cc."
A pair that minimizes "d" and "θ" and maximizes TTL at the same time is finally selected as the best pair.
Actual modification is then applied to the best pair, and the next movement position coordinate is obtained as the center of td.PC and cc of the best pair (see also "Block 3" in Figure 6). The obtained next position is appended to td.Path of the corresponding target human. The elements representing the selected pair, i.e., the weighted center coordinate cc and the target td, are removed from CC and TDL, respectively. The aforementioned best-pair selection process is repeated until no more pairs can be selected. In addition, if the next position of a human is successfully found, the TTL value of the human is incremented; otherwise, it is decremented. Here, the initial value of TTL is "1," and its maximum value is TTLMAX; the TTL value never exceeds TTLMAX. If the TTL of a human becomes less than 0, the human is removed from TDL, as he/she has left the room. In addition, if an unselected coordinate remains in CC, NewTarget() is invoked and a new human is appended to TDL, as he/she has entered the room. Owing to this mechanism, even if the number of humans is not estimated correctly by the clustering algorithm in the location estimation step shown in Figures 1 and 3, and some humans are accidentally generated by NewTarget(), they will be removed within a short period. Figure 8 shows an example of the path estimation step. Here, TDL = {td1, td2} and CC = {cc1(t), cc2(t), cc3(t)}. First, the pair of td1.PC and cc1(t) is selected: the distance between them is smaller than that of the other pairs, and the angle is less than 180° and smaller than that of the other pairs. Thus, the center coordinate of td1.PC and cc1(t) is calculated and appended to td1.Path. In addition, td1.TTL is incremented. In the next step, because TDL = {td2} and CC = {cc2(t), cc3(t)}, the pair of td2.PC and cc2(t) is selected, and the process described above is applied to that pair.
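The best-pair selection could be sketched as below; for brevity, this simplified version ranks pairs by the distance d and breaks ties by larger TTL, omitting the angle term θ of the paper's full criterion:

```python
import math

def best_pair(targets, candidates):
    """Return (target_index, candidate_index) of the best pair, where
    targets is a list of (ttl, pc) tuples (pc being the predicted
    coordinate td.PC) and candidates is a list of cluster centers cc.
    Smaller distance d wins; ties are broken by larger TTL."""
    best = None
    for ti, (ttl, pc) in enumerate(targets):
        for ci, cc in enumerate(candidates):
            d = math.dist(pc, cc)
            key = (d, -ttl)  # minimize d, then maximize TTL
            if best is None or key < best[0]:
                best = (key, ti, ci)
    return (best[1], best[2]) if best else None
```

In a full implementation, the selected pair would be removed from both lists and the search repeated, as described above.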
Our path estimation algorithm shown in Figure 6 can be formulated using Equations (6) and (7). Assume that p(t − 1) is the position of a target human at time t − 1, p(t − 1) = (x(t − 1), y(t − 1))^T. Then, a prediction at time t is given by: p̂(t) = p(t − 1) + B·u(t) (6), where p̂(t) is the predicted coordinate at time t; in this paper, it is the same as td.PC, p̂(t) = (x̂(t), ŷ(t))^T. In addition, B is the average movement distance mdAVE(td.ID) of the target human, and u(t) is the x-y component form of the predicted movement direction at time t, u(t) = (sin α, cos α)^T, where α is the angle of the movement vector MV(td.ID) at time t. Furthermore, the modified position at time t is given by: p(t) = (p̂(t) + z(t))/2 (7), where z(t) is a measurement of the target human at time t and is the same as the weighted center position coordinate cc(t) obtained by the best-pair selection, z(t) = (x(t), y(t))^T. Equation (6) is similar in form to the Kalman filter used in related work [19,20]. However, compared to the original Kalman filter, in our proposed algorithm, p̂(t) and z(t) are calculated as mentioned previously, without computing a Kalman gain. Because the system environment changes drastically with the numbers of IR sensors and humans in a room and with the detection range and distribution density of the IR sensors, the standard method for obtaining the Kalman gain cannot be applied.
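Equations (6) and (7) amount to a dead-reckoning prediction followed by a midpoint correction, which can be written directly as follows (a sketch; the angle convention u = (sin α, cos α) follows the reconstruction above):

```python
import math

def predict(p_prev, alpha, md_avg):
    """Equation (6): predicted coordinate = previous position plus the
    average movement distance md_avg along movement angle alpha."""
    return (p_prev[0] + md_avg * math.sin(alpha),
            p_prev[1] + md_avg * math.cos(alpha))

def modify(p_hat, z):
    """Equation (7): modified position is the midpoint of the predicted
    coordinate p_hat and the matched measurement (cluster center) z."""
    return ((p_hat[0] + z[0]) / 2, (p_hat[1] + z[1]) / 2)
```

Unlike a Kalman filter, the correction weight is fixed at 1/2 rather than computed as a gain.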

Evaluation Environment
To evaluate our algorithm, we used generated simulation data. The details are as follows. First, the density of IR sensors "D" is defined by: D = Sπr²/A (8), where "S" is the number of IR sensors; "r" is the radius of the IR sensor detection range, r = 2.0 m; and "A" is the area of the monitored room, A = 100 m² (10 m × 10 m). We assume that the entire area A should be covered with the minimum number of IR sensors. Thus, if we fix the density D as 2.0, 3.0, 4.0, or 5.0, the number of IR sensors S that should be deployed is calculated using Equation (8); the corresponding values of S are 16, 24, 32, and 40, respectively, as shown in Table 1. Here, the "S" sensors are randomly deployed in the area. The number of humans "H" is changed from one to four, and human movement data are created for each case. The human movement scenario is as follows. After a random waiting time from 0 to 30 s, each human enters the area through a door and walks around randomly. The walking scenario is as follows: the human decides on a destination position and walks toward it using the shortest path; then, the human either stops at the destination position or walks to a new random destination. Here, the human does not immediately return to a previous destination. After 60 s, the human exits through the door. The walking speed of each human is randomly varied from 1.25 m/s to 1.75 m/s. For each "H," 50 different variations of data were generated. In total, 800 (4 different H × 4 different S × 50 different variations) different input datasets were generated and used for the evaluation of the proposed algorithm. The parameters of our algorithm are WS1 = 4, WS2 = 24, and TTLMAX = 18; according to our preliminary estimations, this combination of values produces good path estimations.
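Rearranging Equation (8) gives the number of sensors to deploy for a target density; rounding up reproduces the values in Table 1:

```python
import math

def sensors_needed(D, A=100.0, r=2.0):
    """From density D = S * pi * r^2 / A (Equation (8)), the number of
    sensors to deploy is S = ceil(D * A / (pi * r^2))."""
    return math.ceil(D * A / (math.pi * r ** 2))
```

For D = 2.0, 3.0, 4.0, and 5.0 with A = 100 m² and r = 2.0 m, this yields S = 16, 24, 32, and 40, respectively.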

Evaluation Results and Remarks
Our method estimates the number of humans and their movement paths simultaneously. Thus, it is difficult to evaluate the method using tracking errors alone, as is done for other human tracking methods, which assume that the number of humans is fixed and known [20,27]. To evaluate our method, we therefore introduced four metrics: the success estimation rate, the averaged error, the averaged tracking rate, and the success rate of the number of humans. The details are given below. Tables 2-5 show the evaluation results. For comparison, we also evaluated the previously proposed algorithm [24]. In the tables, the upper values represent the results of the proposed algorithm, and the lower values represent those of the previous one.
(1) Success estimation rate. Table 2 shows the percentage of the 50 input patterns provided for each case that were estimated correctly by our algorithm. For H = 1, both algorithms estimated all human movement paths correctly for all densities. For H = 2, the success estimation rates of the proposed and previous algorithms were 60% and 57% on average, respectively. In addition, for H = 3 and H = 4, both estimation rates decreased to less than 34%. As the number of humans increases, their movement paths often overlap, which makes it difficult to estimate individual human movements. Thus, the success estimation rate decreases with an increasing number of humans in the room, regardless of the path estimation method.
(2) Averaged error. Table 3 shows the averaged error of the estimated human movement paths compared to the true human movement paths among the successfully estimated data. In our algorithm, the position coordinates of the IR sensors are basically used; therefore, an error within the detection range of an IR sensor is introduced during the initial stage of the path estimation. Even so, for H = 1 and H = 2, the proposed method traces the human movement paths with average errors of less than 0.60 m and 1.42 m, respectively. In contrast, although our previous algorithm can trace one human with an error of less than 0.68 m, its averaged error exceeds 2.08 m when H = 2; in the other cases, it could not trace the human movement paths correctly.
(3) Averaged tracking rate. Table 4 shows the averaged tracking rate for each target human. The tracking rate is defined as the ratio of the length of the correctly estimated route to the total length of the actual human movement path; a high rate indicates that the human path is estimated well for a long time. For H = 1, the human was tracked with an accuracy of more than 94% by both algorithms.
In the other cases, if the density was high, the tracking rate was also high. This implies that many sensors are needed to track multiple humans.
(4) Success rate of the number of humans. Table 5 shows the percentage of time at which the number of humans in the room was estimated correctly. This metric is directly related to our clustering algorithm. Using the proposed algorithm, for H = 1 and H = 2, the success rates were more than 97% and 76%, respectively. However, for H = 3 and H = 4, the rates ranged from 45% to 60%. As the number of humans increases, the probability that some of them walk close together also increases; therefore, the performance of the clustering algorithm decreases. Compared to our previous algorithm, the newly proposed clustering algorithm does not perform as well on this metric.
According to these results, the tracking accuracy of our method decreases as the number of humans in the room increases. This is because our clustering algorithm does not work well when some humans walk close together. In addition, if relatively many humans exist, e.g., H = 3 and H = 4 in our experiments, almost all the IR sensors fire simultaneously; in such cases, no algorithm can estimate the number of humans correctly. However, as shown for H = 2, our algorithm can estimate the human movement paths well, with a 1.42 m averaged error. In addition, our human-location-estimation algorithm based on a clustering method achieves a success rate of greater than 76% even for the complicated case in which two persons enter and exit the room at different times. Compared to our previous algorithm, the clustering method itself in the proposed algorithm does not perform as well, because the new clustering method allows a position candidate to belong to more than one cluster. However, owing to this very property, the averaged accuracy of the final human tracking is improved: as shown in Table 3, the averaged errors of the estimated human movement paths are improved two-fold for H = 2.
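The averaged tracking rate of metric (3) is a simple ratio of polyline lengths; a sketch, with paths given as lists of (x, y) points:

```python
import math

def path_length(path):
    """Total length of a polyline path given as a list of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def tracking_rate(correct_path, true_path):
    """Metric (3): ratio of the length of the correctly estimated portion
    of the route to the total length of the actual movement path."""
    return path_length(correct_path) / path_length(true_path)
```

For example, if 9 m of a 10 m true path is estimated correctly, the tracking rate is 0.9.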
Figure 9 illustrates an example of the estimated human movement paths for H = 2. The solid line is the true path of the first human, and the dotted line is the true path of the second human; they are labeled "TruePath1" and "TruePath2" in Figure 9. Two different kinds of markers represent the estimated locations of the two persons, labeled "EstPath1" and "EstPath2" in the figure. The start and goal locations of both persons are the same, indicated as "Door" in the figure. As shown in Figure 9, our proposed algorithm can trace human movements well. This is the best result among the 50 different input datasets generated under the same condition, D = 5.0; that is, 40 sensors were deployed in the field. The numerical results for the case shown in Figure 9 are as follows: the averaged error of the estimated paths, corresponding to Table 3, is 0.56 m; the averaged tracking rate, corresponding to Table 4, is 99.89%; and the success rate of the number of humans, corresponding to Table 5, is 86.85%.

Conclusions
We have proposed an algorithm that can track human movement paths using only the binary sensed data obtained from infrared sensors attached to the ceiling. The human positions are estimated at each time step based on a clustering method. Thus, the proposed algorithm can track multiple humans even if the number of humans in the room changes dynamically, which was difficult to realize using the methods proposed in related studies. According to simulation-based evaluations, our algorithm can trace human movement paths with a 1.42 m error on average when two humans are in the room. In future studies, we will evaluate our algorithm using real sensed data obtained from an actual infrared sensor system.