Journal of Imaging
  • Article
  • Open Access

26 February 2021

Fall Detection System-Based Posture-Recognition for Indoor Environments

Abderrazak Iazzi, Mohammed Rziza and Rachid Oulad Haj Thami
1
LRIT, Rabat IT Center, Faculty of Sciences, Mohammed V University in Rabat, Rabat B.P. 1014, Morocco
2
ADMIR LAB, IRDA, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat B.P. 1014, Morocco
*
Author to whom correspondence should be addressed.

Abstract

The majority of the senior population lives alone at home. Falls can cause serious injuries, such as fractures or head injuries, which can prevent a person from moving around and carrying out daily activities normally. Some of these injuries can even be life-threatening if they are not handled urgently. In this paper, we propose a fall detection system for elderly people based on their postures. The postures are recognized from the human silhouette, which has the advantage of preserving the privacy of the elderly. The effectiveness of our approach is demonstrated on two well-known datasets for human posture classification and three public datasets for fall detection, using a Support Vector Machine (SVM) classifier. The experimental results show that our method achieves not only a high fall detection rate but also a low false detection rate.

1. Introduction

The study in [1] shows that millions of elderly people—those aged 65 and over—fall each year. One in four elderly people falls, but less than half talk to their doctor. A first fall also makes subsequent falls more likely. Several factors cause falls, such as lower-body weakness, vision problems, difficulty walking and balancing, and the use of medicines. In order to provide a well-secured living environment, researchers have proposed numerous systems to reduce the risk of falls for elderly people.
Recently, many studies, approaches and applications have been proposed for fall detection [2,3,4,5]. A recent one is presented in [6], where the authors discuss the taxonomy of fall detection and the availability of fall data. According to the sensor technology developed to date, each proposed system can be classified into one of two categories: wearable-based methods and camera-based methods.
Fall detection systems based on worn sensors are the most commercialized systems. They are essentially electronic devices that must be worn by the elderly person or placed in a pocket or clothing, and they provide many features that can be used to detect a fall. Examples include garments with strain sensors for recognizing upper-body posture [7] and a triaxial accelerometer mounted on the waist to classify human movement status [8]. A manual help button can also be used to call for help after falling. However, despite their advantages, these sensors need their power source changed or recharged periodically and must be worn by the person during their Activities of Daily Living (ADL). The help button is useless if the person is unconscious after falling. For these reasons, such devices can be a source of inconvenience.
Compared with these methods, computer vision offers a solution that overcomes their drawbacks. Its main advantage is that it requires no interaction with the elderly person; indeed, nothing needs to be worn. Moreover, it can provide information for tracking a person and identifying several types of activities, including falls. With this method, we can also send an alarm signal with a short video clip as further confirmation of whether the person has fallen. In real-world environments, fall detection remains very challenging due to pose changes, lighting changes, shadows, illumination, occlusions, etc. To overcome these uncontrolled conditions, several fall detection approaches have been proposed.
The remainder of this paper is organized as follows: in Section 2, related work on computer vision for fall detection is presented. Section 3 details our proposed approach, in which human posture recognition is combined with floor information for fall detection. Then, the experimental results and a discussion of the proposed scheme are presented in Section 4. Finally, we give a general conclusion and discuss some possible future work in Section 5.

3. Our Approach

This section presents the proposed approach. Figure 1 shows the components of the whole fall detection system. Our approach is composed of three phases. The first phase extracts the human silhouette from the frames of the input video. The second phase extracts local and global features from the human silhouette; these features are combined and used for posture classification. Using our classifier, we distinguish between normal and abnormal postures. As post-processing, we set four rules to decide whether the current activity should be classified as a fall or a normal activity.
Figure 1. Main component of the fall detection approach.

3.1. Background Subtraction

Monitoring systems based on computer vision rely on detecting moving and static objects and on video tracking to understand the events that occur in the scene. In our case, we are interested in detecting and extracting a moving person from the background, which is the most challenging task in computer vision-based fall detection systems. According to the literature, the common way to discriminate moving objects from the background is Background Subtraction (BS). Many BS algorithms have been proposed, including the Gaussian Mixture Model (GMM) [32], the Approximated Median Filter (AMF) [33] and the CodeBook model (CB) [34].
To detect a moving person in a video sequence, these algorithms must cope with several difficulties. The size of the human body shape changes depending on whether the camera is far from or close to the person, and color and texture can be affected by shadows or by ambient light in the living room. For this reason, we adopt the CodeBook method [34] for its robustness and its ability, in general, to detect and remove shadows. A comparison of these three algorithms can be found in [18].
Initially, our task consists of detecting one moving object, the elderly person, in the video sequence. An RGB camera is used, installed two meters high in the living room. The background is generally static, but any moving furniture must be taken into account.
The CB algorithm is a pixel-based approach composed of two phases: a training phase (background modeling), where the algorithm constructs a codebook for each pixel from the first N frames of the video, and a subtraction phase, where this codebook is used to segment the foreground from the background.
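As an illustration only, a minimal Python sketch of this background subtraction step could look as follows. The CodeBook model is not available in the standard OpenCV Python bindings, so OpenCV's MOG2 subtractor is used here as a stand-in; the video path, history length and threshold are assumed values, not those of our system.

```python
import cv2

# Illustrative stand-in for the CodeBook background model: OpenCV's MOG2 subtractor.
cap = cv2.VideoCapture("living_room.avi")   # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=50, varThreshold=16, detectShadows=False)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = bg.apply(frame)                # 0 = background, 255 = foreground
    fg_mask = cv2.medianBlur(fg_mask, 5)     # simple noise suppression
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(1) == 27:                 # Esc to quit
        break
cap.release()
```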
Generally, the result obtained by the CB algorithm is not perfect and needs to be improved in order to extract the human silhouette more accurately. Figure 2 shows an example of two problems: first, noisy pixel regions that can be produced by shadows; second, moved furniture that is detected as a moving object, so that those pixel regions are taken as foreground. These two problems severely deteriorate the result of body extraction. As a solution, we propose adding a post-processing step.
Figure 2. Codebook background subtraction result.
At the beginning of the post-processing step, we detect and remove the shadows from the foreground based on the method proposed in [35]. We estimate the shadows using gradient information and HSV color, then classify the shadow pixels based on pre-defined thresholds. This step is applied to each new frame to remove the shadows without updating the background model, because the shadow is an active object.
To detect shadow pixels, we first compare the HSV colors of the current frame F with the corresponding HSV colors in the reference frame R. The HSV color space is used because it separates chromaticity from intensity. The gradient information is then used to refine these shadow pixels.
As a pre-processing step, we normalize the V component in the HSV color space to improve the contrast between shadow and non-shadow regions. Let I = (H, S, V) be the original image, where H, S and V are its channel components. The V channel is normalized as follows:
V_n = \frac{V - \min(V)}{\max(V) - \min(V)},
where V_n is the normalized V component and V is the original component of the image I. As a result, the normalized image is I_n = (H, S, V_n).
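A minimal sketch of this min–max normalization of the V channel, assuming an input frame in BGR order, could be written as follows (function and variable names are illustrative):

```python
import cv2
import numpy as np

def normalize_v(frame_bgr):
    """Min-max normalize the V channel of an HSV image, as described above."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = cv2.split(hsv)
    v_n = (v - v.min()) / (v.max() - v.min() + 1e-6)   # avoid division by zero
    return cv2.merge([h, s, v_n * 255.0])              # I_n = (H, S, V_n), rescaled to [0, 255]
```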
To classify each pixel as a shadow or non-shadow pixel, we rely on the following rules:
\alpha \le \frac{F_p^V}{R_p^V} \le \beta, \quad \text{and} \quad |F_p^S - R_p^S| \le \tau_S, \quad \text{and} \quad |F_p^H - R_p^H| \le \tau_H,
where F_p^H, F_p^S, F_p^V and R_p^H, R_p^S, R_p^V correspond to the values of pixel p(i,j) in the current frame F and the reference frame R, respectively; α, β, τ_S and τ_H are the thresholds used to detect shadow pixels. The ranges and optimized values of these thresholds are presented in Table 1.
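To make the rule concrete, a minimal NumPy sketch of this pixel-wise HSV shadow test is given below. The threshold values passed as defaults are illustrative placeholders only; the actual ranges and optimized values are those reported in Table 1.

```python
import numpy as np

def shadow_candidates(frame_hsv, ref_hsv, alpha=0.4, beta=0.9, tau_s=60, tau_h=50):
    """Pixel-wise HSV shadow test (threshold values are illustrative placeholders)."""
    f = frame_hsv.astype(np.float32)
    r = ref_hsv.astype(np.float32)
    ratio_v = f[..., 2] / (r[..., 2] + 1e-6)
    cond_v = (ratio_v >= alpha) & (ratio_v <= beta)   # darker than reference, but not too dark
    cond_s = np.abs(f[..., 1] - r[..., 1]) <= tau_s   # similar saturation
    cond_h = np.abs(f[..., 0] - r[..., 0]) <= tau_h   # similar hue
    return cond_v & cond_s & cond_h                   # boolean shadow-candidate mask
```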
Table 1. Description and values of parameters threshold.
After detecting all shadow pixels using HSV color, we extract all connected-component blobs (B) formed by these shadow pixels. The gradient magnitude ∇_p and gradient direction θ_p are computed at each shadow pixel p(i,j) of blob B. Then, we keep only the pixels with significant gradient, i.e., those whose gradient magnitude exceeds the threshold τ_m (∇_p ≥ τ_m). For the gradient direction of the shadow pixels, we compute the difference between the current frame F and the reference frame R as follows:
\Delta\theta_p = \arccos\left( \frac{x_F x_R + y_F y_R}{\left[ (x_F^2 + y_F^2)(x_R^2 + y_R^2) \right]^{1/2}} \right).
Then, we estimate the gradient direction correlation between the current frame F and the reference frame R at pixel p(i,j) in blob B by:
C_B = \frac{1}{N} \sum_{p=1}^{N} H(\tau_a - \Delta\theta_p),
where N is the number of pixels in blob B; H(·) is a function which returns 1 if τ_a − Δθ_p is positive and 0 otherwise; τ_a is the gradient direction threshold; and C_B is the fraction of pixels whose gradient direction is similar in frame F and reference R. All pixels in blob B are marked as shadow if the condition C_B ≥ τ_c is satisfied, where τ_c is the correlation threshold. The values of the thresholds τ_m, τ_a and τ_c are given in Table 1. The shadow detection result of the proposed method is presented in Figure 3.
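A possible NumPy/OpenCV sketch of this gradient refinement is shown below; it assumes grayscale versions of the current and reference frames and a boolean mask for one shadow blob, and the default thresholds are placeholders rather than the values of Table 1.

```python
import cv2
import numpy as np

def gradient_direction_correlation(frame_gray, ref_gray, blob_mask,
                                   tau_m=20.0, tau_a=np.deg2rad(10)):
    """Fraction C_B of significant-gradient pixels in a blob whose gradient
    direction agrees between the current frame F and the reference frame R."""
    fx = cv2.Sobel(frame_gray, cv2.CV_32F, 1, 0)
    fy = cv2.Sobel(frame_gray, cv2.CV_32F, 0, 1)
    rx = cv2.Sobel(ref_gray, cv2.CV_32F, 1, 0)
    ry = cv2.Sobel(ref_gray, cv2.CV_32F, 0, 1)

    mag_f = np.hypot(fx, fy)
    sel = blob_mask & (mag_f >= tau_m)                 # keep significant-gradient pixels only
    if not np.any(sel):
        return 0.0

    dot = fx[sel] * rx[sel] + fy[sel] * ry[sel]
    norm = np.hypot(fx[sel], fy[sel]) * np.hypot(rx[sel], ry[sel]) + 1e-6
    delta = np.arccos(np.clip(dot / norm, -1.0, 1.0))  # angular difference per pixel
    return float(np.mean(delta <= tau_a))              # average of H(tau_a - delta)
```

The blob is then kept as shadow when the returned value exceeds the correlation threshold τ_c.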
Figure 3. Illustration of shadow detection. (a) The reference frame. (b) The current frame. (c) The detected shadow pixels in the current frame. (d) The shadow detection result shown in blue in the current frame.
The foreground extracted by the CB method and shadow detection contains many blobs forming large and small objects. In order to identify the human silhouette among these blobs, we use two rules: (i) using the OpenCV blob library [36], we remove all blobs with a small area (i.e., <50 pixels); (ii) the remaining large blobs are grouped into classes using the rectangle distance of Equation (5), each class containing the nearest blobs (the minimal distance is defined empirically as 50 pixels).
Distance(B_1, B_2) = \min_{P_1 \in R_1,\, P_2 \in R_2} distance(P_1, P_2),
where B_1 and B_2 are blob 1 and blob 2, and P_1 and P_2 are the closest points of the bounding rectangles R_1 and R_2, respectively. If the distance computed between two blobs is less than 50 pixels, they belong to the same class; otherwise, each one is placed in a different class.
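A small sketch of this grouping rule, assuming each blob is summarized by its axis-aligned bounding rectangle (x, y, w, h), could look as follows; the greedy relabeling is only one possible way to realize Equation (5) and is not taken from the paper.

```python
import numpy as np

def rect_distance(r1, r2):
    """Minimum distance between two axis-aligned rectangles (x, y, w, h)."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    dx = max(x2 - (x1 + w1), x1 - (x2 + w2), 0)
    dy = max(y2 - (y1 + h1), y1 - (y2 + h2), 0)
    return np.hypot(dx, dy)

def group_blobs(rects, min_dist=50):
    """Blobs whose rectangles are closer than min_dist pixels share a class label."""
    labels = list(range(len(rects)))
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if rect_distance(rects[i], rects[j]) < min_dist:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels
```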
Then, we determine the blob corresponding to the human silhouette by using the motion of the blob's pixels, obtained from the optical flow, and the distance between the current and previous positions of each class. The class with a small displacement and high motion is considered the desired human silhouette. Figure 4 illustrates the human silhouette extraction from the background using our method, where the last frame shows the final result.
Figure 4. Human extraction result. The first frame corresponds to the CodeBook result and the second to the shadow detection. A simple subtraction between the CB result and the shadow detection gives the third frame. After applying the second post-processing step, we obtain the result in the fourth frame. The last frame shows the tracked object corresponding to the human silhouette.

3.2. Human Posture Representation

In the literature, many approaches for human posture recognition have been proposed [17,18,19,37,38,39]. Generally, these methods can be divided into wearable sensor-based methods and computer vision-based methods. In the first category, the person needs to wear sensors or instrumented clothing, which provide features used to identify the posture. Examples include wearing a garment with strain sensors to recognize the upper-body posture [7] and using a triaxial accelerometer mounted on the waist to distinguish between human movement statuses [8]. Nevertheless, despite their advantages, these sensors need to be recharged or to have their power source changed periodically, and they must be carried by the elderly person during Activities of Daily Living (ADL), which can be an inconvenience.
The second category of approaches consists of capturing images of the human body. Based on image-processing techniques, various features are extracted from the human shape and used for posture classification. In the literature, the central issue in shape analysis is describing the shape and its characteristics effectively. In general, shape description techniques can be divided into two categories. The first is contour-based methods [24,40,41,42], which analyze only the boundary information of the human body and use matching techniques to discriminate between different shapes. The drawback of these methods is that the interior information of the shape is ignored, which is addressed by region-based methods such as [18,43,44], which take all the information of the shape into account and analyze its interior content. Such techniques are often based on a projection histogram of the shape. In [18,43], the authors extract the histogram of the human shape using the centroid shape context based on the log-polar transform. Another technique uses an ellipse projection histogram as a local feature to describe the human shape [19].
Inspired by these techniques, we propose a novel projection histogram to describe the human shape more accurately in order to identify the posture. The proposed projection histogram is based on the bounding box: we divide the human shape into different partitions horizontally and vertically using several angles. The intersection of these partitions yields our projection histogram, which serves as a shape descriptor with strong discriminative ability.
After extracting the human silhouette from the background, we fit a bounding box to the human shape and use it to extract our projection histogram as a shape descriptor. The reason for using a bounding box instead of an ellipse is that the ellipse does not take all the pixels of the shape into account; a comparison between them is presented in [45]. Based on the rectangle's center, height and width, we divide the human shape horizontally and vertically, as shown in Figure 5. Algorithm 1 below details the steps for computing the horizontal and vertical partitions and extracting the projection histogram as a descriptor of the human posture.
Algorithm 1: Our proposed projection histogram algorithm
Input: number of partitions N; binary shape S; Height and Width of BB; reference horizontal point P_h; reference vertical point P_v
Output: ψ(i,l)_{N×N}    ▹ Projection histogram
1: Initialization:
   ψ(i,l) ← 0; i,l = 1,…,N;
   P_v(x_v, y_v);
   P_h(x_h, y_h);
2: Compute θ_H, θ_V:
   D_H = distance(P_h, BB);
   D_V = distance(P_v, BB);
   θ_H = 2 arctan(Height / (2 D_H));    ▹ Horizontal angle
   θ_V = 2 arctan(Width / (2 D_V));     ▹ Vertical angle
   Δθ_H = θ_H / N;    ▹ Horizontal partition step
   Δθ_V = θ_V / N;    ▹ Vertical partition step
3: Let A = {(x_t, y_t), t = 1,…,M} be the set of pixels in shape S, where M is the total number of points.
4: Loop over all points:
   For t = 1 to M Do
     if x_h − x_t ≥ 0 then    ▹ Find pixel index of the horizontal partition
       (θ_H)_t = arctan((x_h − x_t) / (y_t − y_h));
       i = Round(((θ_H)_t − θ_H/2) / Δθ_H);
     else
       (θ_H)_t = arctan((x_t − x_h) / (y_t − y_h));
       i = Round(((θ_H)_t + θ_H/2) / Δθ_H);
     endif
     if y_v − y_t ≥ 0 then    ▹ Find pixel index of the vertical partition
       (θ_V)_t = arctan((y_v − y_t) / (x_t − x_v));
       l = Round(((θ_V)_t − θ_V/2) / Δθ_V);
     else
       (θ_V)_t = arctan((y_t − y_v) / (x_t − x_v));
       l = Round(((θ_V)_t + θ_V/2) / Δθ_V);
     endif
     ψ(i,l) ← ψ(i,l) + 1;
   endFor
   Return ψ(i,l);
Figure 5. Illustration of the shape representation. First and second frames represent the original and human silhouette respectively. Third and fourth frame illustrate horizontal and vertical area partitions and corresponding histogram chart respectively.
The output of Algorithm 1 is ψ(i,l)_{N×N}. The histogram is then normalized using Equation (6). The goal of normalization is to make the extracted histogram invariant to the person's size and distance from the camera.
\hat{\psi}(i,l)_{N \times N} = \frac{\psi(i,l)}{M}, \quad i,l = 1,\dots,N,

where M, the total number of shape pixels from Algorithm 1, equals the sum of all histogram bins.
For the experimental study, the number of partitions is fixed to N = 10. Taking C_BB(x_c, y_c) as the center of the bounding box and H as the height of the image, the reference horizontal point P_h is fixed to (x_h = x_c, y_h = 0) and the reference vertical point P_v to (x_v = H, y_v = y_c).
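For illustration, a simplified NumPy sketch of the idea behind Algorithm 1 is given below: each silhouette pixel is binned according to its angular position relative to the two reference points, producing an N×N joint histogram that is normalized by the pixel count M. The placement of the reference points, the distance convention and the angle-to-index mapping are assumptions made for this sketch and do not reproduce the algorithm exactly.

```python
import numpy as np

def projection_histogram(silhouette, n=10):
    """Simplified sketch of the bounding-box projection histogram.

    silhouette: binary mask (H x W) with the person's pixels set to 1.
    Returns the n x n histogram psi (normalized by M) and the global angles.
    """
    ys, xs = np.nonzero(silhouette)
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    width, height = x1 - x0 + 1, y1 - y0 + 1
    xc, yc = (x0 + x1) / 2.0, (y0 + y1) / 2.0

    img_h, img_w = silhouette.shape
    p_h = np.array([xc, 0.0])            # assumed reference horizontal point (above the box)
    p_v = np.array([img_w - 1.0, yc])    # assumed reference vertical point (right image border)

    d_h = abs(yc - p_h[1]) + 1e-6        # distances taken to the box centre (assumption)
    d_v = abs(p_v[0] - xc) + 1e-6
    theta_h = 2 * np.arctan(height / (2 * d_h))
    theta_v = 2 * np.arctan(width / (2 * d_v))

    # Angle of every silhouette pixel as seen from each reference point.
    ang_h = np.arctan2(xs - p_h[0], ys - p_h[1])
    ang_v = np.arctan2(ys - p_v[1], p_v[0] - xs)

    # Map angles to partition indices in [0, n-1] and accumulate the joint histogram.
    i = np.clip(((ang_h + theta_h / 2) / theta_h * n).astype(int), 0, n - 1)
    l = np.clip(((ang_v + theta_v / 2) / theta_v * n).astype(int), 0, n - 1)
    psi = np.zeros((n, n))
    np.add.at(psi, (i, l), 1)
    return psi / len(xs), theta_h, theta_v   # normalized by M, plus the two global angles
```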
Considering only local features to describe all human postures is not sufficient, since some similar postures are very difficult to differentiate, which confuses the classifier. For this reason, we add global features, namely the horizontal and vertical angles (θ_H, θ_V) and the ratio between them (θ_H/θ_V). We then combine the local and global features into a single feature vector for classification. The classification results of these two kinds of features are discussed experimentally in Section 4.2.3.
The final dimension of the feature vector is N × N + 3, where N × N corresponds to the local feature vector and 3 to the three global features. The whole feature vector of the shape (F_S) is defined as follows:
F_S = \left[ \hat{\psi}(i,l)_{N \times N},\ \theta_V,\ \theta_H,\ \frac{\theta_H}{\theta_V} \right].

3.3. Posture Classification

The classifier is used to assign our feature vector F_S (see Equation (7)) to one of the four posture categories (lying, sitting, standing, and bending), which are considered fall-related postures. The classifier finds a function that maps each feature vector to the corresponding label space Y_k = {1, 2, 3, 4}, where 1, 2, 3 and 4 represent the bending, lying, sitting and standing postures, respectively. As we have several labels, we adopt a multiclass classifier for posture classification instead of a binary one. The whole operation is presented in Figure 6.
Figure 6. Posture classification.

3.4. Fall Detection Rules

After the posture classification step, we check for a fall when the output of the classifier is an abnormal posture (lying or bending). For this purpose, we use Algorithm 2, which is composed of four rules. The first rule checks whether the posture is classified as "lie" or "bend". Then, we verify that the posture is inside the floor region, i.e., that the percentage of the human silhouette area overlapping the floor is higher than a defined threshold; this threshold is set to 85% and was found experimentally. The third rule concerns the posture transition: most fall activities begin with a standing or sitting posture and end in a lying posture, and the transition time of a fall is on average 20 frames, a value defined from our experiments. Finally, if the above conditions hold and the person remains motionless for a time exceeding a defined threshold (we use 25 frames), the fall is confirmed and an alarm signal for help is triggered.
Floor information is crucial for fall confirmation because a fall always ends with a lying posture on the floor. Many previous works incorporated ground plane information and showed good results [19,46,47].
Because we use only an RGB camera for fall detection rather than a depth camera, and the datasets were recorded with different floor textures, we use manual segmentation to extract the ground regions instead of a supervised method such as the one presented in [48] in order to evaluate our fall detection algorithm. Only the first frame of each video sequence is used to extract the floor pixels.
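The "inside the floor" test then reduces to an overlap ratio between the silhouette mask and the floor mask, as in the short sketch below (function names and the inline example are illustrative; the 85% threshold is the experimentally chosen value mentioned above).

```python
import numpy as np

def inside_floor_ratio(silhouette_mask, floor_mask):
    """Fraction of silhouette pixels that lie inside the (manually segmented) floor region."""
    body = silhouette_mask.astype(bool)
    floor = floor_mask.astype(bool)
    if body.sum() == 0:
        return 0.0
    return float(np.logical_and(body, floor).sum() / body.sum())

# Rule 2 of the fall-detection strategy: the posture is "inside the floor"
# when this ratio exceeds the experimentally chosen threshold (85%).
# is_inside = inside_floor_ratio(silhouette, floor) > 0.85
```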
To distinguish between similar activities such as falling and lying down, we rely on the transition time from the standing to the lying posture, as shown in Figure 7. From this figure, we can see that the lying activity takes more than 80 frames (i.e., more than 3 s). For the fall activity, however, the posture transition takes fewer than 25 frames, i.e., less than 1 s, which is expected because a fall is an uncontrolled movement whereas lying down is a controlled one. To measure this transition time, if the person's posture is classified as bending or lying and most of the body region is inside the floor, we count the number of frames between the current frame and the previous frame in which the person was classified as standing or sitting. If this number is less than 25 frames, we return true, as the activity may be considered a fall if the person then remains inactive for a while.
Algorithm 2: Fall detection strategy
1: Input: human posture, body area, inactivity time threshold T
2: Output: Fall or no-fall
3: Repeat:
4:   CheckAbnormalPosture()
5:   CheckInsideFloor(Area)
6:   TransitionPosture()
7:   CheckInactivityTime(T)
8:   if conditions 4, 5, 6 and 7 are all True then return Fall
9:   else go to step 3
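A compact sketch of how the four rules could be combined in code is shown below; the function signature, the per-frame inputs and the frame-count thresholds (25 frames for both the transition and the inactivity check, as described above) are assumptions made for illustration.

```python
def detect_fall(posture_history, inside_floor, transition_frames,
                inactivity_frames, max_transition=25, min_inactivity=25):
    """Sketch of the four-rule fall confirmation strategy (Algorithm 2).

    posture_history   : list of posture labels per frame ('stand', 'sit', 'bend', 'lie').
    inside_floor      : True if most of the body region lies on the floor (rule 2).
    transition_frames : frames elapsed since the last 'stand' or 'sit' posture (rule 3).
    inactivity_frames : frames with no motion in the current posture (rule 4).
    """
    abnormal = posture_history[-1] in ("lie", "bend")         # rule 1: abnormal posture
    fast_transition = transition_frames <= max_transition     # rule 3: transition under ~1 s
    long_inactivity = inactivity_frames >= min_inactivity     # rule 4: prolonged inactivity
    return abnormal and inside_floor and fast_transition and long_inactivity
```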
Figure 7. Illustration of the posture transition frames of a fall activity and a normal lying activity. Frames (a–e) present a sequence of the lying activity, which starts at frame Fr404 (a) and ends at frame Fr476 (e). Frames (f–j) present a sequence of the fall activity, which starts at frame Fr263 (f) and ends at frame Fr283 (j).

4. Experiments Results and Discussion

This section shows the performance of the proposed fall detection system. The architecture was implemented in C++ using Microsoft Visual Studio Express 2012 and OpenCV 2.4.13 for the background subtraction, feature extraction and classification steps. The experiments (training and testing) were carried out on a notebook with an Intel(R) Core(TM) i7-6700HQ CPU at 2.60 GHz and 12.00 GB of RAM. We conducted extensive experiments on different datasets, which we split into two categories: posture recognition datasets and fall activity datasets. More details are presented in the next sections.

4.1. Background Subtraction

In this part, we show the performance of the background subtraction algorithm. Some significant results are illustrated in Figure 8. The biggest drawbacks of any method based on background subtraction are lighting changes and shadows. For this reason, we process shadow detection and background subtraction separately. The shadow detection result is subtracted from the CB algorithm result, followed by a morphological operation to remove small blobs. The last column in Figure 8 shows our human silhouette extraction result.
Figure 8. Background subtraction results. (a) Original frame, (b) CodeBook algorithm result, (c) shadow detection, (d) difference between image (b) and image (c), (e) the human silhouette detection after applying a simple morphological operation.
As we can see from Figure 8, the lighting conditions differ across the original images (a). The person may walk near an object, as shown in the last row, and may perform daily activities with different postures. Under these conditions, the human silhouette is correctly extracted from the background.

4.2. Posture Classification

We perform posture classification with SVM classifiers using the LIBSVM library [49]. We use a C-SVM multiclass classifier with a Radial Basis Function (RBF) kernel. The default parameters were used, except the gamma (g) of the RBF kernel and the cost (c) of the C-SVM, which were set to 0.01 and 100, respectively.
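For readers who prefer Python, an equivalent setup could be sketched with scikit-learn, whose SVC class wraps LIBSVM; the feature matrix and labels (X_train, y_train, X_test) are assumed to have been prepared from Equation (7) and are not defined here.

```python
from sklearn.svm import SVC

# Sketch of the posture classifier: scikit-learn's SVC wraps LIBSVM, so the
# C-SVM with an RBF kernel maps to C=100 and gamma=0.01 here.
# X_train is an (n_samples, N*N + 3) feature matrix built from Equation (7),
# y_train holds the labels 1-4 (bend, lie, sit, stand); both are assumed inputs.
clf = SVC(kernel="rbf", C=100, gamma=0.01, decision_function_shape="ovr")
clf.fit(X_train, y_train)
predicted_postures = clf.predict(X_test)
```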
The performance of our method is evaluated on the datasets described in the next section, through four main experiments: feature evaluation, evaluation of the number of partitions, comparison with state-of-the-art feature extraction methods, and classifier comparison.

4.2.1. Postures Datasets

To evaluate our posture recognition method and compare it with other methods, we use two posture datasets composed of 2D human silhouettes, including lying, standing, sitting and bending postures.
Dataset (D1) [17]: This dataset was recorded with a single RGB camera. Ten people were invited to participate as volunteers in the simulation experiments. Each person was asked to simulate postures in different directions so that the constructed classifier would be robust to viewing angles. The postures of each person i are stored in a folder named P_i, with i = 1,…,10. The whole dataset contains 3216 postures, including 810 lying, 804 standing, 833 bending and 769 sitting postures. Some keyframes are shown in Figure 9.
Figure 9. Posture samples from Dataset D1. Columns 1–4 correspond to standing, sitting, lying and bending postures, respectively.
Dataset (D2): Based on the datasets of [10,50], we built our own posture dataset, using the background subtraction algorithm of [34] to extract the human silhouettes from the videos. This dataset is composed of 2865 postures, including 444 sitting, 1394 lying, 453 bending and 576 standing postures. Figure 10 shows some keyframes from this dataset.
Figure 10. Posture samples from the Dataset D2. Columns 1–4 correspond to standing, sitting, lying and bending postures, respectively.
Table 2 describes some characteristics and challenges of both datasets.
Table 2. Characteristics and challenges of each dataset.

4.2.2. Performance Criteria

To evaluate the efficiency of our proposed method, there are several indicators based on True Positive ( T P ), False Positive ( F P ), True Negative ( T N ) and False Negative ( F N ) as shown in Table 3.
Table 3. Confusion matrix of classification result.
In this paper, we use the accuracy (Acc) as a single summary indicator; it is the percentage of items classified correctly and the most straightforward measure of classifier quality.
Acc = \frac{TP + TN}{TP + FP + TN + FN}
Precision is the number of items correctly identified as positive out of the total items identified as positive. It describes a classifier's performance with respect to false positives (how many of the detections are correct). Recall is the number of items correctly identified as positive out of the total number of true positives. It describes a classifier's performance with respect to false negatives (how many positives we missed).

Prec = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}.
The traditional F-measure or balanced F-Score is the harmonic mean of precision and recall:
F\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}
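For completeness, a minimal helper computing these four indicators from confusion-matrix counts is sketched below; the example values plugged in are the fall-detection counts reported later in Table 11 (155 of 159 falls detected, 4 false alarms among 136 non-falls), so the printed numbers are illustrative of that table only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return acc, precision, recall, f_score

# Example with the counts of Table 11: TP=155, FN=4, FP=4, TN=132.
print(classification_metrics(tp=155, fp=4, tn=132, fn=4))
```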

4.2.3. Local and Global Features Comparison

We first compare the global and local features. The common 10-fold cross-validation protocol [51] is used to evaluate them.
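A short sketch of this evaluation protocol in scikit-learn is given below; X and y denote the feature matrix and posture labels and are assumed to be prepared beforehand for each feature set (local, global, combined).

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# 10-fold cross-validation over a posture dataset (X, y assumed prepared),
# repeated for the local, global, and combined feature sets to fill a
# comparison such as Table 4.
clf = SVC(kernel="rbf", C=100, gamma=0.01)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print("mean accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```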
Table 4 compares the classification results obtained when using the local features alone, the global features alone, and their combination. The SVM is used for classification, and the table shows that the combined features provide higher accuracy than either feature set alone.
Table 4. Results features comparison for both datasets D1 and D2.
Table 5 and Table 6 report the performance metrics of the proposed method on datasets D1 and D2 using the combined features. Precision, recall and F-score, computed from the confusion matrix, are the selected evaluation metrics for posture classification. From these tables, all postures are well classified, with high values for all metrics.
Table 5. Postures classification for dataset D1.
Table 6. Postures classification for dataset D2.

4.2.4. Proposed Projection Histogram Evaluation

Deciding on the number of partitions that works best for posture classification is difficult, as illustrated in Figure 11.
Figure 11. Illustration of the number of partitions horizontally and vertically of human shape. The first row presents the human shape with 5 partitions. The second row presents 10 partitions.
Therefore, we tested several numbers of partitions on dataset D1, as shown in Table 7: 5, 10, 15 and 20 partitions (bins). As the table shows, 10 partitions give higher accuracy than the others in most cases. This can be explained as follows: if the number of partitions is small, a lot of information is lost because each bin receives a large number of instances (pixels), which can be similar across postures; with a large number of partitions, many bins are empty or contain equal numbers of instances across postures, which makes them hard to distinguish. As a result, we adopt 10 partitions as the default value of our system in all subsequent experiments. Figure 12 shows an example of four projection histograms with 10 partitions for the standing, sitting, lying and bending postures.
Table 7. Evaluation of the number of partitions for feature extraction.
Figure 12. Illustration of human body postures and their projection histograms. The first row presents the human silhouettes for the standing, sitting, lying and bending postures. The second row shows the corresponding projection histograms, obtained as the intersection of the horizontal and vertical partitions.

4.2.5. Features Extraction Methods Comparison

To assess the performance of our feature extraction method for posture recognition, we compare it with a CNN [31], the ellipse descriptor [19] and the shape context [52] on Dataset D1 and Dataset D2. Table 8 reports the results on D1 and Table 9 the results on D2.
Table 8. Postures classification comparison using Dataset D1.
Table 9. Postures classification accuracy comparison of different features- extraction methods using Dataset D2.
In Table 8, the postures of each individual (P_i) are used for testing while the postures of the other individuals are used for training. Our method shows clear advantages: it outperforms the ellipse descriptor and shape context methods for the majority of individuals, with higher accuracy. Compared with the CNN, even though deep learning generally performs strongly compared with hand-crafted methods, our method gives good results and outperforms it in several cases.
Table 9 shows the comparison results obtained on dataset D2, where 70% of the data are used for training and 30% for testing; the whole dataset D2 contains 1865 postures. The results show that our method achieves higher accuracy and lower false detection than the other methods.
Another comparison is shown in Table 10, where we compare the performance of our approach with some existing feature extraction approaches discussed in the state of the art, using the same dataset. From this table, the proposed approach compares favorably with the ellipse descriptor: the projection histogram of [19] is based only on an ellipse, which does not take all the pixels of the human silhouette into account. For the silhouette area method [53], the variations of the silhouette area are view-invariant but depend heavily on the background update strategy. The bounding box ratio method [54] is based only on global features and is very easy to implement. The normalized directional histogram [20] derives static and dynamic features based on the ellipse.
Table 10. Comparison of different features extraction methods.

4.2.6. Classifiers Comparison

In this section, we compare the performance of the proposed method with some commonly used machine learning algorithms, including Linear SVM (L-SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbor (K-NN), Neural Network (NN), and SVM-RBF. Figure 13 presents the confusion matrices obtained on Dataset D1. From this figure, we can see that the SVM-RBF classifier gives the highest accuracy for all posture types compared with the other classifiers.
Figure 13. Confusion matrices of posture classification. Panels (a–f) show the results obtained with the Decision Tree, Random Forest, Linear SVM, Neural Network, KNN and SVM-RBF classifiers, respectively.

4.3. Fall Detection

4.3.1. Fall Datasets

For fall detection system evaluation, we use three available datasets released in different locations using different types of cameras. Some keyframes are shown in Figure 14. More details of each dataset are presented below.
Figure 14. Keyframes of fall and normal activities from the different datasets. The first, second and third rows show fall activities from Datasets D3, D4 and D5, respectively. The fourth and last rows show the sitting and walking activities from Dataset D2.
Dataset (D3) [10]: This dataset was recorded with a single RGB camera positioned on the ceiling of one room. Different volunteers participated to simulate several activities such as walking, sitting, falling, crouching down, lying and bending. It contains 20 videos with 45 normal activities and 29 falls.
Dataset (D4) [50]: This dataset was recorded in four different locations (office, coffee room, home, lecture room) using a single RGB camera positioned 2 m above the floor. Several activities were simulated by different volunteers, including walking, sitting and falling. The dataset is composed of 249 videos, of which about 57 contain normal activities and 192 contain a fall incident.
Dataset (D5) [22]: This dataset was recorded with eight calibrated RGB cameras and is composed of 24 scenarios, all simulated by a single volunteer. The scenarios include falls and normal daily activities; in each, the person engages in many activities, such as falling, walking, sitting, and lying on the sofa.

4.3.2. Experiment Fall Recognition Results

For fall detection, according to Algorithm 2 in Section 3.4, the posture classification is combined with the floor information to detect falls. Figure 15 shows some posture classification results together with floor information. Frame (a) shows a person who has fallen on the floor: the "lying" posture is detected and most of the body region is inside the ground region; since this posture results from an activity that started with a standing or sitting posture and the transition took fewer than 25 frames (fall duration = 11), the fall is confirmed once the posture remains inactive for a time exceeding the defined threshold (threshold = 1 s). Frame (b) shows a bending posture in which the body region is not completely inside the ground region, which confirms a normal activity. Another case is shown in frame (c): the posture is classified as bending and the transition time is below the threshold; since the posture then remained immobile for longer than the threshold, the system classified it as a fall. The posture in frame (d) is classified as sitting; as this is a normal activity, the system continues to operate normally until an abnormal posture is detected. Frame (e) shows a person standing/walking on the floor, which the system detects as a normal activity.
Figure 15. Illustration of fall, bend, sit, and stand activity results. Columns (a–e) represent fall, bending, sitting and standing activities, respectively. Rows 1–4, from top to bottom, show the reference images, the human silhouettes, the human silhouettes with the floor intersection result, and the output of our fall detection system, respectively.

4.3.3. Fall Detection System

To evaluate our fall detection system, we collect fall videos from Datasets D3–D5. Some videos from these datasets are not considered, for three reasons that prevent our method from being applied: (i) some videos are very short and leave no additional time for fall confirmation; (ii) in some videos the person is already present in the first frame, which conflicts with our background subtraction algorithm, which builds the background model from the first 50 frames, and with the floor extraction; (iii) the person is behind an object when falling, so that the body shape is mostly not visible to the camera.
As shown in Table 11, the total number of fall activities is 159 and the number of non-fall activities is 136. To use the person's posture classification for fall detection, the postures from dataset D1 and dataset D2 are combined and used to train the SVM classifier. From this table, we can see that our system detects 155 out of 159 fall activities (97.48%), while for non-fall activities only four out of 136 (2.94%) are detected as falls; these errors occur when the person bends toward the floor and remains motionless for longer than the thresholds, so that the system detects a fall.
Table 11. Results of fall detection system.
In Table 12, we report the experimental results of the proposed method and of other state-of-the-art methods. As we can see, our method achieves higher accuracy than the other methods. It achieves a lower recall than the methods of [26,27] but a higher precision. This means that our method has a strong ability to detect falls, while it still needs improvement to reduce the misclassification of normal activities. Compared with the method of [30], our method better reduces false alarms.
Table 12. Results of fall detection system.
Overall, the experimental results obtained are satisfactory and demonstrate the effectiveness of our method for fall recognition.

4.4. Discussion

The proposed approach uses posture recognition for fall detection. The system is convenient because the elderly person does not need to carry any body-worn sensors, which are generally affected by noise from the environment. The proposed descriptor for posture classification achieves a higher classification rate than other posture recognition methods. With a large dataset including different human postures captured by different cameras, the classifier can effectively distinguish different types of postures. The procedure is fully automatic and no thresholds need to be set to distinguish between postures. It is fast and efficient, since only a single frame is needed to extract the features and perform the posture classification, instead of a set of frames as in video clip-based methods [55,56]. Moreover, we use only a traditional classifier instead of deep-learning methods such as [17,31], which have a high computational cost.
Compared with state-of-the-art methods for posture classification, our proposed approach has the following characteristics, which constitute the novelties of our work:
As the main component of fall detection is the human posture classification, the extraction of the person from the background must be robust and efficient. For this purpose, we use an effective background subtraction combined with shadow detection in order to cope with background changes such as lighting and shadow problems. In each frame, blob operations are used to identify unwanted objects, classified as furniture or ghosts, and to push them into the background model; hence, in the next frame they disappear from the foreground.
A new human silhouette descriptor is used for posture classification, which is more robust than the ellipse-based projection histogram and the CNN method. Fall confirmation requires floor information in order to distinguish between a person lying on the sofa and a person who has fallen; most detection errors in previous systems stem from this problem. As a solution, confirmation using the floor information reduces these errors.
However, some problems still occur in our system, as in [10,18,20]. Our system is designed for monitoring a single person living alone at home, which is not adequate for some special cases, such as the presence of several people in the home or an elderly person with a pet that appears large (close to the camera) or small (far from the camera). If there is more than one person at home, our system does not need to operate, as the other people can ask for help if the elderly person falls; the system would then be turned off and remain asleep until the elderly person turns it on manually, or automatically based on people-counting techniques such as [57,58]. For the pet case, when only one silhouette is extracted from the foreground, determining whether it is a pet or a human silhouette can be addressed with object classification techniques such as [59,60,61] or with deep learning methods as in [62].
Another problem is occlusion. The home environment often contains many objects, such as tables, sofas and chairs. These objects sometimes cause occlusion when the elderly person is behind one of them, which deteriorates the performance of the fall detection system. For this purpose, more than one camera can be used for monitoring, to make sure that the whole body, or most of it, is visible to at least one camera. Each camera then performs fall detection and the results are combined, for example by majority voting, to make a decision; such a majority voting strategy was used in [24].

5. Conclusions

In this work, we have proposed a fall detection system for elderly people based on posture recognition using a single camera. The process of our approach is straightforward. First, we subtract the background to extract the human silhouette from the video frames using the CodeBook algorithm; shadow detection is added to improve the foreground detection. Then, we use our proposed method to extract local and global features from the human silhouette. These features are combined and fed to the classifier to predict the posture type (lying, sitting, standing or bending). The local features are the projection histogram extracted from the bounding box: the human silhouette is divided horizontally and vertically into equal partitions using an angle step, and the intersection of the horizontal and vertical partitions provides the local features. The global features are the horizontal and vertical angles combined with the ratio between them. The evaluation of the proposed features for posture recognition shows good results compared with state-of-the-art methods. Furthermore, we evaluated our method with common classifiers including SVM, NN, KNN, RF and DT; the experimental results showed that the SVM classifier with an RBF kernel performs best. After posture classification, four rules are applied for fall detection, and a fall is detected if these conditions are satisfied. The experimental results show that our fall detection system obtains good performance, with high accuracy in fall recognition and few false detections. Despite these results, some problems remain, as discussed above: multiple persons and occlusions are two drawbacks of our system. As future work, we plan to extend our approach to at least two cameras and to add people-counting and object recognition techniques.

Author Contributions

Conceptualization, Abderrazak Iazzi; methodology, Iazzi Abderrazak; software, Iazzi Abderrazak; validation, Iazzi Abderrazak, Mohammed Rziza and Rachid Oulad Haj Thami; supervision, Mohammed Rziza and Rachid Oulad Haj Thami. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bergen, G.; Stevens, M.R.; Burns, E.R. Falls and Fall Injuries Among Adults Aged ≥65 Years — United States, 2014. Morb. Mortal. Wkly. Rep. (MMWR) 2016, 65, 993–998. [Google Scholar] [CrossRef]
  2. Igual, R.; Medrano, C.; Plaza, I. Challenges, issues and trends in fall detection systems. BioMed. Eng. Online 2013, 12, 66. [Google Scholar] [CrossRef]
  3. Yang, X.; Ren, X.; Chen, M.; Wang, L.; Ding, Y. Human Posture Recognition in Intelligent Healthcare. J. Phys. Conf. Ser. 2020, 1437, 012014. [Google Scholar] [CrossRef]
  4. Zhang, Z.; Conly, C.; Athitsos, V. A survey on vision-based fall detection. In Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA, 1–3 July 2015; p. 46. [Google Scholar]
  5. Khan, S.S.; Hoey, J. Review of Fall Detection Techniques: A Data Availability Perspective. arXiv 2016, arXiv:1605.09351. [Google Scholar] [CrossRef]
  6. Casilari, E.; Lora-Rivera, R.; García-Lagos, F. A study on the application of convolutional neural networks to fall detection evaluated with multiple public datasets. Sensors 2020, 20, 1466. [Google Scholar] [CrossRef]
  7. Mattmann, C.; Amft, O.; Harms, H.; Troster, G.; Clemens, F. Recognizing upper body postures using textile strain sensors. In Proceedings of the 2007 11th IEEE International Symposium on Wearable Computers, Boston, MA, USA, 11–13 October 2007; pp. 29–36. [Google Scholar]
  8. Xia, L.; Chen, C.C.; Aggarwal, J.K. View invariant human action recognition using histograms of 3d joints. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 20–27. [Google Scholar]
  9. Iazzi, A.; Rziza, M.; Oulad Haj Thami, R.; Aboutajdine, D. A New Method for Fall Detection of Elderly Based on Human Shape and Motion Variation. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 12–14 December 2016; pp. 156–167. [Google Scholar]
  10. Chua, J.L.; Chang, Y.C.; Lim, W.K. A simple vision-based fall detection technique for indoor video surveillance. Signal Image Video Process. 2015, 9, 623–633. [Google Scholar] [CrossRef]
  11. Nguyen, V.A.; Le, T.H.; Nguyen, T.T. Single camera based Fall detection using Motion and Human shape Features. In Proceedings of the Seventh Symposium on Information and Communication Technology, Ho Chi Minh, Vietnam, 8–9 December 2016; pp. 339–344. [Google Scholar]
  12. Pramerdorfer, C.; Planinc, R.; Van Loock, M.; Fankhauser, D.; Kampel, M.; Brandstötter, M. Fall Detection Based on Depth-Data in Practice. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 195–208. [Google Scholar]
  13. Hung, D.H.; Saito, H. Fall detection with two cameras based on occupied area. In Proceedings of the 18th Japan-Korea Joint Workshop on Frontier in Computer Vision, Kanagawa, Japan, 2–4 February 2012; pp. 33–39. [Google Scholar]
  14. Hung, D.H.; Saito, H. The estimation of heights and occupied areas of humans from two orthogonal views for fall detection. IEEJ Trans. Electron. Inf. Syst. 2013, 133, 117–127. [Google Scholar] [CrossRef]
  15. Kang, H.G.; Kang, M.; Lee, J.G. Efficient fall detection based on event pattern matching in image streams. In Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Korea, 13–16 February 2017; pp. 51–58. [Google Scholar]
  16. Zerrouki, N.; Harrou, F.; Houacine, A.; Sun, Y. Fall detection using supervised machine learning algorithms. A comparative study. In Proceedings of the 8th International Conference on Modelling, Identification and Control (ICMIC), Algiers, Algeria, 15–17 November 2016; pp. 665–670. [Google Scholar]
  17. Feng, P.; Yu, M.; Naqvi, S.M.; Chambers, J.A. Deep learning for posture analysis in fall detection. In Proceedings of the 19th International Conference on Digital Signal Processing, Hong Kong, China, 20–23 August 2014; pp. 12–17. [Google Scholar]
  18. Yu, M.; Yu, Y.; Rhuma, A.; Naqvi, S.M.; Wang, L.; Chambers, J.A. An online one class support vector machine-based person-specific fall detection system for monitoring an elderly individual in a room environment. IEEE J. Biomed. Health Inform. 2013, 17, 1002–1014. [Google Scholar] [PubMed]
  19. Yu, M.; Rhuma, A.; Naqvi, S.M.; Wang, L.; Chambers, J. A posture recognition-based fall detection system for monitoring an elderly person in a smart home environment. IEEE Trans. Inf. Technol. Biomed. 2012, 16, 1274–1286. [Google Scholar]
  20. Fan, K.; Wang, P.; Hu, Y.; Dou, B. Fall detection via human posture representation and support vector machine. Int. J. Distrib. Sens. Netw. 2017, 13, 1550147717707418. [Google Scholar] [CrossRef]
  21. Manzi, A.; Cavallo, F.; Dario, P. A 3D Human Posture Approach for Activity Recognition Based on Depth Camera. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 432–447. [Google Scholar]
  22. Auvinet, E.; Rougier, C.; Meunier, J.; St-Arnaud, A.; Rousseau, J. Multiple Cameras Fall Dataset; Technic Report; Université de Montréal: Montréal, QC, Canada, 2010; Volume 1350. [Google Scholar]
  23. Matilainen, M.; Barnard, M.; Silvén, O. Unusual activity recognition in noisy environments. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009, Bordeaux, France, 28 September–2 October 2009; pp. 389–399. [Google Scholar]
  24. Rougier, C.; Meunier, J.; St-Arnaud, A.; Rousseau, J. Robust video surveillance for fall detection based on human shape deformation. IEEE Trans. Circuits Syst. Video Technol. 2011, 21, 611–622. [Google Scholar] [CrossRef]
  25. Akagündüz, E.; Aslan, M.; Şengür, A.; Wang, H.; İnce, M.C. Silhouette Orientation Volumes for Efficient Fall Detection in Depth Videos. IEEE J. Biomed. Health Inform. 2017, 21, 756–763. [Google Scholar] [CrossRef] [PubMed]
  26. Harrou, F.; Zerrouki, N.; Sun, Y.; Houacine, A. An integrated vision-based approach for efficient human fall detection in a home environment. IEEE Access 2019, 7, 114966–114974. [Google Scholar] [CrossRef]
  27. Gracewell, J.J.; Pavalarajan, S. Fall detection based on posture classification for smart home environment. J. Ambient. Intell. Humaniz. Comput. 2019, 1–8. [Google Scholar] [CrossRef]
  28. Ma, X.; Wang, H.; Xue, B.; Zhou, M.; Ji, B.; Li, Y. Depth-based human fall detection via shape features and improved extreme learning machine. IEEE J. Biomed. Health Inform. 2014, 18, 1915–1922. [Google Scholar] [CrossRef] [PubMed]
  29. Aslan, M.; Sengur, A.; Xiao, Y.; Wang, H.; Ince, M.C.; Ma, X. Shape feature encoding via fisher vector for efficient fall detection in depth-videos. Appl. Soft Comput. 2015, 37, 1023–1028. [Google Scholar] [CrossRef]
  30. Wang, B.H.; Yu, J.; Wang, K.; Bao, X.Y.; Mao, K.M. Fall Detection Based on Dual-Channel Feature Integration. IEEE Access 2020, 8, 103443–103453. [Google Scholar] [CrossRef]
  31. Yu, M.; Gong, L.; Kollias, S. Computer vision based fall detection by a convolutional neural network. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 416–420. [Google Scholar]
  32. Stauffer, C.; Grimson, W.E.L. Adaptive background mixture models for real-time tracking. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Fort Collins, CO, USA, 23–25 June 1999; Volume 2. [Google Scholar]
  33. McFarlane, N.J.; Schofield, C.P. Segmentation and tracking of piglets in images. Mach. Vis. Appl. 1995, 8, 187–193. [Google Scholar] [CrossRef]
  34. Kim, K.; Chalidabhongse, T.H.; Harwood, D.; Davis, L. Background modeling and subtraction by codebook construction. In Proceedings of the 2004 International Conference on Image Processing, ICIP ’04, Singapore, 24–27 October 2004; Volume 5, pp. 3061–3064. [Google Scholar]
  35. Gomes, V.; Barcellos, P.; Scharcanski, J. Stochastic shadow detection using a hypergraph partitioning approach. Pattern Recognit. 2017, 63, 30–44. [Google Scholar] [CrossRef]
  36. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 3, 120–125. [Google Scholar]
  37. Wang, J.; Huang, Z.; Zhang, W.; Patil, A.; Patil, K.; Zhu, T.; Shiroma, E.J.; Schepps, M.A.; Harris, T.B. Wearable sensor based human posture recognition. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 3432–3438. [Google Scholar] [CrossRef]
  38. Huang, J.; Yu, X.; Wang, Y.; Xiao, X. An integrated wireless wearable sensor system for posture recognition and indoor localization. Sensors 2016, 16, 1825. [Google Scholar] [CrossRef] [PubMed]
  39. Paul, M.; Haque, S.M.; Chakraborty, S. Human detection in surveillance videos and its applications-a review. EURASIP J. Adv. Signal Process. 2013, 2013, 176. [Google Scholar] [CrossRef]
  40. Ling, H.; Jacobs, D.W. Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 286–299. [Google Scholar] [CrossRef] [PubMed]
  41. Abou Nabout, A. Object Shape Recognition Using Wavelet Descriptors. J. Eng. 2013, 2013, 435628. [Google Scholar] [CrossRef]
  42. Tieng, Q.M.; Boles, W. Recognition of 2D object contours using the wavelet transform zero-crossing representation. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 910–916. [Google Scholar] [CrossRef]
  43. Hsieh, J.W.; Hsu, Y.T.; Liao, H.Y.M.; Chen, C.C. Video-based human movement analysis and its application to surveillance systems. IEEE Trans. Multimed. 2008, 10, 372–384. [Google Scholar] [CrossRef]
  44. Wang, B.; Gao, Y. Structure integral transform versus Radon transform: A 2D mathematical tool for invariant shape recognition. IEEE Trans. Image Process. 2016, 25, 5635–5648. [Google Scholar] [CrossRef] [PubMed]
  45. Iazzi, A.; Rziza, M.; Thami, R.O.H. Fall detection based on posture analysis and support vector machine. In Proceedings of the 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 21–24 March 2018; pp. 1–6. [Google Scholar]
  46. Solbach, M.D.; Tsotsos, J.K. Vision-Based Fallen Person Detection for the Elderly. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017. [Google Scholar]
  47. Wang, S.; Zabir, S.; Leibe, B. Lying pose recognition for elderly fall detection. Robot. Sci. Syst. VII 2012, 345–353. [Google Scholar]
  48. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  49. Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27. [Google Scholar] [CrossRef]
  50. Charfi, I.; Miteran, J.; Dubois, J.; Atri, M.; Tourki, R. Definition and performance evaluation of a robust svm based fall detection solution. In Proceedings of the 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, Sorrento, Italy, 25–29 November 2012; pp. 218–224. [Google Scholar]
  51. Duin, R.; Pekalska, E. Pattern Recognition: Introduction and Terminology; Delft University of Technology: Delft, The Netherlands, 2016. [Google Scholar]
  52. Belongie, S.; Malik, J.; Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522. [Google Scholar] [CrossRef]
  53. Mirmahboub, B.; Samavi, S.; Karimi, N.; Shirani, S. Automatic monocular system for human fall detection based on variations in silhouette area. IEEE Trans. Biomed. Eng. 2013, 60, 427–436. [Google Scholar] [CrossRef]
  54. Liu, C.L.; Lee, C.H.; Lin, P.M. A fall detection system using k-nearest neighbor classifier. Expert Syst. Appl. 2010, 37, 7174–7181. [Google Scholar] [CrossRef]
  55. Fan, Y.; Wen, G.; Li, D.; Qiu, S.; Levine, M.D. Early event detection based on dynamic images of surveillance videos. J. Vis. Commun. Image Represent. 2018, 51, 70–75. [Google Scholar] [CrossRef]
  56. Yun, Y.; Gu, I.Y.H. Human fall detection in videos via boosting and fusing statistical features of appearance, shape and motion dynamics on Riemannian manifolds with applications to assisted living. Comput. Vis. Image Underst. 2016, 148, 111–122. [Google Scholar] [CrossRef]
  57. Yoshinaga, S.; Shimada, A.; Taniguchi, R.I. Real-time people counting using blob descriptor. Procedia Soc. Behav. Sci. 2010, 2, 143–152. [Google Scholar] [CrossRef]
  58. Khan, S.; Vizzari, G.; Bandini, S.; Basalamah, S. Detecting dominant motion flows and people counting in high density crowds. J. WSCG 2014, 22, 21–30. [Google Scholar]
  59. Tang, P.; Wang, X.; Huang, Z.; Bai, X.; Liu, W. Deep patch learning for weakly supervised object classification and discovery. Pattern Recognit. 2017, 71, 446–459. [Google Scholar] [CrossRef]
  60. Wu, Y.; Lim, J.; Yang, M.H. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef]
  61. Lecumberry, F.; Pardo, Á.; Sapiro, G. Simultaneous object classification and segmentation with high-order multiple shape models. IEEE Trans. Image Process. 2010, 19, 625–635. [Google Scholar] [CrossRef] [PubMed]
  62. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. arXiv 2018, arXiv:1807.05511. [Google Scholar] [CrossRef] [PubMed]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
