CNN-Based Fall Detection Strategy with Edge Computing Scheduling in Smart Cities

Abstract: Livelihood issues, especially smart healthcare (the so-called medical wisdom), play an important role in building smart cities. Within smart healthcare, fall detection has attracted considerable attention from researchers and medical institutions worldwide. Traditional fall detection strategies struggle to realize intelligent detection for three reasons: (i) data collection cannot reach the real-time level; (ii) the adopted detection methods are not sufficiently stable; and (iii) the computation overhead of the collection device is very high, which leads to barely satisfactory detection performance. Therefore, this paper proposes a Convolutional Neural Network (CNN)-based fall detection strategy with edge computing, in which the global network view of Software-Defined Networking (SDN) is used to collect the data generated by smartphones. On one hand, edge computing is exploited to place some computation tasks at the edge server by means of a scheduling technique. On the other hand, CNN is deployed on both the edge server and the smartphone, where it is used to train on the related data and to guide fall detection. The experimental results show that the proposed strategy achieves better accuracy, transmission delay, and stability than two cutting-edge strategies.


Introduction
The definition of a smart city was first proposed in the 1990s, referring to the use of Information and Communications Technologies (ICT) and modern infrastructures within cities [1]. With the purpose of improving citizens' quality of life, the smart city is conceptualized as a scenario in which citizens are at the center of all applications and services [2]. In this context, many cities, such as Shenzhen (China), Busan (South Korea), Santander (Spain), Chicago (United States), and Milton Keynes (United Kingdom), are trying to become smarter by using ICT to optimize various aspects of city operation and management.
As we know, smart healthcare (the so-called medical wisdom) plays an important role in building smart cities because it is a livelihood issue that cannot be neglected. Although many countries have improved their healthcare systems to increase average life expectancy, it is very difficult to provide proper care for older people because frequent workplace changes have greatly increased the average distance between family members (children and parents). Furthermore, the falling behavior has been regarded as one of the significant issues faced by

Literature Review
There have been a number of research papers on fall detection. For example, Igual et al. [13] reviewed 327 related publications, which mainly concentrated on context-aware systems, including those based on cameras, floor sensors, microphones, and pressure sensors. They emphasized that no standardized context-aware technique was widely accepted by the research community in this field, even though many feature extraction and machine learning techniques had been adopted by researchers. Ward et al. [14] made a comprehensive review of fall detection from the perspective of applications, in which the techniques were divided into those based on manually operated devices, body-worn automatic alarm systems, and devices that detect changes that might increase the risk of falling. In addition, Refs. [15][16][17][18][19][20] also presented surveys addressing similar issues. Different from them, this paper reviews the latest and representative research achievements since 2017.
In [21], a comprehensive bounding box and a dynamic state machine were used in a new way for fall detection. The proposed approach tracked and analyzed continuous streams of visual images to automatically predict a fall event prior to the fall state in a single phase instead of the typical two phases. In [22], the authors regarded fall detection as an instance of action detection and proposed to locate its temporal extent by exploiting the effectiveness of deep networks. In the training stage, the trimmed video clips of the four phases of a fall (standing, falling, fallen, and not moving) were converted into four categories of so-called dynamic images, which were used to train a model that predicts the label of each dynamic image. In the testing stage, a set of sub-videos was generated by sliding a window over an untrimmed video. In [23], human falls were detected with the built-in inertial measurement unit sensors of a smartphone attached to the body, with the signals wirelessly transmitted to a remote PC for processing; a threshold-based fall detection algorithm was implemented, while a supervised machine learning algorithm was used to classify activities of daily living. In [24], a new dataset of movement traces, acquired through the systematic emulation of a set of predefined activities of daily life and falls, was described to provide a reference for research.
In [25], the authors concentrated on energy efficiency of a wearable sensor node and proposed the design of a tiny, lightweight, flexible, and energy efficient wearable device, in which different parameters (e.g., sampling rate, communication bus interface, transmission protocol, and transmission rate) impacting on energy consumption of the wearable device were studied. In [26], a framework by using acoustic local ternary patterns and analyzing environmental sounds was proposed, which suppressed silence zones in sound signals and distinguished overlapping sounds. Specifically, acoustic features were extracted from the separated source components by using the acoustic local ternary patterns, and then fall events were detected through a support vector machine based classifier. In [27], an intelligent system to detect human fall events by using a physics-based myoskeletal simulation was proposed, which demonstrated that the use of fall recordings was unnecessary for modeling the fall since the simulation engine could produce a variety of fall events customized to the individual's physical characteristics. In [28], a fuzzy logic-based fall detection algorithm was developed to process the output signals from the accelerometer and sound sensor, where a valid fall activity detected by the accelerometer, coupled with a detected sound pressure from the resultant fall could infer an occurrence of a valid fall. In [29], a methodology for acquisition and preprocessing of measurement data from infrared depth sensors was proposed. Therein, the data processing was initiated with extraction of the silhouette from the depth image and estimation of the coordinates of the center of that silhouette.
In [30], a hierarchical classifier based on Fisher discriminant analysis was developed to improve detection accuracy and reduce false alarms. It divided human activities into three categories: non-fall, backward fall, and forward fall. In [31], a computer vision based framework was proposed to detect falls from surveillance videos. It introduced a novel three-stream CNN as an event classifier, where silhouettes and their motion history images served as input to the first two streams, while dynamic images, whose temporal duration was equal to that of the motion history images, were used as input to the third stream. In [32], the authors presented an automated analysis algorithm for remote detection of high-impact falls, based on a physical model of a fall and aiming at universality and robustness. In [33], a fall detection system based on a 2D CNN inference method and multiple cameras was devised. This approach analyzed images in fixed time windows and extracted features by using an optical flow method that obtained information on the relative motion between two consecutive images. In [34], the authors presented the design of embedded software for wearable devices connected wirelessly to a remote monitoring system. In particular, the work proposed embedding a recurrent neural network architecture on a microcontroller unit. Furthermore, to address the feasibility of such a resource-constrained deep learning approach, the work presented a few general formulas to determine memory occupation, computational load, and power consumption. In [35], a new fall detection system relying on different signals acquired with multiple wearable sensors was proposed. The system made use of the covariance of the raw signals and the nearest neighbor classifier; it also employed the covariance matrix as a straightforward means of fusing signals from multiple sensors to enhance the classification performance.
In [36], combining ensemble stacked autoencoders with one-class classification based on the convex hull, the authors proposed a novel intelligent fall detection method based on accelerometer data from a wrist-worn smart watch. In the proposed method, the first role was adopted for unsupervised feature extraction to overcome the disadvantages of artificial feature extraction while the second role was used for pattern recognition.
In [37], the authors presented a dedicated system for detecting falls caused by complications in hemodialysis patients using RF signals. In particular, they designed a residual feature extraction algorithm based on the hemodialysis patient safety process model, and the fall detection of hemodialysis patient was treated as a machine learning problem where four classification models were built via learning residual feature space. In [38], an innovative highly-efficient intelligent system based on a fog-cloud computing architecture was proposed to timely detect falls using deep learning techniques deployed on resource-constrained fog nodes. In [39], the authors devised a scalable architecture of a system that could monitor thousands of older adults, detect falls, and notify caregivers, in which several machine learning models were employed to evaluate their suitability in the detection process. In [40], the paper proposed a centralized unobtrusive IoT based device-type invariant fall detection and rescue system for monitoring of a large population in real time. It supported that any type of device could be used to monitor a large population in the proposed system. In [41], the authors introduced an effective and optimized fall detection system that used an approach based on a killer heuristics optimized AlexNet convolution neural network, in which the feature searching was performed by applying the alpha-beta pruning move.
According to the literature reviewed above, newly emerging techniques have been used to help fall detection. In spite of this, the current research achievements still have limitations that need to be addressed. For example, data collection cannot reach the real-time level; the adopted detection methods are not sufficiently stable; and the computation overhead of the collection device is very high. Given this, this paper exploits emerging techniques (e.g., CNN and edge computing) and networking paradigms (e.g., SDN) to further study fall detection. In other words, SDN, edge computing, and CNN are the novel elements of this paper that distinguish it from the traditional methods.

System Framework
As mentioned above, the smartphone has limited storage and computation resources, which cannot support efficient fall detection. Based on the edge computing model, some computation tasks can be migrated to the edge server for computation. In addition, because current data collection cannot reach the real-time level, SDN is employed to monitor the global data status of the smartphone and the edge computing server via its centralized control ability. In summary, edge computing and SDN are adopted to help realize fast and efficient fall detection, which, to the best of our knowledge, is a new idea. Nevertheless, CNN is still required for data training in order to guide fall detection. According to these statements, the system framework of EdCNN is shown in Figure 1. The proposed EdCNN includes three main external roles, i.e., a number of smartphones, an edge computing server, and two SDN controllers, where both the smartphone and the edge computing server are integrated with the CNN used to train on the corresponding data. The functions and workflow of EdCNN are described as follows. The smartphone is used to collect data from older people with convenience and flexibility, and it also performs data computing via CNN. The edge computing server is used to accommodate a large amount of data and to perform data computing via CNN. The SDN controller is used to scan the global data status of the smartphone or the edge computing server. Regarding the workflow, at first, some computation tasks are migrated to the edge computing server via the scheduling strategy (see Section 4 for details), while the remaining tasks are computed via CNN at the smartphone. Then, the migrated tasks are computed via CNN at the edge computing server.
During this process, one SDN controller is responsible for monitoring the data status of the smartphone and the other is responsible for scanning the data status of the edge computing server, which supports the scheduling strategy through the Internet-enabled smart city. Meanwhile, after both the smartphone and the edge computing server submit the mirrored data to the SDN controller, the SDN controller sends a control signal to adjust the amount of migrated data. At last, the trained data are obtained to conduct fall detection.
Two technical issues therefore need to be addressed in this paper, i.e., how to perform the edge computing scheduling (how many tasks are migrated to the edge computing server) and how to train the data (which is used to implement CNN-based data training). Regarding the virtual data submission to the SDN controller, it can be realized by the OpenFlow switch [42,43], which has inherent mirroring and indirection functions. Therefore, we introduce the edge computing scheduling strategy and the CNN-based data training in the following sections.

Edge Computing Scheduling
Although the introduction of edge server relieves the computation pressure and storage pressure of smartphone, i.e., decreasing energy consumption, it causes the transmission delay from the smartphone to the edge computing server. It is observed that there is a trade-off problem between running delay and energy consumption. In this section, we first give the scheduling model based on edge computing; then, we present an intelligent method to address the trade-off issue.

Scheduling Modeling
The running delay and energy consumption generated by the introduction of edge computing are modeled in this section. In particular, the total cost includes three parts, i.e., local running, task transmission, and result returning. Among them, the first part is the necessary cost and the last two parts are the transmission cost.

Running Delay Quantification
Suppose that there are m smartphones, denoted by $sp_1, sp_2, \cdots, sp_m$. An arbitrary $sp_i$ ($1 \le i \le m$) holds n computation tasks, denoted by $task_{i1}, task_{i2}, \cdots, task_{in}$, and the size of an arbitrary $task_{ij}$ ($1 \le j \le n$) is denoted by $s_{ij}$. The computation time of a task at the smartphone depends on the task size and the handling capacity of the CPU. Let $pt_{ij}$ denote the computation time of $task_{ij}$ at $sp_i$; it is determined by $s_{ij}$, by $c_i$, the handling capacity of $sp_i$, and by $u_{cpu}$, the CPU utilization rate of the smartphone.
In this paper, we assume that all smartphones have the same CPU utilization rate. Furthermore, let $ut_{ij}$ and $dt_{ij}$ denote the upstream transmission time and the downstream transmission time, respectively, and we have
$$ut_{ij} = \frac{is_{ij}}{baw_i}, \qquad dt_{ij} = \frac{os_{ij}}{baw_i}$$
where $is_{ij}$ is the size of the input data of $task_{ij}$ transmitted via the upstream link; $os_{ij}$ is the size of the corresponding computation results returned via the downstream link; and $baw_i$ is the network bandwidth between $sp_i$ and the edge computing server. Here, we assume that the upstream link and the downstream link have the same network bandwidth. Let $t_{ij}$ denote the total running delay of $task_{ij}$, which is defined as follows:
$$t_{ij} = pt_{ij} + ut_{ij} + dt_{ij}$$
On this basis, the total running delay with respect to m smartphones and n tasks is defined as follows:
$$T = \sum_{i=1}^{m} \sum_{j=1}^{n} t_{ij}$$
Mathematically, the single objective regarding the total running delay optimization is expressed as follows:
$$\text{Minimize } T$$
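The delay model above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the closed form used for pt_ij (task size divided by the product of handling capacity and CPU utilization) is an assumption, as are the `Task` container and all function names. The transmission times follow the size-over-bandwidth reading of the definitions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical container for one task_ij."""
    size: float         # s_ij, amount of work in the task
    input_size: float   # is_ij, data sent over the upstream link
    output_size: float  # os_ij, results returned over the downstream link


def computation_time(task, capacity, cpu_util):
    # ASSUMPTION: pt_ij = s_ij / (c_i * u_cpu); the text only states that
    # pt_ij depends on the task size and the CPU handling capacity.
    return task.size / (capacity * cpu_util)


def transmission_times(task, bandwidth):
    # ut_ij and dt_ij: data size divided by the shared link bandwidth baw_i.
    return task.input_size / bandwidth, task.output_size / bandwidth


def task_delay(task, capacity, cpu_util, bandwidth):
    # t_ij: local running + task transmission + result returning.
    ut, dt = transmission_times(task, bandwidth)
    return computation_time(task, capacity, cpu_util) + ut + dt


def total_delay(phones, cpu_util):
    # phones: list of (c_i, baw_i, [Task, ...]); returns T summed over all tasks.
    return sum(
        task_delay(task, capacity, cpu_util, bandwidth)
        for capacity, bandwidth, tasks in phones
        for task in tasks
    )
```

All smartphones share one `cpu_util` value here, mirroring the assumption that the CPU utilization rate is the same on every device.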

Energy Consumption Quantification
The energy consumption generated at the smartphone mainly depends on the power consumption. For the smartphone, the energy-consuming modules usually include the CPU and memory. Let $de_{ij}$ denote the energy consumed when $task_{ij}$ is performed at $sp_i$. It depends on $u_{cpu}$, on $u_{mem}$, the memory utilization rate of the smartphone, and on $\lambda_{cpu}$ and $\lambda_{mem}$, the power consumption coefficients of the CPU and memory, respectively. Similarly, we assume that all smartphones have the same memory utilization rate, the same CPU power consumption coefficient, and the same memory power consumption coefficient.
The energy consumption generated by task transmission is usually positively correlated with the transmitted traffic [44,45]. Let $te_{ij}$ denote the energy consumed when $task_{ij}$ is transmitted to the edge computing server and the computation results are returned to $sp_i$; it is determined by the transmitted traffic and by $\lambda_{link}$, the energy consumption coefficient of data transmission. Let $e_{ij}$ denote the total energy consumption of $task_{ij}$, which is defined as follows:
$$e_{ij} = de_{ij} + te_{ij}$$
On this basis, the total energy consumption with respect to m smartphones and n tasks is defined as follows:
$$E = \sum_{i=1}^{m} \sum_{j=1}^{n} e_{ij}$$
Similarly, the single objective regarding the total energy consumption optimization is expressed as follows:
$$\text{Minimize } E$$
Combining running delay and energy consumption, the purpose of this paper is to minimize both T and E. However, T and E are not of the same order of magnitude; therefore, they need to be standardized (for example, with the max-min method [46]). Let $\bar{T}$ and $\bar{E}$ be the standardized results of T and E, respectively. The total cost is then defined as the weighted sum of $\bar{T}$ and $\bar{E}$, where α and β are the weights of running delay and energy consumption, respectively. Furthermore, the bi-objective optimization problem is to minimize this total cost.
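The energy terms and the standardized weighted cost can be sketched as follows. This is a hedged illustration under stated assumptions: the closed forms for de_ij and te_ij are guesses that match the quantities named in the text, and the function names, default weights, and the idea of normalizing across a list of candidate schedules are all hypothetical.

```python
def local_energy(comp_time, cpu_util, mem_util, lam_cpu, lam_mem):
    # ASSUMPTION: de_ij = (lam_cpu * u_cpu + lam_mem * u_mem) * pt_ij,
    # i.e., CPU and memory power drawn over the computation time.
    return (lam_cpu * cpu_util + lam_mem * mem_util) * comp_time


def transmission_energy(input_size, output_size, lam_link):
    # ASSUMPTION: te_ij = lam_link * (is_ij + os_ij), proportional to the
    # transmitted traffic.
    return lam_link * (input_size + output_size)


def min_max(values):
    # Max-min standardization to [0, 1]; a constant list maps to zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_costs(delays, energies, alpha=0.5, beta=0.5):
    # One (T, E) pair per candidate schedule; returns alpha*T_bar + beta*E_bar.
    norm_t = min_max(delays)
    norm_e = min_max(energies)
    return [alpha * t + beta * e for t, e in zip(norm_t, norm_e)]


def best_schedule(delays, energies, alpha=0.5, beta=0.5):
    # Index of the candidate with the smallest total cost.
    costs = weighted_costs(delays, energies, alpha, beta)
    return min(range(len(costs)), key=costs.__getitem__)
```

Standardizing before weighting keeps a quantity measured in large units from dominating the sum, which is the reason the text gives for the max-min step.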

Scheduling Methodology
Equation (12) involves an NP-hard problem [47], which can be solved by an intelligent method. In this paper, we use the Bird Swarm Algorithm (BSA) [48,49] to optimize Equation (12), and the finally obtained location of the bird population is regarded as the optimal scheduling strategy. BSA is a bionic intelligence algorithm derived from research on the behaviors of bird swarms and was proposed by Meng et al. [48] in 2015. For convenience, the social behaviors of a bird swarm are divided into three types, i.e., foraging behavior, vigilance behavior, and flight behavior. Let N denote the number of birds in the swarm and $I_{max}$ denote the total number of iterations used to perform the swarm behaviors. The location of an arbitrary bird $b_i$ ($1 \le i \le N$) at the I-th iteration ($1 \le I \le I_{max}$) is expressed as a vector in the D-dimensional space.

Behavior Expression
The foraging of a bird usually depends on historical experience. In the foraging update, the individual moves from its current location toward its own best location and toward the best location of the whole swarm, each with a random weight. Here, rand(0, 1) generates a random value between 0 and 1; $ol^{best}_{ij}$ is the best location of $b_i$ in the j-th dimension; $gl_j$ is the best location of the whole swarm in the j-th dimension; ξ is the cognitive accelerated coefficient; and γ is the social accelerated coefficient.
During flight, each bird tends to move towards the central position of the swarm, which inevitably creates competition among individuals and hinders continuous flight. Therefore, the bird needs the vigilance behavior. In the corresponding location update, $al_j$ is the average location of the swarm in the j-th dimension; $pFit_i$ is the fitness value of the best historical location of $b_i$; sumFit is the sum of $pFit_i$ over all birds; ε is used to avoid a zero-division error; and $A_1$ and $A_2$ are the indirect and direct influences, respectively, generated by the natural environment when the individual moves towards the central position of the swarm. In particular, if $pFit_i$ is larger than $pFit_k$, where $b_k$ is another bird in the swarm, then $b_k$ faces more serious environmental interference than $b_i$.
An individual can be disturbed by other flying animals, so some birds may fly to other places to start a new life. In this situation, some birds act as producers and the remaining birds act as scroungers, and the locations of the two roles are updated differently. In these updates, randn(0, 1) is a random value drawn from the Gaussian distribution with mean 0 and standard deviation 1, and r is the learning efficiency with which the scrounger follows the producer.
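The three behaviors can be sketched in Python using the update rules of the standard BSA as published by Meng et al. [48]. This is an illustration under assumptions rather than this paper's exact equations: the function names, the argument names, and the constants `a1` and `a2` that scale the two environmental influences are taken from the standard algorithm or chosen here for readability.

```python
import math
import random

EPS = 1e-12  # small constant that avoids division by zero


def forage(x, own_best, swarm_best, xi, gamma, rng):
    # Move toward the bird's own best and the swarm's best location,
    # scaled by the cognitive (xi) and social (gamma) coefficients.
    return [
        xj + (pj - xj) * xi * rng.random() + (gj - xj) * gamma * rng.random()
        for xj, pj, gj in zip(x, own_best, swarm_best)
    ]


def vigilance(x, mean_loc, other_best, p_fit_i, p_fit_k, sum_fit,
              n_birds, a1, a2, rng):
    # A1: indirect influence when moving toward the swarm center.
    big_a1 = a1 * math.exp(-p_fit_i / (sum_fit + EPS) * n_birds)
    # A2: direct influence caused by competition with another bird b_k.
    big_a2 = a2 * math.exp(
        (p_fit_i - p_fit_k) / (abs(p_fit_k - p_fit_i) + EPS)
        * n_birds * p_fit_k / (sum_fit + EPS)
    )
    return [
        xj + big_a1 * (mj - xj) * rng.random()
        + big_a2 * (pk - xj) * rng.uniform(-1.0, 1.0)
        for xj, mj, pk in zip(x, mean_loc, other_best)
    ]


def producer_step(x, rng):
    # Producer: Gaussian perturbation proportional to the current location.
    return [xj + rng.gauss(0.0, 1.0) * xj for xj in x]


def scrounger_step(x, producer, r, rng):
    # Scrounger: follow a producer with learning efficiency r.
    return [xj + (pj - xj) * r * rng.random() for xj, pj in zip(x, producer)]
```

Passing an explicit `random.Random` instance as `rng` keeps runs reproducible, which helps when comparing scheduling results across experiments.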

Solution
Although BSA shows good search ability, its global search ability is still limited, especially when handling complex, large-scale, multi-extremum optimization problems. Given this, this paper improves the flight behavior of BSA by modifying Equation (18). In the improved form, the location difference follows the Gaussian distribution, i.e., the update is a random walk, which is consistent with real flight behavior. Based on the above statements, the improved BSA is used as the edge computing scheduling strategy.
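One possible reading of this improvement is sketched below: the standard producer step perturbs each coordinate in proportion to its current value, whereas a Gaussian random walk adds a perturbation that does not depend on the current location. Both function names and the `step_scale` parameter are hypothetical, and the exact improved equation used in the paper may differ.

```python
import random


def producer_step_standard(x, rng):
    # Standard BSA producer step: the perturbation is proportional to x_j,
    # so a coordinate that sits at 0 can never move.
    return [xj + rng.gauss(0.0, 1.0) * xj for xj in x]


def producer_step_random_walk(x, rng, step_scale=1.0):
    # ASSUMED improved step: the location difference itself is Gaussian,
    # so the bird performs a random walk from any starting point.
    return [xj + step_scale * rng.gauss(0.0, 1.0) for xj in x]
```

The random-walk form keeps exploring even from the origin, which is one way such a change could help the search escape local extrema.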

CNN-Based Data Training for Fall Detection
Before performing the CNN-based data training, feature extraction is very important because it derives from the smartphone data the key information needed to classify falls. In this paper, five features, i.e., minimum, mean, maximum, kurtosis, and skewness, are extracted for each axis, and principal component analysis is employed for this purpose. The concrete method can be found in [50].
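The five per-axis features can be computed as in the following sketch. It assumes population (biased) moments and the non-excess Pearson kurtosis, which the text does not specify, and it omits the principal component analysis step. The function names are hypothetical.

```python
def axis_features(samples):
    """Return minimum, mean, maximum, kurtosis, and skewness of one axis."""
    n = len(samples)
    mean = sum(samples) / n
    # Central moments of order 2, 3, and 4 (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m3 = sum((v - mean) ** 3 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    if m2 == 0.0:
        skewness, kurtosis = 0.0, 0.0  # constant signal: moments undefined
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2  # ASSUMPTION: Pearson (non-excess) kurtosis
    return {
        "minimum": min(samples),
        "mean": mean,
        "maximum": max(samples),
        "kurtosis": kurtosis,
        "skewness": skewness,
    }


def feature_vector(axes):
    # axes: one list of samples per axis; returns 5 features per axis in order.
    names = ("minimum", "mean", "maximum", "kurtosis", "skewness")
    vector = []
    for samples in axes:
        features = axis_features(samples)
        vector.extend(features[name] for name in names)
    return vector
```

For a three-axis accelerometer window, `feature_vector` yields 15 values, which could then be passed to a dimensionality reduction step such as PCA.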

CNN Structure
The classical CNN structure is shown in Figure 2. It consists of an input layer, convolution layers, pooling layers, a fully connected layer, and an output layer [51], and it involves convolution, pooling, and full connection operations. The convolution layer is responsible for extracting features from the data collected by the smartphone; it is composed of a group of convolution kernels, whose weights are automatically updated by learning the objective function. The pooling layer is placed between convolution layers to reduce the data dimension, which effectively decreases the number of parameters and helps avoid overfitting. The fully connected layer performs the classification operation, converting the multi-dimensional vector from the convolution layer into a one-dimensional vector. Furthermore, the intermediate operations (i.e., convolution, pooling, and full connection) are accompanied by nonlinear units and batch normalization. The nonlinear unit applies a nonlinear mapping to the output of the convolution layer, which largely guarantees the nonlinearity of the CNN and improves the model's ability to express features. Batch normalization transforms the input data into the standard normal distribution so that the input values of the nonlinear units remain in an interval with a relatively large gradient, which avoids the vanishing gradient problem and accelerates the convergence of data training.
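To make the layer roles concrete, the sketch below implements a one-dimensional convolution, max pooling, and batch normalization in plain Python. It is a toy illustration with hypothetical function names, not the network of Table 2: the convolution is the sliding dot product (cross-correlation) used by most CNN libraries, and the batch normalization omits the learned scale and shift parameters.

```python
import math


def conv1d(signal, kernel):
    # "Valid" sliding dot product of the kernel over the signal.
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values, width):
    # Non-overlapping windows; keeps the largest value of each window.
    return [
        max(values[i:i + width])
        for i in range(0, len(values) - width + 1, width)
    ]


def batch_norm(values, eps=1e-5):
    # Shift to zero mean and scale to (approximately) unit variance.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]
```

Pooling with `width=2` halves the length of the feature map, which is the dimension reduction that the pooling layer provides.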

CNN Computing
The nonlinear unit and batch normalization play the important roles during the process of CNN computing, including activation function, pooling function, classification function, and loss function.

Activation Function
This paper uses ReLU (Rectified Linear Unit) as the activation function instead of the sigmoid and tanh functions, because the latter require complex exponent arithmetic, whereas ReLU has a simple expression and improves computation efficiency because it performs only linear computation. For an arbitrary neuron (CNN has several hidden layers and each hidden layer includes a number of neurons), let neu denote the stimulus value of the neuron and s(neu) denote its activation state. The activation function is defined as follows:
$$s(neu) = \max(0, neu)$$
When neu ≤ 0, the neuron is not activated. In other words, the neuron is activated only when the stimulus intensity is larger than the given threshold, which is set to 0 in this paper.
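The cost difference between the activation functions can be seen in a short sketch: ReLU needs only a comparison, while the sigmoid needs an exponential. The function names are illustrative.

```python
import math


def relu(neu):
    # Activated (passes the stimulus through) only above the threshold 0.
    return neu if neu > 0 else 0.0


def sigmoid(neu):
    # Shown for comparison only: requires an exponential per evaluation.
    return 1.0 / (1.0 + math.exp(-neu))


def relu_layer(stimuli):
    # Apply ReLU to every neuron of a layer.
    return [relu(value) for value in stimuli]
```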

Pooling Function
We select the Maxout function as the pooling function, which is suitable because fall detection needs data with relatively strong feature expression. Suppose that there are Q data features, denoted by $x_1, x_2, \cdots, x_Q$; the Maxout function f(x) selects the maximum among them. Furthermore, a check on the result indicates whether the selected maximum is suitable and can be determined; otherwise, f(x) is reselected.

Classification Function
We adopt the Softmax function as the classification function (output function). For Q data features and m smartphones, we can obtain a vector $V = (V_1, V_2, \cdots, V_m)$. For an arbitrary $V_i$, we have
$$V_i = \sum_{j=1}^{Q} w_{ij} x_j$$
where $w_{ij}$ ($1 \le j \le Q$) denotes the weight. On this basis, let $y_j$ denote the converted value obtained via the classification function, which is defined as follows:
$$y_j = e^{V_j + b_j}$$
where $b_j$ is the offset with respect to $V_j$, so that $y_j$ is the exponential transformation of $V_j$ and $b_j$. Let $\hat{y}_j$ denote the probability corresponding to $y_j$, and we have
$$\hat{y}_j = \frac{y_j}{\sum_{k=1}^{m} y_k}$$
For Equation (22), we select the maximal $\hat{y}_j$ as the output result.
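A minimal Softmax sketch in Python follows. It subtracts the largest score before exponentiating, a common numerical safeguard that leaves the probabilities unchanged. The function names are illustrative, and the input `scores` is assumed to already contain the weighted sums plus offsets.

```python
import math


def softmax(scores):
    # Subtracting the maximum avoids overflow in exp() without changing
    # the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    # Index of the class with the largest probability.
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```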

Loss Function
The loss function measures the difference between the real value and the predicted value, and it is used to guarantee the training quality of CNN. In particular, a small loss value means high fall detection accuracy. In this paper, we select the cross entropy function as the loss function, which is computed over K data samples.
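The cross entropy over K samples can be sketched as below. This assumes the common averaged form with integer class labels; the paper's exact notation is not reproduced here, and the function name and the `eps` clipping constant are hypothetical.

```python
import math


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    # true_labels: class index of each of the K samples.
    # predicted_probs: one probability list per sample (e.g., Softmax output).
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to eps so that a predicted probability of 0 does not give log(0).
        total -= math.log(max(probs[label], eps))
    return total / len(true_labels)
```

A perfect prediction gives a loss of 0, and the loss grows as the probability assigned to the true class shrinks.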

Experiment Results
At first, we give two kinds of parameter settings, i.e., algorithm simulation parameters and CNN structure parameters. The former are shown in Table 1, and the latter are shown in Table 2. Then, we introduce the fall detection dataset, which includes 1000 older adults (600 males and 400 females) in four different scenarios (i.e., home, square, shopping mall, and street), as shown in Table 3. In addition, the edge computing server is Huawei's 1288H V5 with Intel C622. In particular, all experimental scenarios are simulated with the Roblox software, because it is unrealistic to equip each older adult with a smartphone or to collect real fall data from their smartphones.
Furthermore, two recent and systematic studies [39,41], which integrate the most cutting-edge techniques, are considered as baselines, called InfS and MeaM, respectively. We conduct two kinds of experiments. At first, the classification performance is verified by evaluating the recall ratio, precision ratio, and F1 value; then, three metrics, i.e., accurate rate, transmission delay, and stability, are used to evaluate the efficiency of the strategy.

Classification Verification
The experimental results on recall ratio, precision ratio, and F1 value are shown in Table 4, where the F1 value is defined as follows:
$$F1 = \frac{2 \cdot Re \cdot Pre}{Re + Pre} \qquad (29)$$
where Re and Pre are the recall ratio and precision ratio, respectively. We can find that EdCNN has the best recall ratio, precision ratio, and F1 value, followed by MeaM and InfS; this is because EdCNN comprehensively optimizes CNN via the ReLU function, Maxout function, Softmax function, and cross entropy function. InfS only considers the boosted decision function to realize machine learning, which cannot reach the same classification effect as CNN, thus it has the worst classification result. Although both EdCNN and MeaM adopt CNN, MeaM does not make a systematic optimization while EdCNN presents a relatively optimal CNN structure (see Table 2) obtained via the simulation experiments, thus MeaM has a worse classification effect than EdCNN.
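The three classification metrics can be computed from confusion counts as in the sketch below. The function name and the convention of returning 0 when a denominator is zero are choices made for this illustration.

```python
def precision_recall_f1(tp, fp, fn):
    # tp: falls detected as falls; fp: non-falls flagged as falls;
    # fn: falls that were missed.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * recall * precision / (recall + precision)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it always lies between the precision and the recall.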

Accurate Rate
The experimental results on accurate rate based on four different scenarios are shown in Table 5. We can find that EdCNN always has the highest accurate rate, followed by MeaM and InfS. There are two reasons. On one hand, EdCNN has the best classification effect. On the other hand, EdCNN uses BSA to optimize the trade-off between running delay and energy consumption, which guarantees that the obtained solution is relatively optimal. In addition, we can also find that EdCNN has its highest accurate rate in the home scenario; this is because such a scenario has no external interference.

Transmission Delay
The experimental results on transmission delay based on four different scenarios are shown in Table 6. Similarly, we can find that EdCNN has the smallest transmission delay, followed by InfS and MeaM. At first, MeaM does not have the assistance of any networking paradigm. On the contrary, EdCNN and InfS use edge computing and cloud computing techniques, respectively, which solve the problem of limited storage and computation resources. As a result, MeaM has the largest transmission delay. Then, compared with InfS, the proposed EdCNN shows two distinct advantages. On one hand, it places the computation tasks at the edge server, which is close to the users with smartphones, and thus obtains a small communication delay. On the other hand, it uses the SDN controller to collect the data from the smartphone, which realizes virtual data transmission with a relatively small delay. In summary, EdCNN has a smaller transmission delay than InfS.

Stability
The experimental results on stability are shown in Table 7, where the abnormal results mean that the relatively better strategy is in the inferior position at some simulation points. We can find that all corresponding p-values are smaller than the designated significance level of 0.01, which indicates that the proposed EdCNN has better performance than InfS and MeaM. Furthermore, from the comprehensive evaluation perspective, the proposed EdCNN is stable and acceptable.

Conclusions
The remote monitoring of older adults and the detection of dangers to human health have become essential elements of smart cities. Current fall detection strategies face three limitations: (i) data collection cannot reach the real-time level; (ii) the adopted detection methods are not sufficiently stable; and (iii) the computation overhead of the collection device is very high. To address them, this paper proposes a novel fall detection strategy, EdCNN, which leverages CNN, SDN, and edge computing techniques.
The global network view of SDN is used to collect the data generated by smartphones. For the edge computing scheduling that places some computation tasks at the edge server, BSA is employed to solve the trade-off problem between running delay and energy consumption, which guarantees a relatively optimal scheduling solution. For the data training, CNN is comprehensively optimized via the ReLU function, Maxout function, Softmax function, and cross entropy function. These three sub-proposals address the three limitations mentioned above. To be specific, the SDN controller ensures real-time data collection; the enhanced CNN structure improves stability; and the edge computing framework decreases the computation overhead.
Finally, two kinds of simulation experiments are conducted. At first, the classification performance is verified by evaluating the recall ratio, precision ratio, and F1 value, which can reach 98.26%, 97.87%, and 97.49%, respectively. Then, the overall performance is evaluated by testing accurate rate, transmission delay, and stability. In particular, the accurate rate and transmission delay can reach 99.01% and 19.261 ms, respectively, in the home scenario. The experimental results show that EdCNN outperforms the two baselines.
However, as a novel method based on SDN, edge computing, and CNN, the proposed EdCNN also has some limitations. At first, we do not consider the application types, that is, the data division is not done in a fine-grained way. Then, although CNN computing and edge computing decrease the communication delay, they introduce computation overhead. Finally, the experiment environment is limited to the simulation platform and does not involve real data collected from people. In the future, we plan to enhance the performance of EdCNN in two aspects. On one hand, we will improve EdCNN with respect to the three limitations mentioned above. On the other hand, we will improve BSA to reach much faster convergence. In addition, we will study the relevance of CNN in different domains and security issues like [53,54].