Real-Time Information Derivation from Big Sensor Data via Edge Computing
Abstract
1. Introduction
- We support dynamic rate adaptation for the periodic sensor data transfer from IoT devices to the edge server based on the relative data importance provided by different sensors/IoT devices to optimize the total utility, i.e., the sum of the importance values of the sensor data transferred from sensors to the edge server, subject to the total rate upper bound. The proposed transfer rate adaptation scheme is generic in that it can support a different data importance metric depending on a specific real-time sensor data analytics application.
- Using the API (Application Programming Interface) of RTMR, an application developer can write serial map() and reduce() functions for a specific real-time data analysis application, and specify the data analysis task parameters, e.g., the deadlines and periods.
- A non-preemptive periodic real-time task model is supported for periodic real-time analysis of sensor data. Moreover, a schedulability test for the EDF (Earliest Deadline First) scheduling algorithm is provided to support timing constraints considering both the computation and data access delay.
- Several mechanisms for efficient in-memory sensor data analysis are supported. First, sensor data are directly streamed into main memory to let RTMR derive information from them on the fly. Second, intermediate data generated in a map/reduce phase is pipelined straight to the next phase, if any, without being staged in the local disk or distributed file system unlike Hadoop and its variants. Further, memory reservation is supported to ensure enough space is allocated to store the input, intermediate, and output data for each real-time sensor data analysis task.
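To make the programming model in the list above more concrete, the following minimal sketch shows how an analysis task with user-supplied map() and reduce() functions, a period, and a deadline might be described, and how a non-preemptive EDF dispatcher could pick the next job. The names `AnalysisTask`, `Job`, and `pick_next_edf` are hypothetical and are not the RTMR API; memory reservation and data access delay are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

KV = Tuple[object, object]  # a (key, value) pair


@dataclass
class AnalysisTask:
    """Hypothetical descriptor of a periodic real-time analysis task."""
    name: str
    map_fn: Callable[[KV], Iterable[KV]]
    reduce_fn: Optional[Callable[[object, List[object]], KV]]  # reduce is optional
    period_s: float
    relative_deadline_s: float


@dataclass
class Job:
    """One release (instance) of a periodic task."""
    task: AnalysisTask
    release_s: float

    @property
    def absolute_deadline_s(self) -> float:
        return self.release_s + self.task.relative_deadline_s


def pick_next_edf(ready: List[Job]) -> Optional[Job]:
    """Return the ready job with the earliest absolute deadline.

    Under non-preemptive scheduling, this is called only when the
    currently running job has finished.
    """
    if not ready:
        return None
    return min(ready, key=lambda job: job.absolute_deadline_s)
```

For example, a job released at 0.5 s with a 3 s relative deadline (absolute deadline 3.5 s) is dispatched before a job released at 0 s with a 4 s relative deadline.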
2. System Overview
2.1. Map-Reduce Basics
2.2. Overall System Structure
2.3. Adaptive Data Transfer Rate Allocation to IoT Devices based on Data Importance
2.4. Real-Time Sensor Data Analytics
- In RTMR, each input sensor datum is expressed as a (key, value) pair, e.g., (cell ID, phone number) for location-based services, and streamed into memory. The input (key, value) pairs are evenly divided into chunks by the MR engine and assigned to mappers, i.e., worker threads.
- The mappers independently execute the user-specified map() function on different data chunks in parallel. For example, each mapper executes the map() function for cellphone count in Figure 5, producing intermediate (key, value) pairs as a result.
- The map phase is completed when all the mappers finish processing the assigned data chunks. If there is no reduce phase, which is optional, the (key, value) pairs produced by the mappers are returned as the final result and the job is terminated.
- If there is a reduce phase, the intermediate (key, value) pairs produced by the mappers are directly pipelined to one or more reducers and sorted based on their keys. Specifically, the pointers to the intermediate results in memory are passed to the reducers with no expensive data copies.
- The reducers execute the user-defined reduce() function in parallel to produce the final (key, value) pairs by processing the assigned non-overlapping intermediate (key, value) pairs. When all the reducers complete, the final (key, value) pairs are returned and the job is terminated. In an iterative application that consists of multiple pairs of map and reduce phases, the output of the reduce phase is directly pipelined to the map phase of the next iteration by passing the pointers to the data.
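The steps above can be sketched in a few lines of Python for the cellphone count example. This is only an illustration under stated assumptions: `run_job`, `split_chunks`, `map_count`, and `reduce_count` are hypothetical names, Python threads stand in for the worker threads, and Python object references stand in for the pointers that are passed between phases without copying the data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_count(pair):
    """User-defined map(): emit (cell ID, 1) for each (cell ID, phone number)."""
    cell_id, _phone_number = pair
    return [(cell_id, 1)]


def reduce_count(key, values):
    """User-defined reduce(): total number of phones seen in one cell."""
    return (key, sum(values))


def split_chunks(pairs, n):
    """Divide the input evenly into at most n contiguous chunks."""
    size = max(1, -(-len(pairs) // n))  # ceiling division
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def run_job(pairs, map_fn, reduce_fn=None, workers=4):
    """Run one in-memory map phase and an optional reduce phase."""
    def run_mapper(chunk):
        out = []
        for pair in chunk:
            out.extend(map_fn(pair))
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: completes when every mapper has processed its chunk.
        mapped = list(pool.map(run_mapper, split_chunks(pairs, workers)))
        intermediate = [kv for part in mapped for kv in part]
        if reduce_fn is None:
            return intermediate  # no reduce phase: mapper output is final

        # Group by key; only references to the values are passed on.
        groups = defaultdict(list)
        for key, value in intermediate:
            groups[key].append(value)

        # Reduce phase: each key (a non-overlapping group) is reduced once.
        keys = sorted(groups)
        return list(pool.map(lambda k: reduce_fn(k, groups[k]), keys))
```

Calling `run_job(pairs, map_count, reduce_count)` returns one (cell ID, count) pair per cell, sorted by key; omitting `reduce_count` returns the mapper output directly.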
3. Adaptive Rate Allocation
3.1. Data Importance
- Application-specific data importance: The relative importance levels of the sensor data sources can be determined using predefined criteria to describe events of interest in a specific application. For example, in visual surveillance, the number of detected human faces or objects of interest (e.g., weapons) can be used as the data importance metric to support crowd counting or tracking. In [25], data related to abnormal events are considered important in an underwater wireless sensor network. Zhang et al. [26] apply the data importance concept to eliminate redundant observations made by multiple surveillance cameras. In [27], visual data in a squash game are analyzed. The frequency of future events, where an event is defined to be the squash ball hitting a specific segment of the walls, is predicted to allocate more bandwidth to the cameras expected to observe more events. Also, in [28], data generating more profits are considered more important and further replicated in a cloud data storage system. Although these approaches leverage the data importance concept, none of them leverages edge computing to mitigate the challenges for real-time sensor data analytics critical in the emerging IoT era. To bridge the gap, RTMR running on the edge server dynamically allocates data transfer rates to IoT devices based on data importance, while scheduling real-time data analytics tasks to meet their timing constraints at the edge.
- Application-agnostic data importance: The data similarity concept [25,29] is not tied to a specific application. In general, consecutive sensor data, e.g., temperature/pressure readings or surveillance images, might be similar. Usually, a data similarity check is inexpensive in terms of computation; therefore, an IoT device can perform it itself and transfer the data only if the difference between the current and previous data is more than a specified threshold, e.g., 5%. In addition, Hu et al. [30] propose a novel method, called offload shaping, to allow an IoT device to drop blurry images, i.e., low quality data, via some additional cheap computation. Other metrics, e.g., image resolution or sensor calibration, could be used to estimate the sensor data quality and importance accordingly.
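The device-side similarity check described above can be sketched for scalar readings as follows. The class name `SimilarityFilter` and the helper `relative_change` are hypothetical. The sketch also assumes that each reading is compared with the last transmitted value rather than the immediately preceding sample, so that slow drift is eventually reported; this is one possible design choice, not necessarily the one used in the cited work.

```python
def relative_change(current, previous):
    """Relative difference between two scalar readings."""
    if previous == 0:
        return float("inf") if current != 0 else 0.0
    return abs(current - previous) / abs(previous)


class SimilarityFilter:
    """Decide on the IoT device whether a reading is worth transferring."""

    def __init__(self, threshold=0.05):  # 5% threshold, as in the example above
        self.threshold = threshold
        self.last_sent = None

    def should_send(self, reading):
        # Always send the first reading; afterwards send only if the
        # change since the last transmitted reading exceeds the threshold.
        if (self.last_sent is None
                or relative_change(reading, self.last_sent) > self.threshold):
            self.last_sent = reading
            return True
        return False
```

With a 5% threshold and the temperature readings 20.0, 20.5, 21.5, 21.6, and 30.0, only the first, third, and fifth readings are transferred.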
3.2. Problem Formulation
3.3. Dynamic Transfer Rate Allocation
Algorithm 1: Dynamic Transfer Rate Allocation at the Control Period
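The steps of Algorithm 1 are not reproduced here. As a rough illustration of importance-driven allocation under a total rate upper bound, the sketch below splits the available rate in proportion to each device's importance value at a control period. It is a simple stand-in heuristic, not the algorithm of this paper; the function name `allocate_rates`, the device names, and the equal-split fallback are assumptions.

```python
def allocate_rates(importance, total_rate):
    """Split total_rate across devices in proportion to data importance.

    importance: dict mapping a device ID to a non-negative importance value
                observed in the previous control period.
    total_rate: upper bound on the sum of all transfer rates.
    """
    if total_rate < 0 or any(v < 0 for v in importance.values()):
        raise ValueError("rates and importance values must be non-negative")
    if not importance:
        return {}

    total_importance = sum(importance.values())
    if total_importance == 0:
        # Assumption: with no importance information, share the rate equally.
        share = total_rate / len(importance)
        return {device: share for device in importance}

    return {device: total_rate * value / total_importance
            for device, value in importance.items()}
```

For instance, with importance values 3, 1, and 0 and a total rate bound of 8 units, the three devices receive 6, 2, and 0 units, respectively, so the sum never exceeds the bound.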
4. Real-Time Task Model and Scheduling
4.1. Task Model and Memory Reservation
4.2. Schedulability Test
5. Performance Evaluation
5.1. Cost-Effective Rate Allocation to IoT Devices
5.2. Scheduling Real-Time Data Analysis Tasks
- Histogram (HG): A histogram is a fundamental method for a graphical representation of any data distribution. In this paper, we consider image histograms that plot the number of pixels for each tonal value to support fundamental analysis in data-intensive real-time applications, e.g., cognitive assistance, traffic control, or visual surveillance. (HG is not limited to image data but generally applicable to the other types of data, e.g., sensor readings.) The input of this periodic task is a large image with pixels per task period. The input data size processed per period is approximately 1.4 GB.
- Linear Regression (LR): Linear regression is useful for real-time data analytics. For example, it can be applied to predict sensor data values via time series analysis. In LR, points in two-dimensional space, totaling 518 MB, are used as the input per task period to model the approximately linear relation between x and y.
- Matrix Multiplication (MM): MM is heavily used in various big data and IoT applications, such as cognitive assistance, autonomous driving, and scientific applications. In this paper, MM multiplies two matrices together per task period. Each input matrix is 16 MB. The output matrix is 16 MB too.
- K-means clustering (KM): This is an important data mining algorithm for clustering. For example, it can be used to cluster mobile users based on their locations for real-time location-based services. It partitions ℓ observations into k clusters (usually k ≪ ℓ) such that each observation belongs to the cluster with the nearest mean. The input of the k-means task is points in two-dimensional space, totaling 77 MB, per task period.
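As an illustration of how one of these benchmarks maps onto the map/reduce model, the sketch below fits a least-squares line with per-chunk partial sums. It is a minimal example under stated assumptions, not the benchmark implementation used in the evaluation: the function names are hypothetical, and the chunks are processed sequentially here although each call to `map_partial_sums` is independent and could run on a separate mapper.

```python
from functools import reduce


def map_partial_sums(chunk):
    """Map: summarize one chunk of (x, y) points as five partial sums."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce: add two tuples of partial sums element-wise."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(chunks):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n, sx, sy, sxx, sxy = reduce(
        combine, map(map_partial_sums, chunks), (0, 0, 0, 0, 0))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two points with distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Because each mapper returns only five numbers regardless of the chunk size, the intermediate data handed to the reduce phase stays small even for large inputs.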
6. Related Work
7. Conclusions and Future Work
Acknowledgments
Author Contributions
Conflicts of Interest
Abbreviations
ECG | Electrocardiogram |
EDF | Earliest Deadline First |
E2E | End-to-End |
FIFO | First-In First-Out |
fps | frames per second |
ILP | Integer Linear Programming |
IoT | Internet of Things |
JPEG | Joint Photographic Experts Group |
mHealth | mobile Health |
RTMR | Real-Time Map-Reduce |
SIMD | Single Instruction Multiple Data |
HG | Histogram |
KM | k-means |
LR | Linear Regression |
MM | Matrix Multiplication |
References
- Gartner, Internet of Things. Available online: http://www.zdnet.com/article/iot-devices-will-outnumber-the-worlds-population-this-year-for-the-first-time/ (accessed on 16 October 2017).
- Ahmed, E.; Rehmani, M.H. Mobile Edge Computing: Opportunities, solutions, and challenges. Future Gener. Comput. Syst. 2016, 70, 59–63.
- Satyanarayanan, M. The Emergence of Edge Computing. IEEE Comput. 2017, 50, 30–39.
- Bird, R.; Wadler, P. Introduction to Functional Programming, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1998.
- Dean, J.; Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the Symposium on Operating Systems Design and Implementation, San Francisco, CA, USA, 6–8 December 2004.
- Hadoop Project. Available online: http://hadoop.apache.org (accessed on 16 October 2017).
- Beulke, D. Big Data Impacts Data Management: The 5 Vs of Big Data. Available online: http://davebeulke.com/big-data-impacts-data-management-the-five-vs-of-big-data/ (accessed on 16 October 2017).
- Apache Storm. Available online: https://storm.apache.org/ (accessed on 16 October 2017).
- S4: Distributed Stream Computing Platform. Available online: http://incubator.apache.org/s4/ (accessed on 16 October 2017).
- Spark Streaming. Available online: https://spark.apache.org/streaming/ (accessed on 16 October 2017).
- Basu, A. Q Learning Based Workflow Scheduling in Hadoop. Int. J. Appl. Eng. Res. 2017, 12, 3311–3317.
- Phan, L.T.X.; Zhang, Z.; Zheng, Q.; Loo, B.T.; Lee, I. An Empirical Analysis of Scheduling Techniques for Real-time Cloud-based Data Processing. In Proceedings of the International Workshop on Service-Oriented Computing and Applications, Irvine, CA, USA, 12–14 December 2011.
- Kc, K.; Anyanwu, K. Scheduling Hadoop Jobs to Meet Deadlines. In Proceedings of the International Conference on Cloud Computing Technology and Science, Washington, DC, USA, 30 November–3 December 2010.
- Teng, F.; Yang, H.; Li, T.; Yang, Y.; Li, Z. Scheduling real-time workflow on MapReduce-based cloud. In Proceedings of the International Conference on Innovative Computing Technology, State College, PA, USA, 8–10 May 2013.
- Li, S.; Hu, S.; Wang, S.; Su, L.; Abdelzaher, T.; Gupta, I.; Pace, R. WOHA: Deadline-Aware Map-Reduce Workflow Scheduling Framework over Hadoop Cluster. In Proceedings of the 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS), Madrid, Spain, 30 June–3 July 2014.
- Rashmi, S.; Basu, A. Deadline constrained Cost Effective Workflow scheduler for Hadoop clusters in cloud datacenter. In Proceedings of the International Conference on Computation System and Information Technology for Sustainable Solutions, Bangalore, India, 6–8 October 2016.
- Tamrakar, K.; Yazidi, A.; Haugerud, H. Cost Efficient Batch Processing in Amazon Cloud with Deadline Awareness. In Proceedings of the IEEE International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, 27–29 March 2017.
- Lu, C.; Saifullah, A.; Li, B.; Sha, M.; Gonzalez, H.; Gunatilaka, D.; Wu, C.; Nie, L.; Chen, Y. Real-Time Wireless Sensor-Actuator Networks for Industrial Cyber-Physical Systems. Proc. IEEE 2016, 104.
- Stankovic, J.A.; Abdelzaher, T.F.; Lu, C.; Sha, L.; Hou, J.C. Real-time Communication and Coordination in Embedded Sensor Networks. Proc. IEEE 2003, 91.
- Phoenix. Available online: https://github.com/kozyraki/phoenix (accessed on 17 October 2017).
- Chen, L.; Kang, K.D. A Framework for Real-Time Information Derivation from Big Sensor Data. In Proceedings of the IEEE International Conference on Embedded Software and System (ICESS), New York, NY, USA, 24–26 August 2015.
- Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are. Available online: https://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/computing-overview.pdf (accessed on 16 October 2017).
- Edge Computing. Available online: https://www.rtinsights.com/category/edge-computing/ (accessed on 16 October 2017).
- Liu, J.W.S. Real-Time Systems; Prentice Hall: Upper Saddle River, NJ, USA, 2000.
- Cheng, C.F.; Li, L.H. Data gathering problem with the data importance consideration in Underwater Wireless Sensor Networks. J. Netw. Comput. Appl. 2016.
- Zhang, T.; Chowdhery, A.; Bahl, P.; Jamieson, K.; Banerjee, S. The Design and Implementation of a Wireless Video Surveillance System. In Proceedings of the Annual International Conference on Mobile Computing and Networking (MobiCom), Paris, France, 7–11 September 2015.
- Toka, L.; Lajtha, A.; Hosszu, E.; Formanek, B.; Gehberger, D.; Tapolcai, J. A Resource-Aware and Time-Critical IoT Framework. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), Atlanta, GA, USA, 1–4 May 2017.
- Andronikou, V.; Mamouras, K.; Tserpes, K.; Kyriazis, D.; Varvarigou, T. Dynamic QoS-aware data replication in grid environments based on data importance. Future Gener. Comput. Syst. 2012, 28.
- Ho, S.J.; Kuo, T.W.; Mok, A.K. Similarity-based Load Adjustment for Real-Time Data-Intensive Applications. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), San Francisco, CA, USA, 2–5 December 1997.
- Hu, W.; Amos, B.; Chen, Z.; Ha, K.; Richter, W.; Pillai, P.; Gilbert, B.; Harkes, J.; Satyanarayanan, M. The Case for Offload Shaping. In Proceedings of the International Workshop on Mobile Computing Systems and Applications (HotMobile), Santa Fe, NM, USA, 12–13 February 2015.
- Steele, J. The Cauchy Schwarz Master Class; Cambridge University Press: Cambridge, UK, 2004.
- Carpenter, J.; Funk, S.; Holman, P.; Srinivasan, A.; Anderson, J.; Baruah, S. A Categorization of Real-time Multiprocessor Scheduling Problems and Algorithms. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis; Leung, J.Y.T., Ed.; Chapman Hall: Eugene, OR, USA; CRC Press: Boca Raton, FL, USA, 2003.
- Jeffay, K.; Stanat, D.; Martel, C.U. On Non-Preemptive Scheduling of Periodic and Sporadic Tasks. In Proceedings of the Real-Time Systems Symposium, San Antonio, TX, USA, 4–6 December 1991.
- Saifullah, A.; Ferry, D.; Li, J.; Agrawal, K.; Lu, C.; Gill, C.D. Parallel Real-Time Scheduling of DAGs. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 3242–3252.
- Li, J.; Chen, J.J.; Agrawal, K.; Lu, C.; Gill, C.D.; Saifullah, A. Analysis of Federated and Global Scheduling for Parallel Real-Time Tasks. In Proceedings of the Euromicro Conference on Real-Time Systems, Madrid, Spain, 8–11 July 2014.
- OpenCV. Available online: http://opencv.org/ (accessed on 16 October 2017).
- BUSECURE. Available online: http://www.binghamton.edu/its/organization/ops/wireless.html (accessed on 16 October 2017).
- Gao, W.; Tian, Y.; Huang, T.; Ma, S.; Zhang, X. The IEEE 1857 Standard: Empowering Smart Video Surveillance Systems. IEEE Intell. Syst. 2014, 29.
- Liu, H.; Chen, S.; Kubota, N. Intelligent Video Systems and Analytics: A Survey. IEEE Trans. Ind. Inf. 2013, 9, 1222–1233.
- Wei, Y.H.; Leng, Q.; Han, S.; Mok, A.K.; Zhang, W.; Tomizuka, M. RT-WiFi: Real-Time High-Speed Communication Protocol for Wireless Cyber-Physical Control Applications. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Vancouver, BC, Canada, 3–6 December 2013.
- Leng, Q.; Wei, Y.H.; Han, S.; Mok, A.K.; Zhang, W.; Tomizuka, M. Improving Control Performance by Minimizing Jitter in RT-WiFi Networks. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Rome, Italy, 2–5 December 2014.
- Jacob, R.; Zimmerling, M.; Huang, P.; Beutel, J.; Thiele, L. End-to-end Real-time Guarantees in Wireless Cyber-physical Systems. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Porto, Portugal, 29 November–2 December 2016.
- Duquennoy, S.; Nahas, B.A.; Landsiedel, O.; Watteyne, T. Orchestra: Robust Mesh Networks Through Autonomously Scheduled TSCH. In Proceedings of the ACM Conference on Embedded Networked Sensor Systems (SenSys), Seoul, Korea, 1–4 November 2015.
- Watteyne, T.; Handziski, V.; Vilajosana, X.; Duquennoy, S.; Hahm, O.; Baccelli, E.; Wolisz, A. Industrial Wireless IP-Based Cyber Physical Systems. Proc. IEEE 2016, 104.
- Lee, K.H.; Lee, Y.J.; Choi, H.; Chung, Y.D.; Moon, B. Parallel Data Processing with MapReduce: A Survey. SIGMOD Rec. 2011, 40, 11–20.
- Bu, Y.; Howe, B.; Balazinska, M.; Ernst, M.D. HaLoop: Efficient Iterative Data Processing on Large Clusters. Proc. VLDB Endow. 2010, 3, 285–296.
- Basanta-Val, P.; Fernandez-Garcia, N.; Sanchez-Fernandez, L.; Fisteus, J.A. Patterns for Distributed Real-Time Stream Processing. IEEE Trans. Parallel Distrib. Syst. 2017, 28.
- In-Memory MapReduce - Apache Ignite. Available online: https://ignite.apache.org/features/mapreduce.html (accessed on 16 October 2017).
- Hazelcast. Available online: https://github.com/hazelcast/hazelcast/tree/master/hazelcast/src/main/java/com/hazelcast/mapreduce (accessed on 16 October 2017).
- Backman, N.; Pattabiraman, K.; Fonseca, R.; Cetintemel, U. C-MR: Continuously Executing MapReduce Workflows on Multi-core Processors. In Proceedings of the International Workshop on MapReduce and Its Applications, Delft, The Netherlands, 18–19 June 2012.
- RAMCloud. Available online: https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud (accessed on 16 October 2017).
- Davis, R.I.; Burns, A. A Survey of Hard Real-time Scheduling for Multiprocessor Systems. ACM Comput. Surv. 2011, 43.
Smart Phone Model | E2E Latency (ms) |
---|---|
Huawei Honor 6 | 100 |
Galaxy S6 | 86 |
LG G3 | 98 |
Google Nexus 6 | 93 |
Google Nexus 3 | 98 |
Task | m = 1 | m = 2 | m = 4 | m = 8 | m = 16 | m = 30 |
---|---|---|---|---|---|---|
HG | 2.41 s | 1.67 s | 0.88 s | 0.56 s | 0.33 s | 0.2 s |
LR | 1.49 s | 1.3 s | 1.18 s | 0.95 s | 0.62 s | 0.37 s |
MM | 19.7 s | 11.2 s | 5.9 s | 3.73 s | 2.02 s | 1.11 s |
KM | 10.2 s | 7.5 s | 3.72 s | 3.09 s | 2.54 s | 2.36 s |
HG | LR | MM | KM | |
---|---|---|---|---|
23 s | 22 s | 30 s | 25 s | |
13 s | 15 s | 22 s | 18 s | |
7 s | 8 s | 12 s | 10 s | |
4.5 s | 5 s | 7 s | 6 s | |
3 s | 4 s | 5 s | 6 s | |
2.6 s | 3 s | 4 s | 5 s |
m = 1 | m = 2 | m = 4 | m = 8 | m = 16 | m = 30 | |
---|---|---|---|---|---|---|
yes | yes | yes | yes | yes | yes | |
no | yes | yes | yes | yes | yes | |
no | no | yes | yes | yes | yes | |
no | no | no | yes | yes | yes | |
no | no | no | no | yes | yes | |
no | no | no | no | no | yes |
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kang, K.-D.; Chen, L.; Yi, H.; Wang, B.; Sha, M. Real-Time Information Derivation from Big Sensor Data via Edge Computing. Big Data Cogn. Comput. 2017, 1, 5. https://doi.org/10.3390/bdcc1010005