Synthesizing and Reconstructing Missing Sensory Modalities in Behavioral Context Recognition
Abstract
1. Introduction
- Demonstration of a method to restore missing sensory modalities using an adversarial autoencoder.
- Systematic comparison with other techniques for imputing missing data.
- Leveraging the learned embedding and extending the autoencoder for multi-label context recognition.
- Generating synthetic multimodal data and evaluating it empirically through the visual fidelity of the generated samples and classification performance on a real test set.
2. Related Work
3. ExtraSensory Dataset
4. Methodology
4.1. Autoencoder
4.2. Adversarial Autoencoder
- The encoder and decoder networks are trained jointly to minimize the reconstruction objective (see Equation (6)). Additionally, the class label information y can be provided to the decoder as supervision alongside the latent code z; the decoder then uses both z and y to reconstruct the input. Conditioning on y also enables the decoder to produce class-conditional samples.
- The discriminator network is then trained to distinguish true samples drawn from the prior distribution p(z) from fake data points (the latent codes z produced by the encoder).
- Subsequently, the encoder is updated to deceive the discriminator by minimizing a separate loss function (a minimal illustrative sketch of this training loop follows this list).
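To make the three-step procedure above concrete, here is a minimal TensorFlow/Keras sketch of a label-conditioned adversarial autoencoder. It is an illustration under assumed settings: the layer sizes, latent dimension, number of labels, optimizers, and the Gaussian prior are placeholders, not the architecture or hyperparameters described in Sections 4.4 and 4.5.

```python
import tensorflow as tf

# Illustrative dimensions: 166 features matches the sum of the per-modality
# feature counts reported later; the label count and latent size are assumptions.
N_FEATURES = 166
N_LABELS = 51
LATENT_DIM = 32

def mlp(in_dim, hidden, out_dim):
    net = tf.keras.Sequential()
    net.add(tf.keras.layers.Dense(hidden[0], activation="relu", input_shape=(in_dim,)))
    for units in hidden[1:]:
        net.add(tf.keras.layers.Dense(units, activation="relu"))
    net.add(tf.keras.layers.Dense(out_dim))  # linear output
    return net

encoder = mlp(N_FEATURES, [128, 64], LATENT_DIM)             # x -> z
decoder = mlp(LATENT_DIM + N_LABELS, [64, 128], N_FEATURES)  # [z, y] -> x_hat
discriminator = mlp(LATENT_DIM, [64, 32], 1)                 # z -> real/fake logit

mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
opt_ae = tf.keras.optimizers.Adam(1e-3)
opt_d = tf.keras.optimizers.Adam(1e-4)
opt_g = tf.keras.optimizers.Adam(1e-4)

def train_step(x, y):
    # (1) Reconstruction phase: encoder and decoder jointly minimize the
    #     reconstruction error; the decoder is conditioned on z and the label y.
    with tf.GradientTape() as tape:
        z = encoder(x, training=True)
        x_hat = decoder(tf.concat([z, y], axis=1), training=True)
        rec_loss = mse(x, x_hat)
    ae_vars = encoder.trainable_variables + decoder.trainable_variables
    opt_ae.apply_gradients(zip(tape.gradient(rec_loss, ae_vars), ae_vars))

    # (2) Regularization phase, discriminator update: separate samples from the
    #     prior p(z) (a standard Gaussian here) from codes produced by the encoder.
    with tf.GradientTape() as tape:
        z_fake = encoder(x, training=True)
        z_real = tf.random.normal(tf.shape(z_fake))
        logits_real = discriminator(z_real, training=True)
        logits_fake = discriminator(z_fake, training=True)
        d_loss = (bce(tf.ones_like(logits_real), logits_real) +
                  bce(tf.zeros_like(logits_fake), logits_fake))
    opt_d.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))

    # (3) Regularization phase, generator update: the encoder is updated to fool
    #     the discriminator, pushing its code distribution towards the prior.
    with tf.GradientTape() as tape:
        logits_fake = discriminator(encoder(x, training=True), training=True)
        g_loss = bce(tf.ones_like(logits_fake), logits_fake)
    opt_g.apply_gradients(zip(tape.gradient(g_loss, encoder.trainable_variables),
                              encoder.trainable_variables))
    return rec_loss, d_loss, g_loss

# One update on a random stand-in mini-batch.
x = tf.random.normal((64, N_FEATURES))
y = tf.cast(tf.random.uniform((64, N_LABELS)) > 0.5, tf.float32)
print([float(v) for v in train_step(x, y)])
```

Each mini-batch performs the reconstruction update followed by the two adversarial (regularization) updates, mirroring the three bullets above.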
4.3. Context Classification
4.4. Model Architecture and Training
4.5. Implementation
4.6. Performance Evaluation
5. Experimental Results
5.1. Modality Reconstruction
5.2. Classification with Adversarial Autoencoder Representations
5.3. Context Recognition with Several Missing Modalities
5.4. Generating Realistic Multimodal Data
6. Discussion and Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Per-modality reconstruction error (mean ± standard deviation) for PCA and the adversarial autoencoder (AAE):

| Modality | # of Features | PCA | AAE |
|---|---|---|---|
| Accelerometer (Acc) | 26 | 1.104 ± 0.075 | 0.104 ± 0.016 |
| Gyroscope (Gyro) | 26 | 1.423 ± 0.967 | 0.686 ± 1.291 |
| Watch accelerometer (WAcc) | 46 | 1.257 ± 0.007 | 0.147 ± 0.003 |
| Location (Loc) | 6 | 0.009 ± 0.003 | 0.009 ± 0.003 |
| Audio (Aud) | 28 | 1.255 ± 0.015 | 0.080 ± 0.006 |
| Phone State (PS) | 34 | 0.578 ± 0.000 | 0.337 ± 0.011 |
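The table above contrasts PCA-based reconstruction of each missing modality with the AAE; the classification tables below additionally include mean, median, and constant (-1) fill baselines. As a rough, hypothetical illustration of how such block-wise imputation baselines can be set up with numpy/scikit-learn (the helper name, column indices, and component count are assumptions, not the authors' implementation):

```python
import numpy as np
from sklearn.decomposition import PCA

def impute_missing_block(X_train, X_test, missing_cols, method="pca", n_components=40):
    """Impute a missing modality (a block of columns) in X_test.

    X_train is assumed complete; missing_cols indexes the columns of the
    modality that is unavailable at test time. Hypothetical helper.
    """
    X_imp = X_test.copy()
    if method == "mean":
        X_imp[:, missing_cols] = X_train[:, missing_cols].mean(axis=0)
    elif method == "median":
        X_imp[:, missing_cols] = np.median(X_train[:, missing_cols], axis=0)
    elif method == "fill":
        X_imp[:, missing_cols] = -1.0
    elif method == "pca":
        # Fit PCA on complete training data, mean-fill the missing block, then
        # project to the principal subspace and back; keep the reconstructed
        # values only for the missing columns.
        pca = PCA(n_components=n_components).fit(X_train)
        X_imp[:, missing_cols] = X_train[:, missing_cols].mean(axis=0)
        X_rec = pca.inverse_transform(pca.transform(X_imp))
        X_imp[:, missing_cols] = X_rec[:, missing_cols]
    return X_imp

# Example with random stand-in data (166 features as in the modality table;
# the gyroscope column range 26:52 is an assumption).
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(1000, 166)), rng.normal(size=(200, 166))
X_test_imputed = impute_missing_block(X_train, X_test, np.arange(26, 52), method="pca")
```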
Context recognition performance (mean ± standard deviation) with different imputation strategies when the listed modalities are missing:

(a) Missing: Gyro, WAcc, Loc and Aud

| Method | BA | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| AAE | 0.713 ± 0.008 | 0.711 ± 0.021 | 0.716 ± 0.021 | 0.716 ± 0.024 |
| PCA | 0.526 ± 0.007 | 0.249 ± 0.040 | 0.802 ± 0.041 | 0.825 ± 0.034 |
| Mean | 0.669 ± 0.023 | 0.548 ± 0.056 | 0.791 ± 0.025 | 0.785 ± 0.022 |
| Median | 0.657 ± 0.015 | 0.502 ± 0.045 | 0.812 ± 0.022 | 0.808 ± 0.017 |
| Fill -1 | 0.519 ± 0.004 | 0.175 ± 0.012 | 0.862 ± 0.004 | 0.857 ± 0.013 |

(b) Missing: WAcc, Loc and Aud

| Method | BA | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| AAE | 0.723 ± 0.007 | 0.729 ± 0.017 | 0.718 ± 0.013 | 0.721 ± 0.014 |
| PCA | 0.549 ± 0.02 | 0.255 ± 0.052 | 0.842 ± 0.013 | 0.847 ± 0.019 |
| Mean | 0.682 ± 0.017 | 0.567 ± 0.04 | 0.797 ± 0.014 | 0.79 ± 0.014 |
| Median | 0.678 ± 0.014 | 0.543 ± 0.028 | 0.814 ± 0.005 | 0.806 ± 0.004 |
| Fill -1 | 0.547 ± 0.016 | 0.209 ± 0.087 | 0.885 ± 0.055 | 0.836 ± 0.047 |

(c) Missing: WAcc and Loc

| Method | BA | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| AAE | 0.722 ± 0.010 | 0.704 ± 0.029 | 0.74 ± 0.018 | 0.742 ± 0.020 |
| PCA | 0.568 ± 0.012 | 0.300 ± 0.038 | 0.835 ± 0.016 | 0.856 ± 0.010 |
| Mean | 0.735 ± 0.011 | 0.678 ± 0.028 | 0.793 ± 0.009 | 0.789 ± 0.008 |
| Median | 0.727 ± 0.012 | 0.653 ± 0.035 | 0.801 ± 0.020 | 0.796 ± 0.020 |
| Fill -1 | 0.564 ± 0.026 | 0.270 ± 0.064 | 0.859 ± 0.012 | 0.840 ± 0.008 |
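For reference, the balanced accuracy (BA) reported in these tables equals the arithmetic mean of sensitivity and specificity, which matches the listed values (e.g., for the AAE in (a), (0.711 + 0.716)/2 ≈ 0.713):

$$\mathrm{BA} \;=\; \tfrac{1}{2}\bigl(\mathrm{Sensitivity} + \mathrm{Specificity}\bigr) \;=\; \tfrac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$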
Context recognition performance when training and testing on real versus synthetic data (TSTR: train on synthetic, test on real; TRTS: train on real, test on synthetic):

| Data | BA | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Real | 0.753 ± 0.011 | 0.762 ± 0.014 | 0.745 ± 0.016 | 0.749 ± 0.015 |
| TSTR | 0.715 ± 0.011 | 0.731 ± 0.035 | 0.700 ± 0.036 | 0.705 ± 0.034 |
| TRTS | 0.700 ± 0.020 | 0.656 ± 0.035 | 0.744 ± 0.033 | 0.744 ± 0.030 |
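The TSTR and TRTS rows follow the common protocol for evaluating generated data: a classifier is trained on synthetic samples and tested on real held-out data (TSTR), or trained on real data and tested on synthetic samples (TRTS), with the Real row as the reference trained and tested on real data. A hypothetical sketch of this evaluation loop, using a scikit-learn stand-in classifier rather than the paper's model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import balanced_accuracy_score

def fit_and_score(X_train, Y_train, X_test, Y_test):
    """Train a stand-in multi-label classifier; return mean balanced accuracy over labels."""
    clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_train, Y_train)
    Y_pred = clf.predict(X_test)
    return float(np.mean([balanced_accuracy_score(Y_test[:, i], Y_pred[:, i])
                          for i in range(Y_test.shape[1])]))

# X_real / Y_real: real training split; X_test / Y_test: held-out real data;
# X_syn / Y_syn: samples drawn from the trained label-conditioned generator.
# real_score = fit_and_score(X_real, Y_real, X_test, Y_test)   # "Real" row
# tstr_score = fit_and_score(X_syn,  Y_syn,  X_test, Y_test)   # TSTR
# trts_score = fit_and_score(X_real, Y_real, X_syn,  Y_syn)    # TRTS
```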
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).