Low Frequency Vibration Visual Monitoring System Based on Multi-Modal 3DCNN-ConvLSTM
Abstract
1. Introduction
- To the best of our knowledge, we are the first to apply 3D convolution to extract spatial and temporal features from vibration video streams in the field of vibration monitoring. Unlike traditional vibration feature extraction methods, our feature extractor operates in both the spatial and temporal dimensions, capturing vibration features automatically from the raw video stream without relying on hand-crafted signal processing techniques.
- We design a network matched to the characteristics of vibration signals to realize frequency classification and monitoring. In the low-frequency vibration range, the 3DCNN-ConvLSTM architecture effectively learns the spatiotemporal characteristics of multi-frequency vibration, capturing both global and local features: the 3DCNN extracts short-term spatiotemporal features, while the ConvLSTM layers learn long-term spatiotemporal dependencies.
- The proposed method is non-invasive and imposes no special restrictions on the monitoring environment. To reduce interference from factors such as ambient light and to support more comprehensive vibration monitoring, we use the depth mode as an auxiliary to the color mode and improve monitoring performance through multi-modal fusion. Experimental results show that this approach outperforms the single-modal structures.
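For context, the traditional signal-processing pipeline that the learned extractor replaces typically tracks an intensity series across frames and reads the dominant frequency from its spectrum. A minimal stdlib sketch of that baseline (the 30 fps frame rate and the 5 Hz test signal are illustrative assumptions, not values taken from the paper):

```python
import math

def dominant_frequency(samples, fps):
    """Return the dominant frequency (Hz) of a real-valued series via a naive DFT."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fps / n

# 2 s of a 5 Hz sine sampled at an assumed 30 fps camera frame rate
fps, n = 30, 60
series = [math.sin(2 * math.pi * 5 * i / fps) for i in range(n)]
print(dominant_frequency(series, fps))  # 5.0
```

Such spectrum-based estimators need a clean, pre-selected intensity series; the learned 3DCNN features sidestep that manual step.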
2. Materials and Methods
2.1. Vibration Signal Acquisition
2.2. Networks for Vibration Signal
2.3. Model of Multi-Modal Networks
3. Experiments
3.1. Experimental Setup and Dataset Description
3.2. Experiments and Results
4. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
References
Layer | Parameters |
---|---|
3D Conv1 | KW = 3; KH = 3; KL = 3; KC = 3 (1 for depth mode); KN = 30; stride = (1, 1, 1) |
Max Pooling1 | PS = (1, 2, 2); stride = (1, 2, 2) |
3D Conv2 | KW = 3; KH = 3; KL = 3; KC = 30; KN = 60; stride = (1, 1, 1) |
Max Pooling2 | PS = (1, 2, 2); stride = (2, 2, 2) |
3D Conv3 | KW = 3; KH = 3; KL = 3; KC = 60; KN = 80; stride = (1, 1, 1) |
3D Conv4 | KW = 3; KH = 3; KL = 3; KC = 80; KN = 80; stride = (1, 1, 1) |
Max Pooling3 | PS = (2, 2, 2); stride = (1, 1, 1) |
ConvLSTM1 | KW = 3; KH = 3; KC = 80; KN = 256; stride = (1, 1) |
ConvLSTM2 | KW = 3; KH = 3; KC = 256; KN = 384; stride = (1, 1) |
2D Max Pooling | PS = (7, 7); stride = (7, 7) |
FC | Nodes = 11 |
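The feature-map sizes implied by the table can be checked with the standard output-size formula out = ⌊(in + 2·pad − kernel) / stride⌋ + 1, applied per dimension. A small sketch (the 112-pixel frame width and 'same' padding for the 3 × 3 convolutions are assumptions for illustration; the excerpt does not state the input resolution):

```python
def out_dim(size, kernel, stride, pad=0):
    """Output size of a convolution/pooling layer along one dimension:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Trace one spatial dimension through the first stage of the table,
# assuming a 112-pixel input and padding 1 ('same') for the 3x3 conv.
w = out_dim(112, kernel=3, stride=1, pad=1)  # 3D Conv1 preserves the size: 112
w = out_dim(w, kernel=2, stride=2)           # Max Pooling1 (PS = 2, stride = 2): 56
print(w)  # 56
```

Applying the same formula layer by layer (with the temporal kernel/stride entries from the table) gives the shapes seen by the ConvLSTM stages and the final 7 × 7 pooling.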
Dataset | Train | Test | Total |
---|---|---|---|
Object 1 (Dixie cup) | 310 | 130 | 440 |
Object 2 (Badminton) | 314 | 126 | 440 |
Object 3 (Box) | 299 | 141 | 440 |
Total | 923 | 397 | 1320 |
Model | Val Accuracy (%) | Test Accuracy (%) |
---|---|---|
RGB (one branch) | 99.8 | 89.0 |
Depth (one branch) | 83.0 | 82.0 |
Multi-modal fusion | — | 93.0 |
Category | Precision (Fusion) | Precision (RGB) | Recall (Fusion) | Recall (RGB) | F1-Score (Fusion) | F1-Score (RGB) |
---|---|---|---|---|---|---|
0 Hz | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
1 Hz | 0.79 | 0.78 | 0.90 | 0.83 | 0.84 | 0.81 |
2 Hz | 0.89 | 0.76 | 0.89 | 0.80 | 0.89 | 0.78 |
3 Hz | 1.00 | 1.00 | 0.83 | 0.85 | 0.91 | 0.92 |
4 Hz | 0.92 | 0.90 | 0.89 | 0.95 | 0.90 | 0.92 |
5 Hz | 0.92 | 0.92 | 0.95 | 0.89 | 0.93 | 0.90 |
6 Hz | 0.97 | 0.91 | 1.00 | 0.97 | 0.98 | 0.94 |
7 Hz | 0.94 | 0.85 | 1.00 | 0.97 | 0.97 | 0.90 |
8 Hz | 0.86 | 0.97 | 0.90 | 0.71 | 0.88 | 0.82 |
9 Hz | 1.00 | 0.85 | 0.92 | 0.87 | 0.96 | 0.86 |
10 Hz | 0.98 | 0.90 | 1.00 | 1.00 | 0.99 | 0.95 |
Average | 0.93 | 0.89 | 0.93 | 0.89 | 0.93 | 0.89 |
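The per-class scores in the table follow the standard definitions of precision, recall, and F1. A stdlib sketch with hypothetical confusion counts (the counts below are made up for illustration, not taken from the experiments):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 for one class from true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. a class with 18 correct detections, 2 false alarms, 2 misses
p, r, f = prf1(tp=18, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.9 0.9
```

The "Average" row in the table is the macro average, i.e. the unweighted mean of the eleven per-class scores.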
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Alimasi, A.; Liu, H.; Lyu, C. Low Frequency Vibration Visual Monitoring System Based on Multi-Modal 3DCNN-ConvLSTM. Sensors 2020, 20, 5872. https://doi.org/10.3390/s20205872