 
 
Article

Multimodal Large Language Model-Enabled Machine Intelligent Fault Diagnosis Method with Non-Contact Dynamic Vision Data

1 Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi’an Jiaotong University, Xi’an 710049, China
2 State Key Laboratory of Engine and Powertrain System, Weichai Power Co., Ltd., Weifang 261061, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(18), 5898; https://doi.org/10.3390/s25185898
Submission received: 9 August 2025 / Revised: 24 August 2025 / Accepted: 18 September 2025 / Published: 20 September 2025
(This article belongs to the Special Issue Applications of Sensors in Condition Monitoring and Fault Diagnosis)

Abstract

Smart manufacturing demands ever-increasing equipment reliability and continuous availability. Traditional fault diagnosis relies on attached sensors and complex wiring to collect vibration signals, an approach that suffers from poor environmental adaptability, difficult maintenance, and cumbersome preprocessing. This study pioneers the use of high-temporal-resolution dynamic visual information captured by an event camera to fine-tune a multimodal large model. Leveraging non-contact acquisition with an event camera, sparse pulse events are converted into event frames through time-surface processing; these frames are then reconstructed into a high-temporal-resolution video using spatiotemporal denoising and region-of-interest definition. The study adopts the multimodal model Qwen2.5-VL-7B and employs two distinct LoRA fine-tuning strategies for bearing fault classification. Strategy A uses OpenCV to extract key video frames for lightweight parameter injection, whereas Strategy B invokes the model's built-in video processing pipeline to fully exploit the rich temporal information and capture dynamic details of the bearing's operation. Classification experiments were conducted under three operating conditions and four rotational speeds. Strategy A and Strategy B achieved classification accuracies of 0.9247 and 0.9540, respectively, establishing a novel fault diagnosis paradigm that progresses from non-contact sensing to end-to-end intelligent analysis.
Keywords: event camera; fault diagnosis; multimodal large models; dynamic vision
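The time-surface conversion described in the abstract (sparse pulse events accumulated into dense event frames) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `time_surface` function, the decay constant `tau`, and the toy event data are all assumptions introduced here for clarity, since the abstract does not specify the exact parameters.

```python
import numpy as np

def time_surface(events, sensor_shape, t_ref, tau=0.05):
    """Build a time-surface event frame: each pixel holds an exponential
    decay of the time since its most recent event (hypothetical sketch;
    the paper's exact formulation and parameters are not given in the abstract).

    events: array of (x, y, t, polarity) rows, with t in seconds and t <= t_ref.
    """
    last_t = np.full(sensor_shape, -np.inf)      # most recent event timestamp per pixel
    for x, y, t, p in events:
        last_t[int(y), int(x)] = max(last_t[int(y), int(x)], t)
    surface = np.exp((last_t - t_ref) / tau)     # decays toward 0 for stale pixels
    surface[np.isinf(last_t)] = 0.0              # pixels that never fired stay dark
    return surface

# Toy usage: three events on a 4x4 sensor, referenced to t_ref = 0.05 s.
ev = np.array([[0, 0, 0.00, 1],
               [1, 2, 0.04, -1],
               [3, 3, 0.05, 1]])
frame = time_surface(ev, (4, 4), t_ref=0.05)
```

A sequence of such frames, computed at successive reference times, yields the high-temporal-resolution video that (after denoising and region-of-interest cropping) is fed to the multimodal model.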

Share and Cite

MDPI and ACS Style

Lu, Z.; Sun, C.; Li, X. Multimodal Large Language Model-Enabled Machine Intelligent Fault Diagnosis Method with Non-Contact Dynamic Vision Data. Sensors 2025, 25, 5898. https://doi.org/10.3390/s25185898


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
