Open Access Article
Multimodal Large Language Model-Enabled Machine Intelligent Fault Diagnosis Method with Non-Contact Dynamic Vision Data
by Zihan Lu 1, Cuiying Sun 2 and Xiang Li 1,*
1 Key Laboratory of Education Ministry for Modern Design and Rotor-Bearing System, Xi’an Jiaotong University, Xi’an 710049, China
2 State Key Laboratory of Engine and Powertrain System, Weichai Power Co., Ltd., Weifang 261061, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(18), 5898; https://doi.org/10.3390/s25185898
Submission received: 9 August 2025 / Revised: 24 August 2025 / Accepted: 18 September 2025 / Published: 20 September 2025
Abstract
Smart manufacturing demands ever-increasing equipment reliability and continuous availability. Traditional fault diagnosis relies on attached sensors and complex wiring to collect vibration signals, an approach that suffers from poor environmental adaptability, difficult maintenance, and cumbersome preprocessing. This study is the first to fine-tune a multimodal large language model with high-temporal-resolution dynamic visual information captured by an event camera. Leveraging non-contact acquisition, sparse pulse events are converted into event frames through time-surface processing; these frames are then reconstructed into a high-temporal-resolution video using spatiotemporal denoising and region-of-interest definition. The study introduces the multimodal model Qwen2.5-VL-7B and employs two distinct LoRA fine-tuning strategies for bearing fault classification. Strategy A uses OpenCV to extract key video frames for lightweight parameter injection, whereas Strategy B invokes the model's built-in video processing pipeline to fully exploit the rich temporal information and capture dynamic details of the bearing's operation. Classification experiments were conducted under three operating conditions and four rotational speeds; Strategy A and Strategy B achieved classification accuracies of 0.9247 and 0.9540, respectively, establishing a novel fault diagnosis paradigm that progresses from non-contact sensing to end-to-end intelligent analysis.
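The time-surface step in the abstract (turning sparse pulse events into dense event frames) might be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the event tuple layout `(x, y, t)`, and the decay constant `tau` are all assumptions, and real event streams would also carry polarity and use vectorized accumulation.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau=0.05):
    """Build a dense event frame from sparse events (x, y, t).

    Each pixel stores an exponentially decayed trace of its most
    recent event time: S[y, x] = exp(-(t_ref - t_last) / tau).
    Pixels that have seen no event up to t_ref stay at zero.
    """
    last_t = np.full(shape, -np.inf)  # most recent event timestamp per pixel
    for x, y, t in events:
        if t <= t_ref and t > last_t[y, x]:
            last_t[y, x] = t
    # exp(-(t_ref - (-inf))) underflows cleanly to 0 for untouched pixels
    return np.where(np.isfinite(last_t),
                    np.exp(-(t_ref - last_t) / tau), 0.0)
```

A stack of such frames at successive `t_ref` values can then be assembled into the high-temporal-resolution video that the paper feeds to the multimodal model.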