Power Field Hazard Identification Based on Chain-of-Thought and Self-Verification
Abstract
1. Introduction
2. Materials and Methods
2.1. Hazard Recognition in Operation Sites Based on VLLM
2.1.1. Large Vision-Language Model
2.1.2. Prompt-Based Reasoning Strategy
2.1.3. Hazard Judgment Guidance Based on Chain Prompts
2.1.4. Self-Verification
3. Results
3.1. Experimental Platform and Dataset Preparation
3.2. Evaluation Metrics
3.3. Hazard Recognition in Crane Truck Operation Scenes
3.4. Hazard Identification in Escalator Operation
3.5. Comparative Experiments
3.6. Ablation Experiment
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Zhu, Y.; Ling, Z.G.; Zhang, Y.Q. Research progress and prospect of machine vision technology. J. Graph. 2020, 41, 871–890. [Google Scholar] [CrossRef]
- Leng, S.; Wang, W.; Ou, J.Y.; Xue, Z.G.; Song, Y.L. On-Site construction safety monitoring based on large vision language models. J. Graph. 2025, 46, 960–968. [Google Scholar] [CrossRef]
- Li, H.; He, S.; Tan, R.; Liang, C.; Huang, Z.; Li, C. UAV distribution network hidden danger target detection based on lightweight model. Autom. Appl. 2025, 66, 73–78. [Google Scholar] [CrossRef]
- Zhao, J.P.; Liu, X.X.; Zhang, X.Z. Intelligent dynamic detection of external scaffold hidden danger based on YOLOv5s. Ind. Saf. Environ. Prot. 2023, 49, 14–19. [Google Scholar]
- Li, Z.K.; Lan, Y.F.; Lin, W.W. Footbridge damage detection using smartphone-recorded responses of micromobility and convolutional neural networks. Autom. Constr. 2024, 166, 105587. [Google Scholar] [CrossRef]
- Jiang, J.J.; Liu, D.W.; Liu, Y.F.; Ren, Y.G.; Zhao, Z.B. Few-shot object detection algorithm based on siamese network. J. Comput. Appl. 2023, 43, 2325–2329. [Google Scholar] [CrossRef]
- Zhang, L.L.; Huang, W.L. Construction of patent knowledge graph based on ChatGPT API and prompt engineering. J. Intell. 2025, 44, 180–187. [Google Scholar] [CrossRef]
- Ba, Z.Z.; Zhang, H.; Xie, Z.G.; Zuo, X.D.; Hou, J.W. Automatic Prompt Engineering Technology for Large Language Models: A Survey. J. Front. Comput. Sci. Technol. 2025, 19, 3131–3152. [Google Scholar] [CrossRef]
- Zhao, S.X.; Li, Y.; Su, S.M. Construction safety monitoring method based on multiscale feature attention network. Sci. Sin. Technol. 2023, 53, 1241–1252. [Google Scholar] [CrossRef]
- Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; Bi, X.; et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Nature 2025, 645, 633–638. [Google Scholar] [CrossRef]
- Weng, Y.; Zhu, M.; Xia, F.; Li, B.; He, S.; Liu, S.; Sun, B.; Liu, K.; Zhao, J. Large Language Models are Better Reasoners with Self-Verification. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 2550–2575. [Google Scholar] [CrossRef]
- Zhang, H.J.; Zhang, H.; Yan, W.; Zhuo, S.; Jing, Z.G. A precise extraction method of remanufacturing process knowledge based on chain-of-thought prompting in large language models. Manuf. Technol. Mach. Tool 2025, 10, 90–98. [Google Scholar] [CrossRef]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
- Wang, D.; Lu, F.; Zhang, B. A review of prompt engineering in large language models. Comput. Syst. Appl. 2025, 34, 1–10. [Google Scholar] [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.Y. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2024; Volume 38, pp. 17682–17690. [Google Scholar]
- Zelikman, E.; Wu, Y.; Mu, J.; Goodman, N. Star: Bootstrapping reasoning with reasoning. Adv. Neural Inf. Process. Syst. 2022, 35, 15476–15488. [Google Scholar]
- Madaan, A.; Tandon, N.; Gupta, P. Self-Refine: Iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 2023, 36, 46534–46594. [Google Scholar]
- Lightman, H.; Kosaraju, V.; Burda, Y.; Edwards, H. Let’s Verify Step by Step. Adv. Neural Inf. Process. Syst. 2023. [Google Scholar] [CrossRef]
- Qi, Z.; Ma, M.; Xu, J.; Zhang, L.L.; Yang, F.; Yang, M. Mutual reasoning makes smaller LLMs stronger problem-solvers. arXiv 2024, arXiv:2408.06195. [Google Scholar] [CrossRef]
- Zhang, D.; Zhoubian, S.; Hu, Z.; Yue, Y.; Dong, Y.; Tang, J. ReST-MCTS*: LLM self-training via process reward guided tree search. Adv. Neural Inf. Process. Syst. 2024, 37, 64735–64772. [Google Scholar] [CrossRef]
- Tian, Y.; Peng, B.; Song, L.; Jin, L.; Yu, D. Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. Adv. Neural Inf. Process. Syst. 2024, 37, 52723–52748. [Google Scholar] [CrossRef]
- Bai, J.; Bai, S.; Yang, S.; Wang, S.; Tan, S.; Wang, P. Qwen-VL: A frontier large vision-language model with versatile abilities. arXiv 2023, arXiv:2308.12966. [Google Scholar]
- Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic head: Unifying object detection heads with attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7373–7382. [Google Scholar]
- Wu, Z.; Chen, X.; Pan, Z.; Liu, X.; Liu, W.; Dai, D. DeepSeek-VL2: Mixture-of-experts vision-language models for advanced multimodal understanding. arXiv 2024, arXiv:2412.10302. [Google Scholar]
- Wang, Y.; Li, Q.; Dai, Z.; Xu, Y. Current status and trends in large language modeling research. Chin. J. Eng. 2024, 46, 1411–1425. [Google Scholar] [CrossRef]
- Zhang, Y.; Liang, S.; Li, J.; Pan, H. Yolov8s-DDC: A Deep Neural Network for Surface Defect Detection of Bearing Ring. Electronics 2025, 14, 1079. [Google Scholar] [CrossRef]
- Peng, Y.; Li, H.; Wu, P.; Zhang, Y.; Sun, X.; Wu, F. D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement. arXiv 2024. [Google Scholar] [CrossRef]





| Module Type | Core Content |
|---|---|
| Task Description | Detection: “Is there anyone near the crane?”; “Are all four stabilizing supports of the crane fully extended?” |
| Reasoning Requirements | As a power operation safety supervisor, complete the following two safety detections based on the provided crane operation image. For each, first state a clear conclusion, then explain the visual basis by linking it to image features (e.g., position, shape). |
| Example Prompt | Presence of personnel: confirm clear human outlines (head/torso/limbs) in the image; answer “None” if absent and “Present” if present. Full extension of supports: distinguish supports (telescopic metal structures on the crane’s longer side) from tires (round, non-telescopic); answer “Fully extended” only if all four are extended, otherwise “Not fully extended”. |
| Output Format | Is there anyone under the crane? Conclusion + basis. Are all four stabilizing supports of the crane fully extended? Conclusion + basis. |
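
As an illustration of how the modules in the table above combine into a single chained query, the sketch below simply concatenates them and hands the result to the chosen vision-language model. This is a minimal sketch, not the authors' released code: the `query_vlm` call is a hypothetical placeholder, and the module strings paraphrase the table.

```python
# Minimal sketch: assembling the four prompt modules from the table above into
# one chain-of-thought query. `query_vlm` is a hypothetical placeholder for the
# inference call of the chosen VLM (e.g., Janus-Pro or DeepSeek-VL2).

TASK_DESCRIPTION = (
    "Detection tasks: (1) Is there anyone near the crane? "
    "(2) Are all four stabilizing supports of the crane fully extended?"
)
REASONING_REQUIREMENTS = (
    "As a power operation safety supervisor, answer both questions from the "
    "provided crane operation image. For each, first state a clear conclusion, "
    "then explain the visual basis by linking it to image features "
    "(e.g., position, shape)."
)
EXAMPLE_PROMPT = (
    "Presence of personnel: confirm clear human outlines (head/torso/limbs). "
    "Full extension of supports: distinguish supports (telescopic metal "
    "structures on the crane's longer side) from tires (round, non-telescopic)."
)
OUTPUT_FORMAT = (
    "Q1: Is there anyone under the crane? Conclusion + basis. "
    "Q2: Are all four stabilizing supports fully extended? Conclusion + basis."
)


def build_cot_prompt() -> str:
    """Join the four modules into one chained prompt string."""
    return "\n\n".join(
        [TASK_DESCRIPTION, REASONING_REQUIREMENTS, EXAMPLE_PROMPT, OUTPUT_FORMAT]
    )


# Hypothetical usage; substitute the real inference API of the deployed model.
# answer = query_vlm(image="crane_scene.jpg", prompt=build_cot_prompt())
```
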

| Module Type | Core Content |
|---|---|
| Task Description | Detection: “Are there any people under the crane?”; “Are all four stabilizing supports of the crane fully extended?” |
| Verification Standards | |
| Reliability | Return the reliability level and a brief reason. Format: High/Medium/Low. |
| Reason | Explain how well the original result matches the image features. For example: “The original result ‘Yes, Not fully extended’ matches the image features ‘person on the right side + rear-left support not extended’ with high reliability.” |

| Module Type | Core Content |
|---|---|
| Task Description | Detection: “How many people are in the scene?”; “Are there any behaviors of people walking down the escalator backwards or leaning out?” |
| Reasoning Requirements | |
| Example Prompt | Return the reliability level and a brief reason. Format: High/Medium/Low. |
| Output Format | Explain how well the original judgment matches the image content, and point out possible deviation points. |
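
The two self-verification tables above feed the model's first answer back for a reliability check. A minimal sketch of that loop is shown below, assuming the same hypothetical `query_vlm` placeholder as before; the retry policy (one re-inspection when the rating comes back Low) is an illustrative choice, not a detail stated in the tables.

```python
# Minimal sketch of the self-verification step: verify the first answer, then
# re-run the detection once if the reliability rating comes back Low.
# `query_vlm` remains a hypothetical placeholder for the VLM inference call.

VERIFY_TEMPLATE = (
    "Re-examine the image and the original result below. Return a reliability "
    "level (High/Medium/Low) and a brief reason explaining how well the result "
    "matches the image features.\n"
    "Original result: {result}"
)


def parse_reliability(reply: str) -> str:
    """Pull the High/Medium/Low label out of the verification reply."""
    for level in ("High", "Medium", "Low"):
        if level.lower() in reply.lower():
            return level
    return "Low"  # conservative default when no label is found


def detect_with_self_verification(image, prompt, query_vlm, max_retries=1):
    """Detect, verify the answer, and re-inspect on a Low reliability rating."""
    result = query_vlm(image=image, prompt=prompt)
    reply = query_vlm(image=image, prompt=VERIFY_TEMPLATE.format(result=result))
    retries = 0
    while parse_reliability(reply) == "Low" and retries < max_retries:
        result = query_vlm(image=image, prompt=prompt)  # re-run the detection
        reply = query_vlm(image=image, prompt=VERIFY_TEMPLATE.format(result=result))
        retries += 1
    return result, reply
```
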

| Experimental Platforms, Equipment, and Datasets | Value/Model/Quantity |
|---|---|
| Drone | DJI Mini 2 (SZ DJI Technology Co., Ltd., Shenzhen, China) |
| GPU | NVIDIA GeForce RTX 4080 Super |
| Mobile Acquisition Device | Xiaomi 13 (Xiaomi Corporation, Beijing, China) |
| Experimental Environment | Linux Ubuntu 18.04 |
| Dataset Quantity | Crane operation: 567 images; escalator operation: 1146 images |

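The results tables that follow report accuracy, misjudgment rate, recall, precision, and F1-score (plus a false negative rate, FNR, in the ablation tables). Assuming the standard confusion-matrix definitions, which are consistent with the reported numbers (the misjudgment rate equals 1 − accuracy and FNR equals 1 − recall), the metrics are:

```latex
% Standard confusion-matrix definitions assumed here; TP, FP, TN, FN are the
% true/false positive/negative counts over the evaluated hazard judgments.
\mathrm{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN},
\qquad
\mathrm{F1} = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}
                   {\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{FNR} = \frac{FN}{TP + FN} = 1 - \mathrm{Recall}.
```
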
| Model | Accuracy | Misjudgment Rate | Recall | Precision | F1-Score |
|---|---|---|---|---|---|
| Janus-Pro (with CoT and SV) | 96.3% | 3.7% | 95.6% | 92.8% | 94.2% |
| Deepseek-vl2 (with CoT and SV) | 95.8% | 4.2% | 94.3% | 91.5% | 92.9% |
| Deepseek-R1 (with CoT and SV) | 95.2% | 4.8% | 90.1% | 93.1% | 91.6% |

| Model | Accuracy | Misjudgment Rate | Recall | Precision | F1-Score |
|---|---|---|---|---|---|
| Janus-Pro (with CoT and SV) | 94.7% | 5.3% | 92.8% | 90.7% | 91.7% |
| Deepseek-vl2 (with CoT and SV) | 93.1% | 6.9% | 91.5% | 89.2% | 90.3% |
| Deepseek-R1 (with CoT and SV) | 92.6% | 7.4% | 90.7% | 88.5% | 89.6% |

| Model Name | Scene Type | Accuracy | Misjudgment Rate | Recall | Precision | F1-Score |
|---|---|---|---|---|---|---|
| Janus-Pro (with CoT and SV) | Escalator Single-Person Scene | 98.5% | 1.5% | 96.8% | 97.4% | 97.1% |
| | Escalator Two-Person Scene | 92.0% | 8.0% | 90.5% | 89.3% | 89.9% |
| Deepseek-vl2 (with CoT and SV) | Escalator Single-Person Scene | 99.2% | 0.8% | 97.9% | 98.1% | 98.0% |
| | Escalator Two-Person Scene | 91.2% | 8.8% | 89.8% | 88.7% | 89.2% |
| Deepseek-R1 (with CoT and SV) | Escalator Single-Person Scene | 98.5% | 1.5% | 96.5% | 97.2% | 96.8% |
| | Escalator Two-Person Scene | 90.8% | 9.2% | 89.3% | 88.1% | 88.7% |

| Model Name | Scene Type | Accuracy | Misjudgment Rate | Recall | Precision | F1-Score |
|---|---|---|---|---|---|---|
| Janus-Pro (with CoT and SV) | Leaning Out Behavior | 94.3% | 5.7% | 92.3% | 89.2% | 90.7% |
| | Walking Backwards Behavior | 90.2% | 9.8% | 90.1% | 85.6% | 87.8% |
| Deepseek-vl2 (with CoT and SV) | Leaning Out Behavior | 92.8% | 7.2% | 91.1% | 87.9% | 89.5% |
| | Walking Backwards Behavior | 89.2% | 10.8% | 88.9% | 84.9% | 86.8% |
| Deepseek-R1 (with CoT and SV) | Leaning Out Behavior | 93.2% | 6.8% | 91.5% | 88.5% | 90.0% |
| | Walking Backwards Behavior | 88.2% | 11.8% | 87.7% | 84.2% | 85.9% |

| Model | Hazard Type | Precision | Recall | F1-Score | mAP |
|---|---|---|---|---|---|
| Janus-Pro (with CoT and SV) | Personnel intrusion detection | 92.8% | 95.6% | 94.2% | 0.945 |
| YOLOv8s | Personnel intrusion detection | 88.2% | 83.5% | 85.8% | 0.842 |
| D-FINE | Personnel intrusion detection | 82.5% | 85.0% | 83.7% | 0.850 |
| Janus-Pro (with CoT and SV) | Stabilizing support recognition | 90.7% | 92.8% | 91.7% | 0.923 |
| YOLOv8s | Stabilizing support recognition | 84.5% | 78.3% | 81.3% | 0.805 |
| D-FINE | Stabilizing support recognition | 82.1% | 76.7% | 79.3% | 0.782 |
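
The mAP column appears only in this detector comparison. Assuming the conventional object-detection definition (the table does not restate it, and the interpolation scheme is not specified), AP is the area under the precision–recall curve for one hazard class and mAP is the mean over the evaluated classes:

```latex
% Conventional AP/mAP definitions, assumed here rather than taken from the paper.
\mathrm{AP} = \int_{0}^{1} p(r)\,\mathrm{d}r, \qquad
\mathrm{mAP} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{AP}_{c}.
```
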

| Hazard Type | Experimental Configuration | Accuracy | FNR | Recall | Precision | F1-Score |
|---|---|---|---|---|---|---|
| Personnel Intrusion | No Prompts + No SV | 30.7% | 47.7% | 52.3% | 65.2% | 58.0% |
| | With Prompts + No SV | 93.6% | 7.9% | 92.1% | 89.5% | 90.8% |
| | With Prompts + With SV | 94.2% | 6.5% | 93.5% | 90.3% | 91.9% |
| | With CoT + No SV | 94.0% | 7.0% | 93.0% | 90.1% | 91.5% |
| | With CoT + With SV | 96.3% | 4.4% | 95.6% | 92.8% | 94.2% |
| Stabilizing Supports | No Prompts + No SV | 18.4% | 74.3% | 25.7% | 65.0% | 36.5% |
| | With Prompts + No SV | 88.5% | 9.8% | 90.2% | 85.3% | 87.7% |
| | With Prompts + With SV | 89.2% | 8.7% | 91.3% | 86.1% | 88.6% |
| | With CoT + No SV | 91.9% | 8.0% | 92.0% | 88.4% | 90.2% |
| | With CoT + With SV | 94.7% | 7.2% | 92.8% | 90.7% | 91.7% |

| Hazard Type | Experimental Configuration | Accuracy | FNR | Recall | Precision | F1-Score |
|---|---|---|---|---|---|---|
| Single-/Double-Person Operation | No Prompts + No SV | 30.5% | 51.1% | 48.9% | 66.1% | 56.3% |
| | With Prompts + No SV | 89.0% | 9.2% | 90.8% | 86.5% | 88.6% |
| | With Prompts + With SV | 89.5% | 8.4% | 91.6% | 87.2% | 89.4% |
| | With CoT + No SV | 89.3% | 8.8% | 91.2% | 86.9% | 89.0% |
| | With CoT + With SV | 94.2% | 5.5% | 94.5% | 89.8% | 92.1% |
| Leaning Out/Reversing Down Escalators | No Prompts + No SV | 27.1% | 57.7% | 42.3% | 65.5% | 51.2% |
| | With Prompts + No SV | 88.1% | 10.3% | 89.7% | 85.2% | 87.4% |
| | With Prompts + With SV | 89.1% | 9.2% | 90.8% | 86.0% | 88.3% |
| | With CoT + No SV | 88.4% | 9.9% | 90.1% | 85.6% | 87.8% |
| | With CoT + With SV | 93.2% | 6.7% | 93.3% | 88.9% | 91.0% |

| Prompt Type | Expression | Accuracy |
|---|---|---|
| Original Prompt | Distinguish stabilizing supports (telescopic metal structures on the crane’s longer sides) from tires (round, non-telescopic). | 94.7% |
| Rewritten 1 | Are all 4 telescopic metal supports of the electric crane truck fully extended? | 93.9% |
| Rewritten 2 | Confirm if all 4 telescopic stabilizing supports at the electric crane truck’s bottom are fully deployed. | 94.3% |
| Rewritten 3 | Determine if all 4 telescopic metal stabilizing supports of the electric crane truck in operation are fully extended. | 93.7% |
| Rewritten 4 | Are all 4 telescopic metal stabilizing supports on both sides of the electric crane truck fully deployed? | 93.1% |

| Prompt Interference Category | Interference Content | Self-Verification | Accuracy |
|---|---|---|---|
| Incorrect Definition | Stabilizing supports are round rubber structures (similar to tires); check if all 4 are fully extended. | √ | 85.2% |
| | | × | 62.3% |
| Scenario Interference | Ignore the crane’s stabilizing supports: first count the trees in the background, then briefly check their status. | √ | 87.5% |
| | | × | 66.8% |
| Ambiguous Expression | Might the crane’s supporting components be fully extended? | √ | 88.7% |
| | | × | 68.5% |

