Abstract
The petrochemical industry faces significant safety challenges, necessitating stringent protocols and advanced monitoring systems. Traditional methods rely on manual inspections and fixed sensors, often reacting to hazards only after they occur. Multimodal AI, integrating visual, sensor, and textual data, offers a transformative solution for real-time, proactive safety management. This paper evaluates three AI models (Gemini 1.5 Pro, OpenAI GPT-4, and Copilot) in detecting workplace hazards and supporting compliance with Process Safety Management (PSM) and DuPont safety frameworks. The study highlights the models’ potential in improving safety outcomes, reducing human error, and supporting continuous, data-driven risk management in petrochemical plants. This paper is the first of its kind to use the latest multimodal technology to identify safety hazards; a similar approach could be deployed in other manufacturing industries, especially the oil and gas industry (both upstream and downstream), the fertilizer industry, and production facilities.
1. Introduction
The petrochemical industry plays a critical role in modern economies, supplying raw materials for countless products and processes. However, this industry is fraught with significant safety risks due to the handling of hazardous chemicals, high-temperature processes, and large-scale machinery. Accidents in petrochemical plants can lead to catastrophic outcomes, including explosions, fires, environmental contamination, and loss of life. To mitigate these risks, stringent safety protocols and continuous monitoring systems are essential for preventing hazards and ensuring the safety of both personnel and equipment.
In recent years, there has been a growing emphasis on adopting robust safety management frameworks, such as the Process Safety Management (PSM) principles. Developed by organizations like OSHA and further refined by industry leaders such as DuPont, PSM focuses on preventing hazardous incidents by managing critical processes, operational practices, and safety procedures. DuPont’s safety culture, in particular, emphasizes proactive risk management, continuous safety improvements, and comprehensive audits. However, even with these principles in place, traditional safety monitoring methods in petrochemical plants have faced limitations [1].
Historically, safety monitoring has relied on manual inspections, fixed sensors, and software systems designed for task-specific hazard detection. These methods are often reactive, identifying risks only after they have manifested into serious issues. Moreover, human involvement in safety audits introduces subjectivity and inconsistencies, as different personnel may interpret safety violations differently. Given the complexity and scale of modern petrochemical operations, there is a need for a more comprehensive, real-time safety monitoring solution, capable of assessing both human behavior and equipment conditions, ensuring compliance with frameworks like DuPont’s Safety Management System and OSHA’s Process Safety Management guidelines.
Efforts to enhance safety monitoring have included the development of software systems that monitor specific aspects of plant safety, such as equipment temperature, pressure anomalies, or personal protective equipment (PPE) compliance. While these systems addressed certain safety concerns, they were often isolated and lacked the ability to offer a holistic view of plant-wide safety. Traditional machine learning (ML) models were typically trained to detect specific hazards, such as gas leaks or equipment failures, without understanding the broader context of how these incidents interact with overall plant operations [2].
The complexity of monitoring petrochemical operations presents unique challenges. A typical plant consists of numerous interconnected processes, each with its own set of safety risks. Ensuring compliance with both OSHA’s PSM principles and DuPont’s rigorous safety standards requires real-time, continuous evaluation of factors like equipment health, worker behavior, and environmental conditions.
Current monitoring systems often struggle to detect early signs of safety violations, especially when these violations arise from the interaction of multiple factors. For instance, a pressure increase in a pipeline might be manageable in isolation, but if combined with equipment fatigue or human error, it could lead to a serious safety breach. Moreover, traditional systems often fail to account for dynamic changes in plant conditions or evolving safety risks, necessitating more advanced solutions that can process complex data streams and anticipate potential hazards.
Artificial intelligence (AI) offers the potential to address these challenges by providing an integrated, intelligent approach to petrochemical safety monitoring [3]. Unlike traditional systems, AI can process vast amounts of data from diverse sources (visual feeds, sensor data, and textual reports) simultaneously, allowing for a more comprehensive understanding of safety risks. In particular, the recent advent of multimodal AI models, such as Gemini 1.5 Pro, OpenAI GPT-4, and Copilot, has paved the way for real-time proactive safety management in the petrochemical industry.
Multimodal AI combines different data types—such as video, images, text, and sensor readings—into a single, coherent framework. This allows the AI to identify patterns and correlations between data sources that traditional systems might overlook. For example, a multimodal AI system could analyze video footage of workers in conjunction with real-time sensor data to detect improper PPE use or hazardous operating conditions, flagging potential risks before they lead to incidents [4].
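To make the idea of cross-modal correlation concrete, the following minimal sketch combines findings from a vision model with plant sensor readings into a coarse risk level. It is an illustration only and not the system evaluated in this paper; the class names, labels, and thresholds (`min_conf`, `near_limit`) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class VisionFinding:
    label: str         # hypothetical label, e.g. "missing_hard_hat"
    confidence: float  # model confidence between 0.0 and 1.0


@dataclass
class SensorReading:
    name: str     # hypothetical tag, e.g. "line_pressure_bar"
    value: float  # current measurement
    limit: float  # alarm limit for this sensor


def fuse(findings, readings, min_conf=0.6, near_limit=0.9):
    """Return "low", "medium", or "high" from both modalities.

    A confident unsafe-act finding or a sensor close to its limit
    alone gives "medium"; both together give "high".
    """
    unsafe_act = any(f.confidence >= min_conf for f in findings)
    stressed = any(r.value >= near_limit * r.limit for r in readings)
    if unsafe_act and stressed:
        return "high"
    if unsafe_act or stressed:
        return "medium"
    return "low"


level = fuse(
    [VisionFinding("missing_hard_hat", 0.8)],
    [SensorReading("line_pressure_bar", 9.5, 10.0)],
)
```

The point of the sketch is that neither input alone would be rated "high"; the escalation comes from the combination, which is the kind of interaction single-purpose systems tend to miss.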
These AI systems also align with the PSM principles and DuPont safety monitoring protocols by ensuring that both operational and behavioral risks are managed systematically. By incorporating AI-driven safety monitoring systems, petrochemical plants can better adhere to the DuPont safety philosophy of proactive, preventive risk management. AI’s ability to continuously learn and adapt to new risks means that these systems can evolve in line with plant operations, providing an ever-improving safety infrastructure [5].
This paper is the first of its kind to explore the application of cutting-edge multimodal AI technologies to petrochemical safety monitoring, particularly in alignment with the PSM principles and DuPont’s safety monitoring frameworks. The research evaluates the performance of these AI models in identifying safety concerns, ensuring compliance with OSHA standards, and integrating seamlessly into existing safety protocols. The insights gained from this study could revolutionize the way safety is managed in the petrochemical industry, promoting a shift from reactive to proactive risk management practices.
2. Methodology
In this research, we aimed to develop and test multimodal AI systems for real-time safety monitoring in petrochemical environments. The process involved selecting state-of-the-art AI models, preparing domain-specific data aligned with OSHA standards and common high-risk tasks, defining a prompting strategy, and conducting rigorous testing to evaluate their accuracy in hazard identification. Applying large, generalist foundation models (like Gemini and GPT-4) to such a specialized, high-stakes domain requires specific adaptation techniques to ensure accuracy and relevance. The methodology therefore encompassed three key phases: (1) model selection and configuration, (2) dataset curation and domain-specific model adaptation, and (3) performance evaluation (testing/validation phase).
2.1. Selection of Multimodal AI Systems
To ensure robust results, three advanced multimodal AI systems were selected for comparison:
Gemini AI Developer Studio: Chosen for its flexibility in multimodal processing and rapid development capabilities.
OpenAI GPT-4 Visual Multimodal Technology: Known for its state-of-the-art understanding of both textual and visual inputs, which made it a strong candidate for identifying complex petrochemical hazards.
Copilot: A highly capable, tuned GPT multimodal interface.
These models were accessed via their respective APIs or developer platforms. Configuration focused on utilizing their inherent multimodal capabilities without architectural modification by the team. Standardized parameters were used where applicable to ensure comparability during the evaluation phase. Figure 1 shows the architecture used for development.
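One way to keep parameters standardized across several vendors is to hide each model behind a common interface and pass the same configuration object to all of them. The sketch below illustrates that pattern under stated assumptions: `RunConfig`, `HazardModel`, `EchoModel`, and `run_all` are hypothetical names, the parameter values are placeholders, and no real vendor API is called.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical shared settings; real APIs name these differently.
    temperature: float = 0.0
    max_output_tokens: int = 512


class HazardModel(Protocol):
    name: str

    def analyze(self, media_path: str, prompt: str, config: RunConfig) -> str:
        ...


class EchoModel:
    """Stand-in for a vendor adapter; it only echoes its inputs."""

    def __init__(self, name: str) -> None:
        self.name = name

    def analyze(self, media_path: str, prompt: str, config: RunConfig) -> str:
        return f"{self.name}|{media_path}|T={config.temperature}"


def run_all(models, media_path, prompt, config):
    """Send the same input, prompt, and config to every model."""
    return {m.name: m.analyze(media_path, prompt, config) for m in models}


outputs = run_all(
    [EchoModel("model_a"), EchoModel("model_b")],
    "hot_work.jpg",
    "List the hazards in this image.",
    RunConfig(),
)
```

In a real comparison each adapter would translate `RunConfig` into the corresponding vendor-specific request fields.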
Figure 1.
System flow for petrochemical safety monitoring.
2.2. Data Acquisition and Preparation
Generalist foundation models possess vast world knowledge but lack the specialized understanding of petrochemical processes, equipment, specific regulatory nuances (OSHA/DuPont), and the visual indicators of hazards unique to this industry. Bridging this gap requires adapting the models using domain-specific data. Two primary strategies exist for such adaptation: prompt engineering (including in-context learning) and fine-tuning.
Training data were gathered to simulate real-world safety scenarios in the petrochemical industry, focusing on high-risk tasks. These tasks, as shown in Table 1, included the following:
Table 1.
Training dataset distribution and hazard type.
- PPE non-compliance;
- Confined space entry;
- Hot work;
- Working at heights;
- Handling hazardous chemicals;
- Slip, trip, and fall hazards (general).
Each AI model was trained on both OSHA safety standards and key concerns frequently encountered during these high-risk jobs.
Visual Data: A comprehensive dataset comprising 150 images and 100 video clips was compiled. A total of 80 percent of this data was selected or created to represent a wide spectrum of real-world petrochemical operations and safety scenarios, explicitly including examples of both safe practices and common hazards. Sources included publicly available industrial safety datasets, anonymized data from cooperating industry partners, and potentially staged scenarios illustrating specific risks relevant to the hazard categories in Table 1.
Textual Data: This included safety protocols, checklists, and OSHA standards related to specific tasks, offering additional context for the models to reference.
Sample Hazard Images for Training: Diagrams like the one shown in Figure 2 were used to identify hazards associated with tasks in confined spaces, such as improper use of safety harnesses or proximity to chemical leaks. Visual data were tagged to ensure the model was trained to recognize specific risk factors.
Figure 2.
Sample image from the evaluation dataset depicting a hot work scenario, used as input to the AI models during the testing phase [6].
Model Adaptation Technique: The selected multimodal models were adapted using the curated training dataset. Depending on the specific model’s API capabilities and recommended practices, this adaptation likely involved supervised fine-tuning (adjusting model weights based on labeled examples) or few-shot prompting/in-context learning (providing extensive, structured examples within the prompt during inference to guide the model’s behavior). The goal was to imbue the models with domain-specific knowledge for accurately identifying petrochemical hazards based on learned patterns from the visual and textual training data.
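For the few-shot prompting route, adaptation amounts to placing labeled examples in the prompt ahead of the new input. The sketch below shows one possible way to assemble such a prompt as plain text; the function name, the example scenes, and the wording are illustrative assumptions, not the prompts used in this study.

```python
def build_few_shot_prompt(task, examples):
    """Assemble a text prompt from labeled examples (in-context learning)."""
    lines = [task, ""]
    for i, ex in enumerate(examples, start=1):
        hazards = "; ".join(ex["hazards"]) if ex["hazards"] else "none"
        lines.append(f"Example {i}")
        lines.append(f"Scene: {ex['scene']}")
        lines.append(f"Hazards: {hazards}")
        lines.append("")
    lines.append("Now list the hazards in the attached image in the same format.")
    return "\n".join(lines)


# Hypothetical labeled examples covering an unsafe and a safe scene.
examples = [
    {
        "scene": "Welder working beside an open solvent drum",
        "hazards": ["Hot work near flammable material"],
    },
    {
        "scene": "Worker on scaffold with harness anchored",
        "hazards": [],
    },
]
prompt = build_few_shot_prompt(
    "You are a petrochemical safety inspector.", examples
)
```

Including at least one safe scene with no hazards is a deliberate choice: it gives the model an example of answering "none" rather than always reporting something.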
2.3. Performance Evaluation
After training, the models underwent rigorous testing on pre-labeled datasets of both images and videos simulating safety violations in petrochemical sites. The test data were not part of the original training set, so the evaluation measured how well the models generalized beyond their training examples (see Figure 2 and Figure 3). The responses of the different AI models to the image in Figure 2 are given as a sample in Table 2.
Figure 3.
Gemini 1.5 Pro Developer Studio interface for safety analysis.
Table 2.
AI models’ hazard detection outputs in response to analyzing the input image shown in Figure 2.
Hazard Identification Test: A 20 percent portion of the total sample dataset (150 images and 100 short and long videos depicting real-world petrochemical hazards) was used. The models were required to detect the hazards and flag them in real time.
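A held-out split of this kind can be sketched as follows. The example assumes that the 20 percent fraction is applied separately to images and videos and that the split is random with a fixed seed; neither detail is stated in the text, and the item identifiers are placeholders.

```python
import random


def split_dataset(items, test_fraction=0.2, seed=0):
    """Split items into disjoint train and test lists, per media kind."""
    rng = random.Random(seed)
    by_kind = {}
    for item in items:
        by_kind.setdefault(item["kind"], []).append(item)

    train, test = [], []
    for kind in sorted(by_kind):
        group = by_kind[kind][:]  # copy so the caller's list is untouched
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder records matching the dataset size reported in the paper.
items = [{"id": f"img_{i}", "kind": "image"} for i in range(150)]
items += [{"id": f"vid_{i}", "kind": "video"} for i in range(100)]
train, test = split_dataset(items)
```

With 150 images and 100 videos, this yields 30 test images and 20 test videos, leaving 200 items for adaptation.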
Input Presentation: Each image or video clip from the evaluation dataset was presented to the adapted AI models.
Standardized Prompting: A consistent prompt structure was used for all models during evaluation to ensure fair comparison. The prompt requested the following:
Output Generation: The AI models generated textual outputs listing identified hazards, categories, and justifications.
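To compare free-text outputs against a ground truth, it helps to convert them into structured records first. The sketch below assumes, purely for illustration, that each model is asked to answer with one `hazard | category | justification` line per finding; this line format and the names `HazardReport` and `parse_output` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class HazardReport:
    hazard: str
    category: str
    justification: str


def parse_output(text):
    """Parse 'hazard | category | justification' lines; skip other lines."""
    reports = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            reports.append(HazardReport(*parts))
    return reports


sample = (
    "Here are the hazards I found:\n"
    "Sparks near drum | Hot work | Ignition source close to flammables\n"
    "No face shield | PPE non-compliance | Welding without eye protection\n"
)
reports = parse_output(sample)
```

Lines that do not match the expected shape, such as the introductory sentence, are ignored rather than raising an error, since model outputs often contain extra prose.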
Performance Assessment: The AI-generated outputs were systematically compared against the ground truth defined by subject-matter experts (SMEs) for each test case. The evaluation focused on the following:
- Hazard Detection Accuracy (True Positives, False Positives): Did the model correctly identify hazards present in the ground truth? Did it incorrectly identify hazards that were not present?
- Hazard Miss Rate (False Negatives): Did the model fail to identify hazards that were present according to SMEs?
- Categorization Correctness: Was the identified hazard assigned to the appropriate category?
- Justification Quality: Was the model’s reasoning sound and did it align with safety standards?
- Overall Relevance: Was the output focused and free from irrelevant information?
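The detection and miss-rate criteria above map onto standard counts of true positives, false positives, and false negatives. The following sketch shows how these could be computed when both the model output and the SME ground truth are reduced to sets of hazard labels per test case; the label strings are hypothetical, and the study itself reports semi-quantitative scores rather than these exact metrics.

```python
def score_case(predicted, truth):
    """Return (tp, fp, fn) for one test case given two label collections."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # hazards correctly identified
    fp = len(predicted - truth)  # hazards reported but not present
    fn = len(truth - predicted)  # hazards present but missed
    return tp, fp, fn


def summarize(cases):
    """Aggregate counts over (predicted, truth) pairs and derive rates."""
    tp = fp = fn = 0
    for predicted, truth in cases:
        a, b, c = score_case(predicted, truth)
        tp, fp, fn = tp + a, fp + b, fn + c
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    miss_rate = fn / (tp + fn) if tp + fn else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision, "recall": recall, "miss_rate": miss_rate,
    }


cases = [
    ({"no_harness", "open_flame"}, {"no_harness", "gas_leak"}),
    ({"spill"}, {"spill"}),
]
summary = summarize(cases)
```

In a safety setting the miss rate usually deserves the most attention, because a missed hazard is typically costlier than a false alarm.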
Scoring and Reporting: The scores presented in Table 3 are semi-quantitative metrics derived from the qualitative assessment. They represent the aggregate performance of each model for each hazard type across the entire evaluation dataset, reflecting the consistency and accuracy of hazard identification relative to the ground truth.
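The exact aggregation procedure is not specified here. As one plausible illustration, per-case ratings on a 0 to 100 scale could be averaged for each model and hazard type, as in the sketch below; the rating values and model name are invented for the example and are not results from this study.

```python
from collections import defaultdict


def aggregate(ratings):
    """Average ratings per (model, hazard_type).

    ratings: iterable of (model, hazard_type, score) with score in 0..100.
    """
    buckets = defaultdict(list)
    for model, hazard_type, score in ratings:
        buckets[(model, hazard_type)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Invented per-case ratings, for illustration only.
ratings = [
    ("model_a", "Hot work", 80),
    ("model_a", "Hot work", 100),
    ("model_a", "Confined space entry", 90),
]
table = aggregate(ratings)
```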
Table 3.
Accuracy and limitation of visual analyzer model depending on hazard type.
Figure 2 provides a specific example of a visual input presented to the AI models during the evaluation phase. Each model processed this image based on its internal representations adapted during the training phase and guided by the standardized evaluation prompt. An example of the user interface used for interacting with one such model, Gemini 1.5 Pro via its Developer Studio, is shown in Figure 3. Table 2 summarizes the resulting textual outputs generated by Gemini 1.5 Pro, GPT-4, and Copilot, detailing the specific hazards each model identified for the scenario depicted in Figure 2. The conceptual flow of how an AI model processes the visual input to arrive at these textual hazard descriptions is illustrated in the flowcharts presented in Figure 4 and Figure 5. This comparison highlights the differences in detection capabilities, categorization, and descriptive detail among the models for a single test case.
Figure 4.
System flow for petrochemical safety monitoring, outlining the data capture and processing steps involved in multimodal AI hazard detection.
Figure 5.
Conceptual flowchart of AI-powered hazard detection process.
3. Results and Discussion
This section presents the outcomes of the testing phase for multimodal AI systems and discusses their implications for petrochemical safety monitoring.
Performance Evaluation
The three AI models (OpenAI GPT-4 Visual Multimodal, Gemini 1.5 Pro, and Copilot-4o) were tested on their ability to identify hazards in petrochemical environments using a predefined dataset of images and videos. The evaluation results are summarized in Table 3 and Figure 6, and the limitations of the models are shown as a block diagram in Figure 7.
Figure 6.
Comparison of AI model performance in hazard detection.
Figure 7.
Overall limitations of AI Visual Analyzer Model.
Analysis
- Gemini 1.5 Pro (95/100): Receives the highest score due to its exceptional accuracy, comprehensive hazard identification, nuanced understanding, and detailed analysis. It consistently demonstrated a strong ability to identify and articulate safety concerns effectively. The minor limitation of potential over-interpretation is often acceptable and even beneficial in a safety context as it errs on the side of caution.
- GPT-4 (ChatGPT) (90/100): Scores very high for accuracy and conciseness. It effectively identifies the primary safety concerns in a clear and actionable manner. While slightly less detailed than Gemini, its ability to provide focused and relevant safety information is highly valuable. It is a very strong performer in hazard identification.
- Copilot (70/100): Scores lower in direct comparison to Gemini and GPT-4 for image-based hazard identification in these specific examples. While it offers a useful and structured approach to thinking about safety through categories, it is less effective at autonomously identifying specific hazards directly from the images. Its strength lies more in providing general safety frameworks and advice, requiring more user interaction to pinpoint image-specific risks. If the goal is to achieve a quick, automated hazard assessment from an image, it is less efficient in its out-of-the-box behavior compared to the other two models based on these examples.
4. Practical Implications
While the AI models show promising hazard detection capabilities, realizing their full potential requires careful integration into existing petrochemical plant operations. This section details the key technical, operational, and organizational considerations for successful implementation.
Technical Considerations:
Integrating AI requires robust APIs and middleware to connect diverse plant systems like legacy software, sensors, and video feeds. Sufficient computing power, potentially cloud-based, and network bandwidth are crucial for real-time AI processing. System architecture should be scalable and reliable, considering cloud, edge, or hybrid deployment. Cybersecurity is paramount to protect data and system integrity. Ongoing maintenance, including model updates, is essential.
Operational Considerations:
AI alerts must integrate smoothly into existing workflows like incident response and inspections, augmenting human efforts. A timely alert system is vital, delivering prioritized hazard notifications to relevant personnel. Training is key to building human–AI collaboration and trust in AI-generated alerts. Alert fatigue must be addressed by refining AI sensitivity, and continuous performance monitoring and calibration are needed under real-world plant conditions.
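Prioritized delivery and alert-fatigue control can be illustrated with a small queue that orders alerts by severity and suppresses repeats of the same hazard in the same zone within a cooldown window. This is a minimal sketch under assumed names and values (`AlertQueue`, the severity ranks, a 300 second cooldown), not a description of a deployed system.

```python
import heapq

# Lower rank is served first; the mapping is an assumption for the example.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


class AlertQueue:
    def __init__(self, cooldown_s=300.0):
        self.cooldown_s = cooldown_s
        self._last_seen = {}  # (zone, hazard) -> time of last accepted alert
        self._heap = []
        self._counter = 0     # tie-breaker that keeps insertion order

    def push(self, zone, hazard, severity, timestamp):
        """Queue an alert; return False if it is a repeat within cooldown."""
        key = (zone, hazard)
        last = self._last_seen.get(key)
        if last is not None and timestamp - last < self.cooldown_s:
            return False
        self._last_seen[key] = timestamp
        entry = (SEVERITY_RANK[severity], timestamp, self._counter, zone, hazard)
        heapq.heappush(self._heap, entry)
        self._counter += 1
        return True

    def pop(self):
        """Return the most severe (then oldest) pending alert."""
        _, _, _, zone, hazard = heapq.heappop(self._heap)
        return zone, hazard


queue = AlertQueue(cooldown_s=60.0)
first = queue.push("Unit A", "missing_ppe", "low", timestamp=0.0)
repeat = queue.push("Unit A", "missing_ppe", "low", timestamp=30.0)
urgent = queue.push("Unit B", "hot_work_near_leak", "high", timestamp=10.0)
next_alert = queue.pop()
```

Here the repeated low-severity alert is dropped, and the later but more severe alert is delivered first. The cooldown length is a tuning parameter: too short and operators are flooded, too long and a persisting hazard may go unreported.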
Organizational Considerations:
Successful AI integration requires change management to address resistance and highlight AI benefits. Leadership commitment is crucial to champion AI and strengthen a data-driven safety culture. Compliance with regulations like OSHA and PSM is essential, ensuring data privacy and system reliability. Resources for AI infrastructure, training, and maintenance need to be allocated, and the ethical aspects addressed to ensure responsible AI deployment.
The application of multimodal AI in petrochemical safety monitoring has the potential to achieve the following:
- Reduce accidents and near-miss incidents through real-time detection.
- Ensure compliance with OSHA standards and enhance the implementation of DuPont’s process safety guidelines.
- Improve operational efficiency by minimizing downtime related to safety incidents, as early detection can lead to quicker mitigation efforts.
Further work is needed to refine the accuracy of the models in complex scenarios and improve their integration into existing safety management systems.
5. Conclusions
The application of multimodal AI in petrochemical safety monitoring marks a significant advancement over traditional methods. The study demonstrated high accuracy and real-time hazard detection capability for the OpenAI GPT-4 Visual Multimodal model and the highest accuracy for Gemini 1.5 Pro, while Copilot showed potential but requires improvement in visual hazard recognition.
By adhering to OSHA standards and DuPont Safety Principles, these systems enhance Process Safety Management (PSM), ensuring consistent procedural compliance and risk mitigation. Their ability to detect hazards in real time has the potential to significantly reduce incident rates, improve operational safety, and ensure a more robust safety culture in the petrochemical industry.
As a first-of-its-kind implementation of advanced multimodal AI in this domain, the study highlights the transformative power of AI technologies. Future development should focus on refining AI models’ performance in complex, multi-faceted industrial scenarios including fertilizer and oil and gas industries, further integrating them into comprehensive safety management systems. This will pave the way for safer and more efficient petrochemical operations, reducing risks to both personnel and the environment.
Author Contributions
Conceptualization, U.B., K.K.; methodology, Q.J.; software, N.S.; validation, A.b.R.; formal analysis, U.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research has received no funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
The authors extend their appreciation to the IT team of Engro Corporation for providing support for the solution development.
Conflicts of Interest
Qamar Jaleel, Umair Aslam, Ahrad bin Riaz, and Najam Saeed were employed by Engro Polymer and Chemical Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- Knegtering, B.; Pasman, H.J. Safety of the process industries in the 21st century: A changing need of process safety management for a changing industry. J. Loss Prev. Process Ind. 2009, 22, 162–168.
- Saisandhiya, N.R.; Vijay Babu, K. Hazard identification and risk assessment in petrochemical industry. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 778–783.
- Shukla, A.; Karki, H. Application of robotics in offshore oil and gas industry—A review Part II. Robot. Auton. Syst. 2016, 75, 508–524.
- Shahriar, S.; Lund, B.D.; Mannuru, N.R.; Arshad, M.A.; Hayawi, K.; Bevara, R.V.K.; Mannuru, A.; Batool, L. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency. Appl. Sci. 2024, 14, 7782.
- Ma, J. The Application of Artificial Intelligence Technology in the Safety Monitoring System of Oil and Gas Ground. Procedia Comput. Sci. 2023, 228, 486–493.
- Glass, H. Construction continues to struggle badly with output at its ‘lowest level in 13 years’. This Is Money, 9 November 2012. Available online: https://www.thisismoney.co.uk/money/news/article-2230500/Construction-continues-struggle-badly-output-lowest-level-13-years.html (accessed on 18 September 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).






