Development and Integration of Metocean Data Interoperability for Intelligent Operations and Automation Using Machine Learning: A Review

Abstract: The current oil industry is moving towards digitalization, which is a good opportunity that will bring value to all its stakeholders. The digitalization of oil and gas discovery, which are production-based industries, is driven by enabling technologies which include machine learning (ML) and big data analytics. However, the existing Metocean system generates data manually using sensors such as the wave buoy, anemometer


Introduction
According to the World Economic Forum, the estimated net benefit of digital transformation to the oil and gas industry will be USD 945 billion over one decade (cumulative value 2016-2025). In addition, digital transformation is not realistic without artificial intelligence (AI) and data science, especially in industries such as oil and gas, finance, and the internet [1]. Moreover, given the current oil and gas crash, digitalization is a good opportunity that will add value to all stakeholders. These opportunities show that the oil and gas sector needs to continue digitalizing to achieve, or exceed, the stated value by 2025. Digitalization of production-based industries is driven by enabling technologies which include machine learning (ML) and big data analytics [2,3]. This places the oil and gas sector among the technological advancements and convergence of Industry 4.0 in Malaysia [4,5]. Thus, the application of AI and machine learning is necessary for any digitalization and automation to succeed in various sectors such as healthcare, finance, industry automation, transportation, and cybersecurity [6,7].
Machine learning (ML) is one of the key enabling tools that oil and gas industries are now focusing on to implement digital transformation. The application of AI and ML allows an industry to automate and enhance production and human capacity. For example, ML is used to understand and predict behaviour using an ice database for experiment with enhanced prediction for oil production and recovery [8][9][10][11][12][13][14][15][16][17]. The Metocean study, in turn, is used by oil and gas, particularly offshore industries, to estimate the environmental conditions for better and successful operations, which include prediction of gas-liquid pattern flow, groundwater anomaly detection, and pipe incident detection [18][19][20][21]. However, the existing Metocean system generates data manually using sensors such as the wave buoy, anemometer, and acoustic Doppler current profiler (ADCP). Additionally, improvements to the existing techniques for Metocean datasets have been provided [22]. The input of these data, which appear in ASCII format, to the Metocean system is also manual and siloed. There is no fundamental method, model, or algorithm that allows the system to reason and automate intelligently. This slows down provisioning, while the monitoring element of the Metocean data path is partial. Therefore, a study and model that can integrate the Metocean data, provide better decision making, and reduce manual tasks is needed.
The main aim of this paper is to provide an in-depth review of the method for developing and integrating oil and gas data, using a particular case study on the Metocean data system using a machine learning algorithm.

Integration of Metocean Forecast Data and Automation
Metocean data interoperability development with integration using automation measures parameters such as surge height and wave height. This can also be addressed by using ML model techniques for efficient results. Qiao and Myers [23] proposed surrogate time-independent modelling to evaluate Metocean conditions during hurricanes, which include peak wave period, peak wave direction, and storm surge, using a Metocean database. The model was developed using a recurrent neural network (RNN), gated recurrent unit (GRU), and multilayer perceptron (MLP). Orlandi et al. [24] proposed a prototype system that is capable of integrating Metocean model forecasts and ship performance data for modelling. In addition, the prototype allows for visualization of Metocean data after uploading from the system for evaluating the condition on each route.
Several research studies have explored the needs of ML in the oil and gas industries. In this study, we investigate in depth the integration of Metocean data interoperability for intelligent operations and automation using an ML-based approach.


A comparative analysis of the current uses of ML algorithms in the oil and gas industries is presented.


The transformation and integration of existing Metocean architecture for the use of oil and gas data operators through automation is meticulously reviewed.


A new model to be integrated with the existing Metocean data system using a machine-learning algorithm to monitor and interoperate with maximum performance is proposed.

Meta-Survey
This section presents several works related to automation and digitalization using machine learning in the area of control systems and decision making. Many workflows in business organizations prove to be difficult and monotonous due to the increase in access to organizational data. This makes the cost of production high, which forces industries, businesses, and organizations to automate their functions to reduce the cost of production and boost system efficiency [25][26][27]. An organization must choose to digitalize and automate its tools; otherwise, it will become obsolete. Therefore, to realize the vision of digitalization in oil and gas, or the "digital oilfield", technology-centric solutions such as big data, cloud computing, the internet of things (IoT), and many more need to be considered in automation and decision-making processes [28]. What enables digitalization, automation, automated analysis, and data integration in oil and gas is machine learning and AI in general [29][30][31]. As a result, added value in the form of cost savings, automated operations, improved monitoring, mitigation of environmental hazards, and improved decision making will be realized.
Pashali et al. [32] conducted a comprehensive survey of Metocean and ice for the development of the Russian Arctic continental shelf. The survey determines Metocean data for exploratory drilling processes with design data for offshore oil and gas facilities through round operations in freezing seas. A similar study by Buzin et al. [33] gives a comprehensive review for the period between 2012 and 2021 based on Metocean shelf development projects and their approaches and results. The study also provided an overview of the Arctic and Antarctic Research Institute (AARI) task scope chronology survey. Wang et al. [34] conducted a comprehensive review of current field monitoring development for offshore structure monitoring, which includes Metocean sensing and structural motions. The study also provides state-of-the-art development in offshore structure monitoring.
The application of ML techniques has also been utilized in various other domains, including manufacturing sectors. For example, Nasiri and Khosravani [35] investigated additive manufacturing (AM) parameters and the prediction of the mechanical behaviour of 3D components using ML techniques. The authors focused on ML applications for predicting mechanical behaviour. However, current challenges were reported, including the lack of large datasets from 3D printing, which can lead to low-accuracy results. More recently, Verma and Verma [36] surveyed ML applications in the healthcare sector, where they play a vital role in several areas such as healthcare data analytics and medical data protection. In particular, medical records and disease forecasts have been analysed using ML applications. The authors also identified a research gap for the efficient use of ML algorithms in the healthcare sector, with opportunities and challenges.
Moreover, a systematic review has been presented which shows the potential of ML applications in oil and gas industries [37]. In addition, several authors have attempted implementing ML classifiers in various domains for prediction and enhancement, such as acid-fracturing, CO2 sequestration, and rock brittleness [38][39][40].
Comparison details of other related studies in the same domain are presented in Table 1. The studies provide an in-depth analysis of Metocean, oil and gas, and ML areas.

Review Methodology
In this section, the method of review employed is described. The standard guidelines of Kitchenham and Charters [44] were applied. The literature review was performed using a relevant database based on the search strategy developed to identify relevant studies. The review flowchart for the data collection process is depicted in Figure 1.

Dimensions of Machine Learning Algorithms
In this section, the dimensions of ML are described. Machine learning (ML) is the study of mathematical models and algorithms that allow software applications to learn and improve automatically through experience. ML is a branch of artificial intelligence that deals with prediction or decision making. ML algorithms build a mathematical model from training (sample) data without being explicitly programmed for each new task. Additionally, machine learning focuses on data-driven modelling, which offers great value to oil and gas companies by digging out complex data [45], where structured or unstructured data may result in big data issues in oil and gas. Data-driven modelling turns raw oil and gas data, with its non-linearity and uncertainties, into actionable knowledge [45]. Moreover, ML algorithms have been used to develop a novel surrogate model using the random forest (RF) technique for estimating the simulation of the Simulating Waves Nearshore (SWAN) model [46].
The mathematical equation for ML is provided based on selected review studies. For example, the k-means clustering approach is used for efficient dimensional reduction by minimizing the squared error over all K clusters. Here, u_k is the centroid of the cluster, c_k is the main point of the cluster, and |S_k| refers to the number of samples in the cluster c_k [47].
In addition, the silhouette index varies from −1 to 1; a value close to one indicates that the data are appropriately placed within their clusters. The silhouette value can be determined using Equation (2).
Another classifier, the artificial neural network (ANN), is among the most widely used ML algorithms due to its enhanced robustness. It contains one input layer, one or more hidden layers, and one output layer. The signal process can be denoted as in Equation (3).
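For reference, the squared-error criterion that k-means minimizes is commonly written as below. This is the textbook form, given here as an assumption about what the cited study uses; the sample symbol x_i is introduced for illustration and is not defined in the text.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - u_k \rVert^{2},
\qquad
u_k = \frac{1}{|S_k|} \sum_{x_i \in c_k} x_i
```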
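As a minimal sketch of the two ideas above, the following code clusters a few points with Lloyd's k-means iteration and scores the result with the mean silhouette value, using the common definition s = (b − a) / max(a, b), where a is the mean distance to points in the same cluster and b is the mean distance to the nearest other cluster. The function names and the sample points are hypothetical, and the code assumes at least two non-empty clusters.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette value; each point's score lies in [-1, 1]."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical (wave height, wind speed) pairs forming two sea states.
samples = [(1.0, 0.5), (1.2, 0.6), (0.9, 0.4), (5.0, 3.0), (5.2, 3.1), (4.8, 2.9)]
cluster_labels, cluster_centroids = kmeans(samples, k=2)
quality = silhouette(samples, cluster_labels)
```

A silhouette value near one for this toy data would indicate compact, well-separated clusters; in practice the features would be rescaled before clustering.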

$$Y = G(wX + b) \qquad (3)$$
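The layer structure described above can be sketched as a forward pass in which every unit applies an activation G to a weighted sum of its inputs plus a bias. The sigmoid activation, the linear output unit, and all names below are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation, used here as an assumed choice of G."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: activation(sum_i w_ji * x_i + b_j) for each unit j."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> one linear output layer."""
    hidden = dense(x, hidden_w, hidden_b, sigmoid)
    return dense(hidden, out_w, out_b, lambda z: z)
```

For instance, `forward([1.0, 2.0], [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1], [[0.5, -0.5]], [0.2])` returns a one-element list holding the network output for a two-feature input.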
A known ML issue arises when negative samples are drawn from unlabelled datasets: the classifier may fit the training data too tightly, which could lead to false results. To address this, the authors in [48] proposed MSE as the loss function L, with different penalty factors for positive and negative samples.
ML models generalize through training and are then applied to tasks such as data classification. In addition, normalization and standardization play an important role in ML modelling by rescaling values and data. Moreover, feature engineering improves ML prediction on unseen data by transforming raw data into features. However, to achieve results with integrity, data quality assurance has to be considered. Figure 3 illustrates the categorization of ML as a hierarchy flowchart.
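One way to picture an MSE loss with separate penalty factors is the weighted form sketched below. This is an illustrative assumption, not the loss defined in [48]; the weights `pos_weight` and `neg_weight` are hypothetical parameters.

```python
def weighted_mse(y_true, y_pred, pos_weight=1.0, neg_weight=0.5):
    """Mean squared error with one penalty factor per class.

    Samples labelled 1 (positive) are weighted by pos_weight; all other
    samples (unlabelled, treated as negative) are weighted by neg_weight.
    """
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        weight = pos_weight if target == 1 else neg_weight
        total += weight * (target - prediction) ** 2
    return total / len(y_true)
```

Lowering `neg_weight` reduces the penalty for disagreeing with samples that are only presumed negative, which is one plausible way to keep the classifier from fitting them too tightly.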
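The two rescaling steps mentioned above can be sketched as follows; min-max normalization maps values to [0, 1], while standardization yields zero mean and unit variance. The function names are hypothetical, and the population standard deviation is an assumed choice.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

In a modelling workflow, the minimum, maximum, mean, and standard deviation would be computed on the training data only and then reused to rescale test data.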

Supervised Learning
Supervised learning algorithms are trained on input data that have been labelled with the desired output. They analyse the training data to produce a model that can be applied to new sample data [49]. The learning process can be categorized as classification or regression. ML classification assigns a given dataset, whether structured or unstructured, to classes, which can then support prediction and detection after modelling. In Chen et al. [50], a new ML classification model based on a support vector machine (SVM) was proposed to calculate gas saturation in shale reservoirs of the Chang 7 formation, and the result was compared with laboratory-measured values. In addition, a hybrid ML model has been proposed which includes linear regression (LR) to predict and rate the efficiency of water flooding during oil production [51].
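As a minimal sketch of supervised regression, the code below fits a straight line to labelled pairs by ordinary least squares and then scores it on held-out samples. It is not the hybrid model of [51]; the data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Labelled training pairs (hypothetical), followed by unseen test pairs.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
test_x, test_y = [4.0, 5.0], [9.0, 11.0]
line = fit_line(train_x, train_y)
test_error = mean_squared_error(line, test_x, test_y)
```

Evaluating on samples that were not used for fitting is what distinguishes a useful predictor from one that has merely memorized its training data.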

Unsupervised Learning
Unsupervised learning algorithms learn without label classes; all the input samples are unlabelled. The learning process includes clustering and dimension reduction. For example, Liu et al. [52] proposed an ocean front reconstruction method based on the K-means algorithm to achieve hierarchical clustering of the sound speed profile (SSP). The study also provides a way to verify the feasibility of the method from the perspective of transmission loss (TL) calculations. Adding to this, a dimension reduction model has been proposed for uncertainty quantification and reservoir calibration with less computation time [47].

Semi-Supervised Learning
Semi-supervised learning algorithms involve a small amount of labelled data together with a larger amount of unlabelled data during training. They address classification problems using both labelled and unlabelled data. Semi-supervised clustering can be identified as constraint clustering, where the issue is to create a clustering based on labelled and unlabelled data [53]. Furthermore, semi-supervised learning has been proposed for long facies identification in carbonate reservoirs based on multiclass positive and unlabelled (PU) learning [48]. In addition, a new approach has been proposed for oil prediction using a new graph Laplacian, which is based on a semi-supervised clustering technique [54]. More recent evidence, as reported by Salem et al. [55], indicates that semi-supervised learning plays a vital role in addressing diverse problems in oil and gas industry digitalization. The authors also identified the potential of semi-supervised learning in predicting well integrity failure after learning.

Reinforcement Learning
Reinforcement learning algorithms involve a learning process with no label class but with a reward value. They can therefore be placed somewhere between supervised and unsupervised learning. Dong et al. [56] proposed a deep reinforcement learning (DRL) approach for automatic curve matching to achieve well-testing interpretation by evaluating reservoir parameters based on a double deep Q-network (DDQN). Gas turbine maintenance using an optimal part flow management approach has been proposed based on reinforcement learning to overcome a sequential problem [57]. A recent study has indicated the capabilities of reinforcement learning in the oil and gas industries. Nasir et al. [58] presented a deep reinforcement learning approach which provides optimization and development plans for reservoir models with less computation cost.
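The reward-driven learning described above can be illustrated with plain tabular Q-learning on a toy problem. This sketch is far simpler than the DDQN of [56] and is not taken from any cited study; the chain environment, the parameter values, and the function name are all hypothetical.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a chain: reward 1 for reaching the right end."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # The terminal row of q is never updated, so its value stays 0.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max((0, 1), key=lambda a: row[a]) for row in q_table[:-1]]
```

No sample is ever labelled with the correct action; the agent infers that moving right is preferable only from the reward it receives at the end of the chain.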

Proposed Framework
In this section, the proposed machine learning model will be developed using the workflow described in Figure 4 (proposed research methodology). The work will start with the problem definition stage, which studies the existing Metocean data system as well as the business needs and objectives. Each attribute of the data will be identified, and the Metocean data for the solution will be prepared accordingly.
The next stage will be the gathering and preparation of data. This includes analysing the data and preparing them for structuring and distribution. The data will be organized and prepared for the next level. Data conversion and aggregation will be conducted in this phase, and the first training dataset will be utilized. By using a Gaussian-means algorithm, the Metocean data will be classified into distinct areas, while other data will then be chosen randomly, using Equation (5). Fundamentally, kernel density estimation (KDE) will be used, with K as the kernel and σ as the bandwidth, to weight each observation x according to its distance from a particular point.
In the architectural integration and model development stage, algorithm development and evaluation will be conducted. This consists of exploring and selecting the algorithm and reporting the interpretation of the results. Testing of the dataset will occur, after which the training dataset will be run again, followed by subsequent tests. The Python programming language and Hitachi Infrastructure Analytics Advisor will be used for application development. To fully deploy a scalable system, cloud infrastructure is suggested in this work, as the data will be collected from sensors in real time and non-real time. Therefore, Software as a Service (SaaS) will act as a leveraging platform, such that we will exploit and launch a better Metocean data system with AI solutions. The proof of concept includes the Metocean parameters (input and output) and component integration, while the decision tree is the data-driven technique. Thus, a new predictive model will then be developed.
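The kernel density estimate referred to in the data preparation stage is commonly written as f(x) = (1 / (nσ)) Σ K((x − x_i) / σ). The sketch below uses that textbook form with a Gaussian kernel, which is an assumption and may differ from the form the authors intend; the sample values are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, bandwidth):
    """Density estimate at x: (1 / (n * sigma)) * sum of K((x - x_i) / sigma)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in samples) / (n * bandwidth)


# Hypothetical significant wave heights in metres.
wave_heights = [1.0, 1.5, 3.2]
density_at_2m = kde(2.0, wave_heights, bandwidth=0.5)
```

A smaller bandwidth σ gives each observation a narrower region of influence, so the estimate follows the samples more closely at the cost of a rougher curve.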
Training the data model in this work will rely on an unsupervised machine learning algorithm. The expectation-maximization (EM) algorithm will provide the fundamental concepts in graphical models and inference algorithms on graphs. It will simplify the iteration of parameters by optimizing the lower bound function. Estimating the ML parameters in this model can be one of the anticipated contributions of this research. Therefore, for a model with latent/hidden variables x for the data points and y as the observed variables, the lower bound can be considered as follows:
$$\log p(y \mid \theta) \geq \int q(x) \log \frac{p(x, y \mid \theta)}{q(x)} \, dx \overset{\text{def}}{=} F(q, \theta),$$
where q(x) is an arbitrary density function and θ denotes the model parameters, while the lower bound F is a function of both the density q(x) and the model parameters. As most of the Metocean data are time series data generated from offshore sensors, these data are unstructured. This research will model the time series data using the state-space model (SSM), as in Equation (9).
During the stage of experimentation, a test harness will be employed, where the test model, test data, and evaluation of the new ML algorithm will begin. The performance will also be determined in this phase. The incorporation of the new model with the existing Metocean system will be determined. The memory size and parametrization of the system will also be evaluated after confirming the robustness of the K-means algorithm. Furthermore, clustering, neural networks, and anomaly detection are all common algorithms used in unsupervised machine learning. However, this project focuses on clustering (using the K-means algorithm). This is mainly because of the quality characteristics of Metocean data.
Operationalization stage: this phase includes the data interpretation, reporting services, and generating results such as quality control (QC), environmental statistical analysis, spectral analysis, etc., by Metocean users. Confirmation of successful interoperability and data integration of the model will be determined in this operationalization stage. Finally, a new integrated Metocean data system will be deployed, tested, and monitored.
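As a minimal illustration of the EM iteration discussed in this section, the sketch below fits a one-dimensional Gaussian mixture, where the latent variable is the component each observation belongs to. The mixture model, the initial values, and the sample data are assumptions chosen for illustration, not the authors' model.

```python
import math


def normal_pdf(y, mu, var):
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm(data, mus, variances, weights, iters=30):
    """EM for a 1-D Gaussian mixture; returns parameters and a log-likelihood trace."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    trace = []
    for _ in range(iters):
        # E-step: set q to the posterior responsibility of each component.
        responsibilities = []
        log_likelihood = 0.0
        for y in data:
            joint = [w * normal_pdf(y, m, v)
                     for w, m, v in zip(weights, mus, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([j / total for j in joint])
        trace.append(log_likelihood)  # log-likelihood before this M-step
        # M-step: maximize the lower bound over the parameters with q fixed.
        for k in range(len(mus)):
            nk = sum(r[k] for r in responsibilities)
            weights[k] = nk / len(data)
            mus[k] = sum(r[k] * y for r, y in zip(responsibilities, data)) / nk
            spread = sum(r[k] * (y - mus[k]) ** 2
                         for r, y in zip(responsibilities, data)) / nk
            variances[k] = max(spread, 1e-6)  # floor guards against collapse
    return mus, variances, weights, trace


# Hypothetical sensor readings drawn from two regimes.
readings = [0.8, 1.1, 1.3, 0.9, 1.0, 3.9, 4.2, 4.0, 4.4, 3.8]
fitted = em_gmm(readings, mus=[0.0, 5.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

Because each E-step makes the lower bound tight and each M-step can only raise it, the recorded log-likelihood should not decrease from one iteration to the next.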

The Need to Digitalize, Automate, Integrate Oil and Gas Data
In this section, the need for oil and gas data digitalization is described. Digitalization acts as an enabler that brings value to all oil and gas stakeholders, especially at the time of a downturn [59]. Digitalization can be referred to as a way of restructuring digital infrastructure and communications by organizations or industries [60]. Therefore, digitalization is a good business opportunity for the oil and gas industry, especially during a crash in oil prices. For this reason, it was estimated that the net benefits of the oil and gas industry due to digital transformation will be USD 945 billion over the decade to 2025 (cumulative value 2016-2025) [59]. The oil and gas sector has been one of the most significant and competitive economic sectors around the globe [61,62]. To have successful operations and exploration success, the industry needs to sustain its production. Moreover, one of the positive implications of an organization's performance is that the organization sustains its competitive advantage [63,64]. It is posited that "success in the future of oil and gas will require the continued adaptation of the complex business model to unforeseen challenges". On this account, the competitiveness and the demand for digitalization (especially in the AI and big data domains) have made the implementation necessary [65].

Digitalization of Oil and Gas Using Machine Learning
Reviews on digitalization using ML were conducted on recent developments in oil and gas. One of the findings revealed that oil and gas should leverage the new technological developments with focus, agility, and collaborative teams of big data [41]. Another study evaluated the status of the data-driven approach in the oil and gas industry, where historical data is used [45]. It was found that the data-driven approach provides huge advantages in the industry over the conventional approach under certain conditions. Unfortunately, it was also discovered that the approach remains fuzzy for many industry professionals. Thus, it is clear that there is a need for further study of ML in the oil and gas domain. Figure 5 shows a real-time dashboard illustration for web application necessities, functionality, analytics library, and web-based programming.
Keerqinhu et al. [67] attempted to solve issues of reciprocating compressors in the petroleum industry. The findings suggested a system for fault diagnosis for reciprocating compressors using ML techniques based on a learned dictionary. The system was evaluated on 5 years of operations data collected from offshore oil corporations in the cloud environment. Significantly, their proposed system showed better results and indicated 80% accuracy, which can effectively diagnose potential faults in compressors. Similarly, research was conducted by applying a machine learning approach for big data, sizing of metal-loss defects, and failure risk analysis in oil and gas pipelines [68][69][70]. They adopted the Levenberg-Marquardt back-propagation learning algorithm [71]. They found a promising result with an estimated accuracy of 86% (±10% error tolerance) and 89% (±15% error tolerance).
A recent work by Yang et al. [66] attempts to provide an elastically scalable cloud-based system to solve big data issues for the upstream oil and gas industry with high performance. Yang et al. [66] used machine learning and processed complex datasets in real time or near real time effectively. Figure 6 presents the thermal analysis based on distributed temperature sensing (DTS) measurements on total production. The proposed integrated system contains several services, including legacy oil and gas data. However, this work has not been integrated with meteorological or oceanographic data. There is no review and approval process in the system, which reduces the quality of the proposed monitoring. Hence, a system that will incorporate digitalization, monitoring, and data integration of meteorological and oceanographic (Metocean) data is necessary.

Metocean Data System
This section presents an overview of the Metocean data system. Metocean conditions are directly related to the offshore project, project operation, and maintenance [72].

Metocean Data System in Malaysia
Oil and gas companies use Metocean data as the main information source for determining the time and other weather factors. The users that benefit from these data comprise the participants of oil and gas companies, the partners, the contributors, as well as the research institutes, for whom the data are of extreme value [73][74][75]. The Metocean contributors are BP, Chevron, PETRONAS, Statoil, Total, and research institutes. In Malaysia, the main Metocean contributor is PETRONAS, the Malaysian-owned oil and gas company. It has enormous resources with diverse data for scientific research, but it has not implemented AI or machine learning technologies in its Metocean data system. The information supplied by Metocean to users is primarily for oil spill response, satellite locations, environmental platforms, and other types of drifting buoys [76]. The Metocean database holds high-quality oil and gas datasets, which are structured in the uniform formats of ASCII and NetCDF and indexed according to the ISO 19115 metadata standard [73]. Metocean has become an industry that handles large amounts of data and metadata which yield immense benefits to oil and gas stakeholders. The results of the Metocean data are in the form of text or graph formats. Undeniably, the Metocean data are resources for petroleum industries as well as human life activities, and therefore need to be monitored, automated, and interoperable for the benefit of their users.
As PETRONAS remains the major oil and gas company in Malaysia, it officially intends to diversify the energy sector beyond 2020 with a high competitive advantage [77,78]. More recently, as reported in the PETRONAS activity outlook 2019-2021, "PETRONAS is actively seeking ways to deploy technology in terms of digital, data analytics, automation, etc." [79]. Thus, PETRONAS must make the Metocean data system fully autonomous and digitalized to achieve its goal effectively. In addition, some ML initiatives have been seen in the oil and gas sector. While the sector wants to transform into a fully digital, data-driven industry, these initiatives are not enough to solve the continuous problem of integrating all oil and gas data over the globe using AI or ML. This indicates the presence of limited integration and few empirical studies of ML, digitalization, and data integration in the oil and gas sector. Hence, the need to fill this gap is apparent. The Metocean data system information flow for the assessment process based on a risk inspection technique is illustrated in Figure 7.
A growing body of literature has investigated the Metocean data system in Malaysia. Lai et al. [81] investigated the effectiveness of genetic programming (GP) and support vector machine (SVM) learning models in predicting monthly sea-level variations by comparing model accuracy performance. Adding to this, the model has been validated using datasets from Tioman Island, Kerteh, and Tanjung Sedili, Malaysia. Moreover, other methods have been applied in Metocean data systems by providing semantic web technologies for the oil and gas industries, which include an architecture for data integration. The application database utilized includes RDF query (D2RQ) for setting performance. The study data were obtained from Malaysia's oil and gas industries [82]. More recent studies have investigated Metocean data systems around the globe. Qiao et al. [83] implemented a numerical model using Mike 21 to estimate Metocean conditions by evaluating hurricanes and hindcasts on the United States Atlantic coast.
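To make the ASCII ingestion step concrete, the sketch below parses whitespace-delimited records into typed dictionaries and maps a missing-value sentinel to `None`. The column layout, the timestamp format, and the sentinel value are all hypothetical; real Metocean ASCII exports differ by provider and would need their own field definitions.

```python
from datetime import datetime

# Hypothetical layout: timestamp, significant wave height (m),
# peak wave period (s), wind speed (m/s).
COLUMNS = ("hs_m", "tp_s", "wind_ms")
MISSING = -999.0  # assumed sentinel for a missing reading


def parse_records(text):
    """Turn ASCII lines into dictionaries, skipping blanks and '#' comments."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        stamp, *fields = line.split()
        record = {"time": datetime.strptime(stamp, "%Y-%m-%dT%H:%M")}
        for name, field in zip(COLUMNS, fields):
            value = float(field)
            record[name] = None if value == MISSING else value
        records.append(record)
    return records


sample = """# time hs tp wind
2021-01-01T00:00 1.2 8.5 6.0
2021-01-01T01:00 1.4 -999.0 6.3
"""
parsed = parse_records(sample)
```

Automating this step, rather than entering values by hand, is the kind of change that would let downstream quality control and ML models consume the sensor data directly.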

Machine Learning for Metocean Data Integration
In this section, the impact of ML in Metocean data integration is elucidated. ML algorithms play a vital role in Metocean data integration by improving accuracy and computational performance for forecasted wave conditions, wind parks, offshore modelling, and many more. Chen et al. [46] proposed an ML technique for Metocean data integration to derive spatial wave data. ML algorithms can also be applied for Metocean data model comparison to predict offshore platform integrity, model structure integrity, and geohazard data [84]. Wyatt [85] investigated Metocean data parameters for HF radar wind speed measurement using various ML techniques, among which SVM regression provided efficient results in wind speed estimation. The proposed method has been validated using different radar systems at different locations [85]. In addition, ML algorithms have been used in the prediction of Metocean data in the Korea Strait, where feedforward neural network (FNN) and long short-term memory (LSTM) models have been used for the prediction of wave height. The proposed model has been validated using Metocean data from 2012-2020 from the Korea Institute of Ocean Science and Technology [86].

ML for Metocean Forecasters Data
Metocean forecasting data come from models that provide in-depth, reliable weather information over the ocean. In addition, they support the analysis, planning, and securing of offshore operations. ML-based techniques improve the performance of Metocean forecasting data. For example, Martinez-Perurena et al. [87] designed a forecast of Metocean data for a marine renewable energy system using hybrid ML techniques which include support vector regression and random forest. Moreover, a forecasting model has been proposed for Metocean prediction using linear regression and H2O auto-ML techniques for knowledge enrichment [88,89].

ML for Metocean Spatial Wave Data
Metocean spatial wave data provide an accurate spatial analysis using computational models to predict wave situations across the ocean. This prediction can be improved using ML-based techniques. Chen et al. [46] proposed ML techniques to derive spatial wave data using a novel surrogate of the SWAN numerical model, based on random forest, to replicate the spatial nearshore wave data. In addition, the model demonstrated ML abilities for correlating the spatial ocean waves through optimal spatial gridding and provided high-resolution distribution in real time [46].

ML for Metocean Data Linked
The Metocean data link provides structured interlinks with other data so that they become useful in semantic queries, such as providing information to petroleum and research industries. However, Metocean or oceanographic datasets are scattered when published and processed around the globe owing to their huge volume, which motivates the use of linked data and semantic web capabilities [90]. ML capabilities can be used to improve linked-data processing. Colin et al. [91] proposed a semantic segmentation of ten metoceanic processes, using image-level ground truths and deep-learning techniques.

ML for Metocean Conditions during Hurricanes
Metocean conditions during hurricanes have historically damaged the environment; they are characterized by multiple parameters, such as significant surge height and wave height, that vary in time and are cross-correlated [23]. Moreover, Metocean systems are remote and complex, which exposes platforms to hazards such as hurricanes [84]. For example, in 2004, significant damage to the Taylor Energy oil platform was caused by Hurricane Ivan after it had been installed for 20 years [92]. ML techniques can be applied for hurricane predictions. Asthana et al. [93] proposed an ML model for the prediction of Atlantic hurricane activity based on a convolutional neural network (CNN) approach.

Discussion
In this section, we discuss the findings of this research study.The critical analysis and evaluation for the integration of Metocean data intelligent operations using ML models based on empirical studies have been revealed.The oil and gas industry's business objective is to digitalize its production and development, which cannot be achieved without the implementation of ML techniques and big data analytics.In addition, the transformation and integration of existing Metocean architecture for oil and gas data operation can be enhanced with the advancement of ML techniques.The Metocean data development and integration significantly improved after applying ML techniques such as better prediction, providing efficient results, saving time, reducing system complexity, and reducing operational overheads.Moreover, the ML methods vary in different categories with its subsection known as DL in Metocean data analytics and integration.
Table 2 lists AI models that could fit the Metocean data system. These models are created, trained, and deployed on available datasets to support decision making and prediction through tasks such as classification, regression, and clustering. For example, several ML models, including ANN, XGBoost, SVM, and statistical regression, have been compared for ship power modelling that accounts for measured Metocean conditions and hull maintenance [94,95]. Convolutional neural network (CNN) models are deep neural networks that play a vital role in image classification problems, using multiple convolution layers with kernels to detect complex features. A novel CNN model has been applied to failure identification of mooring lines and turret-moored FPSO systems [96]. Random forests (RF) are ensembles of decision trees that perform well on regression and classification problems. For example, RF models have been developed to predict the spatial distribution and frequency of ship groundings [97]. Among modern data-driven modelling methods, genetic programming evolves programs that compute solutions to problems which cannot be solved directly. For example, genetic programming has been used for partial differential equations (PDE), discovery of Metocean processes, and acoustic function discovery [98]. The Bayesian framework for ML has been applied through Gaussian processes (GP) for efficient integrity management (IM) of steel lazy wave risers (SLWRs). Here, GP is a generic supervised learning method for regression and probabilistic classification problems [99]. Lastly, the state-of-the-art ML models presented are efficient and effective in prediction, classification, and decision making. All models discussed can fit the Metocean data system for better forecasting, integration, and intelligent operations.

[99] | 2020 | Gaussian processes (GP) | Classification, regression
Park et al. [86] | 2021 | Long short-term memory (LSTM) | Classification
Tadjer et al. [47] | 2021 | Gaussian processes (GP) | Classification, regression

Table 3 lists state-of-the-art studies that give insight into Metocean data integration in the oil and gas domain using ML techniques. Notably, ANN, SVM, RF, and XGB are the most used models for Metocean data analytics and integration because of their efficient performance. However, ML performance in predicting Metocean data interoperability for intelligent operations can be further improved through hyperparameter tuning and feature selection.
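As a minimal sketch of hyperparameter tuning, the example below runs a grid search with k-fold cross-validation over the neighbourhood size of a simple nearest-neighbour regressor. It is not taken from any reviewed study: the synthetic wind-speed and wave-height data, the function names, and the candidate values are all assumptions chosen for illustration.

```python
import random
import statistics


def make_synthetic_data(n=120, seed=0):
    """Hypothetical data: wind speed (m/s) mapped to wave height (m) plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        wind = rng.uniform(0.0, 25.0)
        wave = 0.02 * wind ** 2 + rng.gauss(0.0, 0.3)
        data.append((wind, wave))
    return data


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def cross_val_mse(data, k, folds=5):
    """Mean squared error of k-NN regression under k-fold cross-validation."""
    fold_errors = []
    for i in range(folds):
        # Every folds-th sample (offset i) is held out; the rest is training data.
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


def grid_search(data, candidates):
    """Score every candidate value and return the one with the lowest error."""
    scores = {k: cross_val_mse(data, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


candidates = [1, 3, 5, 9, 15, 31]
data = make_synthetic_data()
best_k, scores = grid_search(data, candidates)
print("best k:", best_k)
for k, mse in scores.items():
    print(f"k={k:>2}  cross-validated MSE={mse:.3f}")
```

The same loop applies to any model with tunable settings: only `knn_predict` and the candidate grid would change, while the held-out scoring stays the same.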

Limitation of the Study
Our review comprehensively summarizes current insight into Metocean data interoperability for intelligent operations using ML techniques. However, certain ML techniques have not been included, especially those under DL methods. Critical analysis also revealed that some proposed models are weak because of the ML method and data analysis procedure used, which affects how the models perform.
In addition, each proposed technique has certain issues which have not yet been resolved. For example, some studies have problems with datasets, feature reporting, predictive features, and data quality. Our study could be extended with a comparison of model performance and efficiency across the state-of-the-art studies. Several aspects of Metocean in oil and gas development are also excluded. Lastly, the survey considered articles published between 2009 and 2022.
The current limitations can be overcome by implementing additional AI models based on DL algorithms such as CNN, RNN, GAN, and DBN to investigate Metocean data conditions. The proposed techniques and models can also be improved through hyperparameter tuning or feature selection techniques. In addition, data quality assurance has to be considered when selecting datasets. Relevant studies published before 2019 can also be consulted, and other database sources can be explored.
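Feature selection can be illustrated with a simple filter method that ranks candidate inputs by the absolute Pearson correlation with the target and keeps those above a threshold. The sketch below is an assumption-laden illustration, not a method from the reviewed studies: the feature names (`wind_speed`, `current_speed`, `sensor_noise`), the synthetic data, and the threshold of 0.3 are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (spread_x * spread_y)


def rank_features(columns, target):
    """Sort features by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_features(columns, target, threshold=0.3):
    """Keep the names of features whose score reaches the threshold."""
    return [name for name, score in rank_features(columns, target) if score >= threshold]


# Hypothetical synthetic data: wave height depends on wind and current only.
rng = random.Random(1)
n = 500
columns = {
    "wind_speed": [rng.uniform(0.0, 25.0) for _ in range(n)],
    "current_speed": [rng.uniform(0.0, 2.0) for _ in range(n)],
    "sensor_noise": [rng.gauss(0.0, 1.0) for _ in range(n)],
}
wave_height = [
    0.15 * w + 1.0 * c + rng.gauss(0.0, 0.3)
    for w, c in zip(columns["wind_speed"], columns["current_speed"])
]

selected = select_features(columns, wave_height)
for name, score in rank_features(columns, wave_height):
    print(f"{name:<14} |r| = {score:.3f}")
print("selected:", selected)
```

A correlation filter only captures linear, single-feature relationships, so in practice it would be a first pass before model-based selection.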

Recommendation for Future Study
Owing to the digital transformation of the oil and gas sector, which includes Metocean data interoperability for intelligent operations using AI techniques, further investigation of existing ML approaches is recommended. Additionally, AI algorithms are moving towards statistical reasoning, with next-generation approaches being introduced to automate Metocean systems and oil and gas development. Our results revealed that different AI techniques have been proposed and implemented based on ML models, including hybrid methods that combine different algorithms.
However, there is a need to improve the existing intelligent framework by introducing different combinations of AI models for better performance. Therefore, further work needs to concentrate on selecting suitable ML models, using high-quality, real-world datasets, and improving result accuracy. Lastly, the problem of imbalanced data needs to be considered, given the huge amount of Metocean data.
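One common way to handle imbalanced data is to weight each class inversely to its frequency, so that rare events count as much as frequent ones during training and evaluation. The sketch below uses the widely used "balanced" weighting, n_samples / (n_classes × class count); the labels `calm` and `storm` and the 95/5 split are hypothetical values chosen for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weighted_accuracy(y_true, y_pred, weights):
    """Accuracy in which each sample counts with the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total


# Hypothetical labels: storm conditions are rare compared with calm ones.
labels = ["calm"] * 950 + ["storm"] * 50
weights = balanced_class_weights(labels)

# A trivial model that always predicts the majority class.
always_calm = ["calm"] * len(labels)
plain_accuracy = sum(t == p for t, p in zip(labels, always_calm)) / len(labels)

print("class weights:", weights)
print("plain accuracy:", plain_accuracy)
print("weighted accuracy:", weighted_accuracy(labels, always_calm, weights))
```

The majority-class predictor looks strong on plain accuracy but scores only about one half once the classes are weighted equally, which is why imbalance has to be addressed before models are compared.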

Conclusions
In this paper, a comprehensive review of the development of Metocean data integration and interoperability for intelligent operations and automation using ML techniques is presented. The existing Metocean system generates data manually using sensors such as the wave buoy, anemometer, and acoustic Doppler current profiler (ADCP). Additionally, the input of these data, which appear in ASCII format, to the Metocean system is also manual and siloed. There is no fundamental method, model, or algorithm that allows the system to reason and automate intelligently. This slows down provisioning, while the monitoring element of the Metocean data path is partial.
However, with the current oil and gas crash, digitalization is a good opportunity that will bring value to all stakeholders. This shows that there is a need to continue the digitalization of the oil and gas sector to achieve, or exceed, the stated value by the year 2025. Moreover, the digitalization of production-based industries such as oil and gas is driven by enabling technologies which include ML and big data analytics. ML is one of the key enabling tools that oil and gas industries are now focusing on to implement digital transformation, which allows the industry to automate operations and enhance production and human capacity.
We summarized several research studies exploring the need for ML in the oil and gas industries by investigating in depth the integration of Metocean data interoperability for intelligent operations and automation using an ML-based approach. In addition, a comparative analysis of the current uses of ML algorithms in the oil and gas industries was provided, and we reviewed the transformation and integration of the existing Metocean architecture for use by oil and gas data operators through automation. Lastly, a new model and its integration with the existing Metocean data system, using an ML algorithm to monitor and interoperate with maximum performance, were proposed.

Figure 1. Flowchart for the data collection process.

Figure 2 gives an overview of the structure of the review study, organized by section. This organization makes the study easier to follow.

Figure 2. Structure of the review study.

Figure 5. Data communication and flow between field device, cloud server, and customer [66].

Table 1. Comparison of other related studies in the same domain (√: Yes, ✕: No).

Table 2. List of some selected AI models that could fit the Metocean data system.

Table 3. List of ML state-of-the-art studies with their pros and cons.