Risk Assessment in Energy Infrastructure Installations by Horizontal Directional Drilling Using Machine Learning

Nowadays we can observe a growing demand for the installation of new gas pipelines in Europe. A large number of them are installed using trenchless Horizontal Directional Drilling (HDD) technology. The aim of this work was to develop and compare new machine learning models dedicated to risk assessment in HDD projects. Data from 133 HDD projects from eight countries were gathered, profiled, and preprocessed. Three machine learning models, logistic regression, random forests, and an Artificial Neural Network (ANN), were developed to predict the overall HDD project outcome (failure-free installation or installation likely to fail) and the occurrence of identified unwanted events. The best performance in terms of recall and accuracy was achieved by the developed ANN model, which proved to be efficient, fast and robust in predicting risks in HDD projects. Applying machine learning in the proposed models made it possible to eliminate the involvement of a group of experts in the risk assessment process and therefore to significantly lower its costs. Future research may be oriented towards developing a comprehensive risk management system that enables dynamic risk assessment taking into account various combinations of risk mitigation actions.
[...] in terms of recall for the prediction of 7 events, precision for 1 event, accuracy for 6 events, f1 for 7 events, and [...] score for [...] events. The proposed ANN model produced the best results (or no better result was obtained with the other methods) in terms of recall for the prediction of 21 events (among them e1, e2, e3, e21 and the final outcome of the HDD project, denoted OK).
The developed ANN model outperformed the other models (or no better result was obtained with the other methods) in terms of accuracy for the prediction of 19 events (among them e1, e2, e3, e17, e18, e20, e21 and the final outcome of the HDD project, denoted OK). It predicted all analysed events with an accuracy greater than or equal to 0.926, and 72.72% of the assessed events with a recall greater than or equal to 0.875.


Introduction
Pro-ecological trends in the European Union's energy policy are reflected in the increasing popularity of the idea of sustainable development, and the related increase in the demand for energy from natural gas has resulted in a growing demand for the construction of new gas pipelines in Europe. According to Eurostat data, natural gas inland consumption in the European Union in 2019 increased by 4.2% compared with 2018, reaching a level not seen since 2010 [1]. According to Statistics Poland data, the consumption of natural gas in Poland (excluding consumption for the technological purposes of the gas sector) reached 628.5 PJ in 2017 [2] and increased to 660.3 PJ in 2018 [3]. Owing to the intensive expansion of the gas pipeline network, the length of the active gas transmission network in Poland increased from 21,139.631 km in 2015 to 21,264.629 km in 2019 [4]. A large proportion of new gas pipelines are installed using a trenchless construction technique called Horizontal Directional Drilling (HDD). Its growing popularity stems not only from its lower installation cost and the possibility of steering around natural or man-made obstacles, but also from its lower negative environmental impact. Compared with open-cut pipeline installation methods, trenchless technologies produce lower greenhouse gas emissions owing to shorter project durations, lower equipment requirements and a smaller excavation footprint [5]. However, like every trenchless or open-cut construction technique, this technology is associated with certain risks, which …

Risk in HDD Technology
In HDD technology, a significant level of risk and uncertainty results from the variability of geotechnical conditions, the often limited access to specialized tools and machines once they are deployed underground, the dynamic natural environment, technical problems, the human factor and the changing economic environment. It must be stressed that an HDD project failure could lead not only to significant economic loss, but also to increased environmental or social impacts of construction, as well as to accidents or fatalities on the building site. Contractors, designers and owners engaged in HDD projects stress the need to carry out a risk assessment before the investment is implemented, because a thoroughly estimated risk level is the starting point for the project feasibility study and cost estimate. A carefully conducted risk assessment helps to avoid several significant economic and legal consequences of HDD failure, such as damage to adjacent existing underground utilities or ground infrastructure, or to costly HDD down-hole equipment or the product pipe. In complex and innovative construction projects in particular, a properly conducted risk assessment at the project preparation stage increases the chance that the project proceeds as intended [10,11].
It is crucial to pay attention to the proper design of the HDD trajectory [12], as well as to its optimization [13], taking into account the adjacent existing urban infrastructure and the technical feasibility of the design. It is particularly dangerous when the designed HDD trajectory collides with existing elements of the underground or above-ground urban infrastructure, such as oil and gas pipelines or power cables. The risk of striking operating energy infrastructure is especially high in congested cities and where the existing elements of urban infrastructure were located carelessly or not at all. In December 2019, an existing gas pipeline was damaged during trenchless drilling in Szczyrk (Poland). Gas leaked for 20 min; as a result, a fire broke out and a neighbouring tenement house collapsed, killing eight people. This is why a high-quality design of the HDD trajectory, one that accounts for the on-site geological conditions, the adjacent urban infrastructure and the specificity of the project, is important. Such a situation could have been foreseen and would probably have been avoided if a risk analysis had been carried out at the investment preparation stage. Such an analysis would have revealed potential problems with the quality of the HDD design, as well as problems related to unfavourable geological conditions and adjacent urban infrastructure that might arise during the investment.

Contribution of the Proposed Approach
The aim of this paper is to develop new risk prediction models for HDD technology and to compare their performance. Two alternative models, using random forests and artificial neural networks, were developed to predict the occurrence of the identified unwanted events in HDD technology ("occurred"/"did not occur") and the overall HDD project outcome (failure-free installation or installation likely to fail). Additionally, a logistic regression model was developed to predict risk for several unwanted events. Applying machine learning in the proposed models made it possible to eliminate expert involvement in the risk assessment process and to significantly lower the associated costs. The main contributions of this paper to the body of knowledge are: collecting essential data from HDD projects; identifying the attributes relevant for risk analysis in HDD projects (the dimensions of the data set); developing three machine learning models for predicting risk in HDD projects, which remove the drawbacks and limitations of the risk assessment models previously described in the literature; identifying the most important metrics for risk assessment in HDD projects; and selecting the best-performing of the three proposed machine learning models.
The outline of this paper is as follows. Section 2 reviews the literature and the limitations of previous work. Section 3 presents the proposed approach, including the collection of data from HDD projects carried out in eight countries, data profiling and preprocessing, and the development of the three machine learning models together with the metrics used to evaluate them. Sections 4 and 5 present the experimental results and their discussion, showing the main outcomes of the three proposed machine learning models and a comparative analysis of their results. Section 6 summarizes the paper.

Review of Literature and Limitations of the Previous Work
The subject of risk identification in HDD projects has been discussed by several authors. The most important risks in HDD technology were described in [7,8,14-16]. Various risk assessment models for HDD projects have been developed in recent years by researchers and practitioners. Some important issues on this topic were presented in the author's previous works [17-19], in which an expert system with Fuzzy Fault analysis was applied for risk evaluation. In that approach it was necessary to gather a group of experts who, after familiarizing themselves with the HDD project documentation and its specificity, assessed each of 22 risk factors individually. One of the advantages of those models was the possibility of taking into account the specific and dynamic conditions in which the analysed HDD project is carried out. On the other hand, the involvement of an experienced group of experts was required, which was sometimes problematic due to the high costs of their participation and the shortage of qualified specialists on the market. Moreover, special care must be taken in selecting the experts, because their years of experience in HDD projects of a particular size, their expertise and their practical skills are indispensable. Besides this, separate membership functions must be drawn for each group of specialists, depicting how they understand a given linguistic term describing the possibility of an unwanted event occurring (e.g., medium risk). Ma et al. proposed a risk assessment model dedicated to MAXI HDD projects, in which the fuzzy comprehensive evaluation method and the analytic hierarchy process were used [20]. The combination of these two methods gave an improved theoretical basis for the risk assessment of MAXI HDD projects. Five risk factors were identified: natural, technical, economic, environmental and management; they depended on 17 subfactors.
This model also required inviting relevant experts to analyse each index and to evaluate the relative significance of the factors and subfactors. In [21] a model based on Failure Mode and Effects Analysis (FMEA) was developed for a preliminary evaluation of risk in HDD projects, especially those with a modest budget, in which a group of experts could not be involved in the risk assessment process. However, that model is based on a statistical approach and is not sufficiently accurate to assess the risk of larger and more complex HDD projects. That is why it is necessary to develop a new risk assessment model for HDD projects that limits the need to employ a group of HDD experts. Machine learning offers a way of overcoming both the inconvenience of involving experts in the risk evaluation process and the poor fit of statistical models to the dynamic and specific conditions of a construction site.
Energies 2021, 14, 289 4 of 28
"Machine learning", sometimes referred as a branch of artificial intelligence, is a multidisciplinary term which concerns a set of soft computing techniques and algorithms that deal with complex natural systems and improve automatically through experience. Artificial neural networks, fuzzy logic, support vector machines, generic algorithm, and hybrid systems are regarded as the most popular machine learning tools [22]. Machine learning can be applied in real-life construction industry problems to improve quality of design, create a safer jobsite, assess and mitigate risk, increase the project's lifecycle, as well as to estimate a project's profitability.
Owing to their ability to compensate for the inherent uncertainties and imperfections present in geotechnical engineering, neural networks can be successfully applied in geotechnical engineering, building construction projects [23] and trenchless technology. In [24], ANNs were used to predict surface heave caused by shallow subsurface utility installations carried out using Horizontal Directional Drilling; surface heave is one of the important risk factors in HDD technology. ANNs have also been successfully used to predict the rate of penetration while drilling carbonate reservoirs [25]. Pollock et al. [26] used machine learning algorithms to improve the efficiency of directional drilling (optimizing the rate of penetration, reducing borehole tortuosity, reducing the number of personnel on board and improving consistency across operations). Bayesian networks (BN) and ANNs have been successfully used for risk assessment in trenchless construction projects using tunnelling technology, e.g., risk assessment of road tunnels [27], risk analysis of the construction of the Porto Metro tunnel [28], risk assessment of damage to existing surface properties caused by tunnelling [29], safety risk assessment for metro construction projects [30], and evaluation of the jamming risk of shielded tunnel boring machines in adverse ground conditions such as squeezing grounds [31].
In [32] a risk assessment model was proposed for the box jacking technique, which is used to install rectangular box culverts under existing facilities. In this approach, the influence of various parameters on surface settlement risk was determined for box jacking installations in sandy soil using an artificial neural network and multiple linear regression analysis combined with finite-element modelling. Soil cohesion, box culvert depth and overcut size were found to be the most important determinants of surface settlement.
In [33] a new model was presented for predicting the condition of uninspected sanitary sewer pipes using gradient boosting trees. The prediction model was built on thirteen independent variables and achieved 87% accuracy in predicting the condition of uninspected sewer pipes. It makes it possible to forecast the condition of sewer pipes that have not yet been inspected, thereby eliminating the costs of closed-circuit television (CCTV) inspections and overcoming the problem that inspections cover only a limited portion of an entire sewer system. The model is especially helpful for utility companies and municipalities in forecasting the condition of sanitary sewer pipes, scheduling inspections and making cost-effective decisions.
Machine learning has also been successfully used to identify the significant factors that affect the prediction of the remaining useful life of water pipelines. In [34] Artificial Neural Networks and Adaptive Neuro-Fuzzy Models were applied to predict the remaining useful life of water pipelines. The presented approach could also be adapted to other types of pipelines, e.g., gas pipelines.
In [35] the concept of using artificial neural networks in the organizational and technological planning phase of engineering projects, particularly building works, was presented. Juszczyk and Leśnak [36] used combined Artificial Neural Networks to develop a model able to predict a construction site cost index.
The literature study presented above shows that machine learning enables effective prediction of various risks in trenchless technologies and the construction industry. This paper follows the trend of modern risk assessment using machine learning. Machine learning appears to be applied more often to predict the occurrence of a single specific risk than to predict the risks of several unwanted events together with the overall project outcome (failure-free installation or installation likely to fail). Applying machine learning in trenchless technologies and the construction industry supports cost-effective planning and risk prediction, and it eliminates several inconveniences related to the need to involve a group of experts or to conduct a series of inspections. The literature analysis showed that machine learning has been used for HDD technology only once, to predict surface heave caused by shallow subsurface utility installations. However, no model was found in the literature in which machine learning was used for comprehensive risk assessment in HDD projects to predict the overall project outcome and the occurrence of the most important unwanted events.
Figure 1 outlines the proposed approach for predicting the overall HDD project outcome (failure-free installation or installation likely to fail), as well as the occurrence of the identified unwanted events ("occurred"/"did not occur"). The proposed approach includes 6 steps: data gathering, data profiling, data preprocessing, machine learning model development, model evaluation, and model comparison. Three popular machine learning methods were applied in this paper. For the simplest cases, where the correlation between the features and an unwanted event was high (r > 0.85 or r < −0.85), a logistic regression model was used. Artificial neural network and random forest models were then applied to all cases. All three models were evaluated with commonly applied metrics: accuracy, precision, recall, F1 score and AUC score.
A detailed description of the individual steps is provided in the subsections below.
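The five evaluation metrics named above are standard. The dependency-free Python sketch below shows one way to compute them for a single event from 0/1 labels (1 = "occurred") and predicted probabilities. It is an illustration, not the authors' code. The function names are our own, and AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, which equals the area under the ROC curve.

```python
from typing import Sequence


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means the event occurred."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of real occurrences that were caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores) -> float:
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage on invented toy data (six installations, one event):
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(classification_metrics(y_true, y_pred), roc_auc(y_true, scores))
```

AUC is computed from the raw probabilities. The other four metrics depend on the classification threshold applied beforehand.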

Data Gathering
The data were obtained during the authors' participation in HDD projects, as well as during visits to and discussions with HDD contractors. The database included data from HDD projects completed by HDD contractors in various countries (Poland, Mexico, Australia, Thailand, the Netherlands, Bulgaria, Saudi Arabia, and Russia). This made it possible to gather professional experience and feedback from HDD projects carried out in different countries and to avoid dependence on a single country and its specific geotechnical conditions, ultimately allowing the development of a model suitable for worldwide use. The data from 133 HDD projects (84 MINI, 9 MIDI, and 40 MAXI HDD) do not form a large data set. However, more data could not be obtained owing to the specificity of the HDD industry: data from individual projects are not widely available, a single complex or MAXI HDD installation can last for many months, and collecting data on 22 unwanted events and 145 installation attributes is very time-consuming. Nevertheless, the gathered data set turned out to be sufficient to develop and verify risk assessment models, as reflected in the results obtained.
Unwanted events in HDD installations were identified based on an analysis of surveys conducted in five different countries; they are the same as those described in the author's previous work [17]. Table 1 lists the unwanted events and their symbols. The attributes of HDD installations were identified based on scenario analysis and on information obtained during brainstorming sessions and meetings with experienced HDD contractors and owners, as well as manufacturers of drill rigs, drill rods, steering systems, drilling fluids, and product pipes. Observations of various HDD installations in progress were also valuable for identifying the attributes. In addition to the basic attributes that characterize a given HDD installation, such as pipeline diameter, borehole length and maximum depth, the attributes included detailed information about the installation, such as the number of test holes, the depth of the geotechnical tests carried out, parameters related to the experience and certification of the designer, driller, chief superintendent engineer and supervisor, and the most important risk mitigation actions planned (e.g., drilling fluid additives, trial drilling, emergency procedures). Due to its length, Table A1, which lists the 145 attributes of HDD installations, is included in Appendix A.

Symbol  Unwanted event
e1      Incorrect calculations of loads and stresses for the installed pipeline
e2      Failure to consider the allowable bend-radius of drill pipes or the installed product pipe
e3      Incorrect choice of the external product pipe coating
e4      Problems with steering and communications with the drill rig
e5      Drill tool breakdown caused by the material's fatigue
e6      Drill rig failure
e7      Mud motor failure
e8      Mud cleaning system failure
e9      Roller blocks breakdown
e10     Roller cradles breakdown
e11     Side cranes failure
e12     Ballasting system failure
e13     Downtime in the installation due to lack of required tools and machines
e14     Unexpected natural or anthropogenic underground obstacles
e15     Borehole collapse
e16     Swelling of the ground leading to the drilling pipe or product pipe blockage in the borehole
e17     Drilling fluid runoff
e18     Contractor's mistake
e19     Quality or supply issues
e20     Problems with permissions or legal issues
e21     Unfavourable weather conditions
e22     Improper cost calculations for the project
OK      The overall project result

Data Profiling
The factors differentiating the individual projects from which the data were derived are presented in Table 2. The analysed installations differed in terms of geographic area, specificity of the area, geometric parameters of the drilling trajectory, installation size, pipe material, type of installed utility, type of steering system, ground conditions and season. They were also carried out using various machines and devices; depending on the project, optional tools and machines could be applied, namely a mud motor, mud cleaning system, ballasting system, roller blocks, roller cradles and side cranes. The geographic, technological, geotechnical and equipment diversity of the projects was therefore clearly visible.
Table 3 shows the structure of the analysed data set in terms of the number of occurrences of particular unwanted events (e1-e22). The data in Table 3 show that events e12, e7, e10, e11, e9 and e21 were the least frequent in the collected data set. This is consistent with the results of the survey carried out in the author's previous work [16], in which a low frequency of occurrence was also obtained for these events. For the remaining events, the proportion between the number of installations in which the event occurred and those in which it did not occur is satisfactory.
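Counts such as those in Table 3 can be derived directly from the 0/1 outcome columns. The sketch below assumes a hypothetical dictionary that maps each event symbol to its outcomes over all installations, and it lists the events from least to most frequent. The toy data are invented for illustration and are not the paper's data.

```python
def occurrence_counts(labels: dict[str, list[int]]) -> list[tuple[str, int, int]]:
    """Return (event, occurred, did_not_occur) tuples, least frequent events first.

    labels maps an event symbol (e.g. "e7") to its 0/1 outcomes over all installations.
    """
    rows = [(event, sum(ys), len(ys) - sum(ys)) for event, ys in labels.items()]
    return sorted(rows, key=lambda row: row[1])  # stable sort keeps ties in input order


# Invented example with four installations:
print(occurrence_counts({"e7": [0, 0, 0, 1], "e17": [1, 1, 0, 1], "e12": [0, 0, 0, 0]}))
```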
Pearson's correlation coefficient was applied to identify correlated dimensions, both between unwanted events and features and between the features themselves. Careful analysis of the correlated dimensions made it possible to find recurring regularities in the analysed HDD projects and to identify the causes of the correlations.
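A minimal sketch of this correlation screening follows. It assumes that each dimension is stored as a list of numbers under a hypothetical name, applies the plain Pearson formula, and flags pairs with |r| > 0.85. That threshold is borrowed from the criterion the paper uses when choosing logistic regression; the paper does not say which threshold, if any, was used in the profiling step itself.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r; returns NaN when either column is constant (r is undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def strongly_correlated(columns: dict[str, list[float]], threshold: float = 0.85):
    """All pairs of dimensions whose |r| exceeds the threshold, as (name_a, name_b, r)."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if not math.isnan(r) and abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Invented columns: x2 duplicates x1, x3 mirrors it, e1 is only weakly related.
cols = {"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8], "x3": [4, 3, 2, 1], "e1": [1, 0, 1, 0]}
print(strongly_correlated(cols))
```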
Principal Component Analysis (PCA) was used to find patterns in the high-dimensional data in order to show that failure-free installations can be distinguished from failed ones. PCA is a data analysis tool that works on the whole data set (a matrix of all dimensions and all data samples). It allows the dominant data patterns to be visualized and is widely used for data simplification, dimensionality reduction, outlier detection and classification.
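The paper does not describe how PCA was computed. The minimal standard-library sketch below finds the first principal component by power iteration on the covariance matrix and then projects each sample onto it. Power iteration is a deliberately simple substitute for the SVD routine that a numerical library would normally use, and it recovers only the first component. Plotting installations along the leading components and colouring them by the OK label is one way to check visually whether the two groups separate.

```python
import math
import random


def center(rows: list[list[float]]) -> list[list[float]]:
    """Subtract the column means so that every dimension has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered: list[list[float]]) -> list[list[float]]:
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_component(cov: list[list[float]], iters: int = 500, seed: int = 0) -> list[float]:
    """Power iteration: multiply by the covariance matrix and renormalize repeatedly.

    For a covariance matrix with a single largest eigenvalue this converges to the
    corresponding eigenvector, i.e. the first principal component.
    """
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = [sum(c * vj for c, vj in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance matrix is zero; no principal direction")
        v = [x / norm for x in w]
    # Eigenvectors are defined up to sign; make the largest-magnitude entry positive.
    k = max(range(len(v)), key=lambda i: abs(v[i]))
    return v if v[k] > 0 else [-x for x in v]


def project(centered: list[list[float]], component: list[float]) -> list[float]:
    """Score of each sample along the component."""
    return [sum(a * b for a, b in zip(r, component)) for r in centered]


# Invented 2-D data lying on the line y = 2x:
c = center([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
pc = first_component(covariance(c))
print(pc, project(c, pc))
```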
Table 3. The structure of the analysed data set in terms of the number of occurrences of particular unwanted events.

Data Preprocessing
Because data were difficult to obtain, information on all features was not available for every case, so missing values were filled in as follows. Where the number of shifts was unknown, one shift was adopted, as this is the most common arrangement in HDD drilling. For HDD installations in which the percentage of clay-sized particles and the plasticity index were not tested (e.g., when no geotechnical tests were performed), values of 50% and 50, respectively, were assumed, as these were the maximum, most pessimistic values in the data set. For HDD projects in which no preliminary risk assessment was carried out, the value of the parameter "risk level" (on a scale of 1-5) was unknown; in such cases the maximum risk level of 5, i.e., the most pessimistic value in the data set, was assumed.
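The imputation rules above amount to replacing each missing value with a fixed pessimistic default. In the minimal sketch below, the attribute keys (`shifts`, `clay_fraction_pct`, `plasticity_index`, `risk_level`) are hypothetical names, not the labels used in Table A1.

```python
from typing import Optional

# Pessimistic defaults taken from the text; the dictionary keys are hypothetical names.
DEFAULTS = {
    "shifts": 1,              # one shift: the most common arrangement in HDD drilling
    "clay_fraction_pct": 50,  # maximum (most pessimistic) value in the data set
    "plasticity_index": 50,   # maximum (most pessimistic) value in the data set
    "risk_level": 5,          # maximum of the 1-5 preliminary risk scale
}


def impute(record: dict[str, Optional[float]]) -> dict[str, Optional[float]]:
    """Return a copy of one installation's record with missing (None or absent) values filled."""
    filled = dict(record)
    for key, default in DEFAULTS.items():
        if filled.get(key) is None:
            filled[key] = default
    return filled


# An installation without geotechnical tests and without a preliminary risk assessment:
print(impute({"shifts": None, "clay_fraction_pct": 12.0, "plasticity_index": None}))
```

Values that were measured are left untouched, so the defaults only ever push incomplete records towards the pessimistic end of each scale.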
Because the values of certain parameters (pipeline diameter, bore length, maximum depth, percentage of clay-sized particles, plasticity index, number of working hours) were widely dispersed, the z-score technique was used to make the model results independent of large absolute values. The z-score is a popular standardization technique that is widely used in machine learning applications. It indicates how far the analysed value lies from the mean, measured in standard deviations. For example, for x2 the z-score is defined by the formula

$$ z = \frac{x_2 - \mu_{x_2}}{\sigma_{x_2}} $$

where $x_2$ is the analysed value of the attribute, and $\mu_{x_2}$ and $\sigma_{x_2}$ are the mean and the standard deviation of that attribute in the data set.
To prepare the data for use with machine learning models, categorical values were converted to a one-hot vector representation, as shown in Figure 2. The primary purpose of this transformation was to encode categorical values in an appropriate numerical form, so a separate dimension was introduced for each categorical value. For the x1 attribute, three dimensions were introduced according to installation size: x1 = MINI, x1 = MIDI and x1 = MAXI. For example, if the analysed installation was of MINI size, the dimension x1 = MINI takes the value 1 and the remaining dimensions take the value 0. Attributes such as steering system, pipe material, obstacle being crossed and type of area were vectorized in the same way. That is why the overall number of data dimensions is larger than the actual number of installation attributes.
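Both preprocessing steps can be sketched in a few lines of standard-library Python. The code is illustrative only. The paper does not state whether the population or the sample standard deviation was used (the population form is assumed here), and the example values are invented.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Standardize a numeric attribute: z = (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD: an assumption, not stated in the paper
    if sigma == 0:
        return [0.0 for _ in values]  # a constant attribute carries no information
    return [(v - mu) / sigma for v in values]


def one_hot(values: list[str], categories: list[str]) -> list[list[int]]:
    """One dimension per category, e.g. x1 -> (x1 = MINI, x1 = MIDI, x1 = MAXI)."""
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return [[1 if v == c else 0 for c in categories] for v in values]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
print(one_hot(["MINI", "MAXI"], ["MINI", "MIDI", "MAXI"]))
```

Raising an error on unseen categories makes silent all-zero vectors impossible. This is a design choice of the sketch, not something the paper describes.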
Version December 30, 2020 submitted to Energies

Figure 2. (a) Original data.

Logistic Regression Model
Linear regression allows a very simple prediction to be made when there is a linear correlation between the installation attributes and the expected outcome. In this work, the regression model was applied to events that show a significant correlation with the installation dimensions. The output of linear regression may take values in the range (−∞, +∞), whereas a probability lies in the range [0, 1]. Logistic regression was therefore used to convert the linear value into a probability. The resulting probability is then mapped to the final binary outcome (failure-free or likely to fail) using a threshold value t. For the logistic regression model, t = 0.5 was selected experimentally.
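The paper does not say how the logistic regression coefficients were fitted. The sketch below assumes plain batch gradient descent on the mean log-loss for a single event. The logistic function σ(z) = 1 / (1 + e^(−z)) maps the linear score z to a probability, and the threshold t = 0.5 reported above turns that probability into a class. The learning rate, the number of epochs and the toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score in (-inf, +inf) to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def linear(x: list[float], w: list[float], b: float) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; its gradient is mean((p - y) * x)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(linear(xi, w, b)) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, t=0.5):
    """1 = event predicted to occur, 0 = not; t is the classification threshold."""
    return [int(sigmoid(linear(xi, w, b)) >= t) for xi in X]


# Invented one-feature example: the event occurs for positive standardized values.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict(X, w, b))  # [0, 0, 1, 1]
```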

Random Forest Model
A decision tree is a classifier that determines the importance of the analysed features and their impact on the classification results. However, decision trees suffer from several problems, such as sensitivity to the training data (e.g., changing the order of the data may produce different results). Moreover, deeper branches of a decision tree carry increasing uncertainty, because they are built on ever smaller subsets of the data [37]. Random forests are often applied to address these problems: they combine the results of multiple trees, each built on a randomly selected subset of the training data. In this work, random forests were prepared separately for each of the 21 unwanted events and for the result of the entire HDD installation. The maximum tree depth was chosen experimentally as 4, the depth at which the best results were obtained. Each tree seeks a division of the data set, based on the given features, that makes the resulting subsets as uniform as possible (e.g., the vast majority of installations in one subset are those in which the analysed event occurred, and the vast majority in another subset are those in which it did not). This allows the trees to detect dependencies present in the training set, even if a dependency is accidental. Figures 3 and 4 show how decisions are made by example decision trees used in this work.
Figure 3 shows an example tree developed for event e19. It is only an example, because a random forest contains many trees generated for the same event.
Analysing the tree structure, it can be seen that the parameter x140, "the exchange rate in the case of carrying out works abroad and paying for materials and equipment in foreign currency", divides the input data set into two subsets. When the exchange rate was ordinary (x140 = 0), event e19 occurred in three installations and did not occur in 75. When the exchange rate was high (x140 = 1), event e19 occurred in 18 installations. In the left branch, when Occupational Health and Safety procedures for the ballasting system were not prepared (x65 = 0), event e19 did not occur in 75 installations and occurred in 2. When they were prepared, event e19 occurred once, so the tree classified e19 as "occurred". When the drilling rig was not equipped with a full automation system (x105 = 0), event e19 occurred twice and did not occur in 20 projects. When the drilling rig was equipped with a full automation system, event e19 did not occur in any of the 55 analysed projects. When the drilling depth (x4) exceeded 2.9 m, event e19 did not occur in 11 projects; when it did not exceed 2.9 m, e19 occurred twice and did not occur in 9 installations, so the tree classified e19 as "did not occur". In the right branch, when the number of site investigation methods (x71) was 0 (e.g., in rural areas), event e19 did not occur in any of the nine analysed installations. When it was at least 1, e19 occurred 18 times and did not occur once. When the applied materials were not certified (x118), e19 occurred in 18 installations, so the tree classified e19 as "occurred"; when they were certified, it did not occur, so the tree classified e19 as "did not occur". The numerical values presented refer to the training data sample.
Figure 4 shows an example tree developed for event e22. Here too, the parameter x140, "the exchange rate in the case of carrying out works abroad and paying for materials and equipment in foreign currency", divides the input data set into two subsets. When the exchange rate was ordinary (x140 = 0), event e22 occurred in 10 installations and did not occur in 68. When the exchange rate was high (x140 = 1), it occurred in 27 installations and did not occur in 1. In the left branch, when the drilling rig was equipped with a protection system against failures (x40 = 1), event e22 [...] was classified by the tree as "did not occur". The right branch (27 installations with e22 and 1 without) is split according to whether geotechnical investigations were carried out at least to the maximum drilling depth (x73). If they were not (x73 = 0), event e22 occurred in all 24 analysed installations, so the tree classified e22 as "occurred". When a gyro steering system was used (x15 = GYRO), event e22 did not occur in the one installation concerned, so the tree classified e22 as "did not occur". When a gyro system was not applied (x15 ≠ GYRO), event e22 occurred in all three analysed installations, so the tree classified e22 as "occurred". The numerical values presented refer to the training data sample.
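To make the procedure concrete, here is a compact, dependency-free random forest sketch. It is not the authors' implementation, and only the maximum depth of 4 comes from the text. The remaining choices are assumptions: binary (0/1) features, Gini impurity as the split criterion (the paper only says that splits aim at the most uniform subsets), bootstrap samples, √d randomly chosen candidate features per node, and a majority vote. The last lines evaluate the root split of the e22 tree using the training counts quoted above.

```python
import math
import random


def gini(labels: list[int]) -> float:
    """Gini impurity of 0/1 labels (1 = event occurred)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(left: list[int], right: list[int]) -> float:
    """Size-weighted Gini impurity of a binary split."""
    return (len(left) * gini(left) + len(right) * gini(right)) / (len(left) + len(right))


def majority(labels: list[int]) -> int:
    """Majority vote; ties go to 1 ("occurred"), an arbitrary choice of this sketch."""
    return int(2 * sum(labels) >= len(labels))


def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow a depth-limited tree on 0/1 features; a node is (feature, left, right) or a leaf label."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    candidates = list(range(len(X[0])))
    rng.shuffle(candidates)
    best = None
    for k, f in enumerate(candidates):
        # Try a random subset of n_features, but keep looking if none of them splits the node.
        if k >= n_features and best is not None:
            break
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue
        score = split_impurity([y[i] for i in left], [y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:  # no feature separates these samples
        return majority(y)
    _, f, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    return (f, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, left, right = node
        node = right if row[f] == 1 else left
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training installations."""
    rng = random.Random(seed)
    n_features = max(1, int(math.sqrt(len(X[0]))))  # common heuristic, assumed here
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


# Root split of the e22 tree on x140 (Figure 4), with the training counts given in the text:
left = [1] * 10 + [0] * 68   # x140 = 0: e22 occurred in 10, did not occur in 68
right = [1] * 27 + [0] * 1   # x140 = 1: e22 occurred in 27, did not occur in 1
print(round(gini(left + right), 3), round(split_impurity(left, right), 3))  # 0.454 0.183
```

Under the Gini assumption, splitting the 106 training installations on x140 lowers the impurity from about 0.454 to about 0.183. This is one plausible reason why the exchange-rate attribute ends up at the root of that tree.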

ANN Model
Neural networks were inspired by the biological information-processing mechanisms of the brain. The proposed ANN automatically learns to classify HDD projects in terms of the occurrence of unwanted events. It requires a sufficient number of training samples (HDD installations) to generalize the patterns occurring in them. An ANN consists of several layers. Its basic building block is the neuron, which performs elementary information processing; the network consists of neurons connected into successive layers, together with additional auxiliary layers. Figure 5 presents the artificial neural network architecture developed to classify HDD projects. The input layer contains 178 neurons, corresponding to the number of dimensions of an HDD installation after one-hot encoding of the installation attributes. The next layer is dense, meaning that each of its neurons is connected to every neuron of the previous layer. The exponential linear unit (ELU) was used as the activation function of the neurons in this layer. To prevent overfitting, the next layer is a dropout layer, which ignores part of the information coming from the neurons during training and thereby improves generalization. The last layer is a dense output layer of 23 neurons, responsible for the individual events and equipped with a sigmoid activation function. Because each event is binary ("occurred"/"did not occur"), each output is passed to a classification threshold function, as in the case of logistic regression. The structure of the network and its hyper-parameters, such as the number of neurons in the hidden layer, the dropout rate and the classification threshold, were selected experimentally. The model was trained using Adam, a first-order gradient-based algorithm for the optimization of stochastic objective functions based on adaptive estimates of lower-order moments.
Training aimed to minimize the binary cross-entropy loss function.
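The layer stack described above can be sketched in plain Python to show what a single forward pass computes. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: the hidden-layer size (`N_HIDDEN = 64`), the weight initialisation, the dropout rate, the 0.5 threshold and the input vector are hypothetical, and Adam training is omitted. `binary_cross_entropy` is the loss that training would minimise.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def elu(z: float, alpha: float = 1.0) -> float:
    # Exponential linear unit: z for z > 0, alpha * (e^z - 1) otherwise
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function, output in (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs: Sequence[float], layer: Layer, act: Callable[[float], float]) -> List[float]:
    # Fully connected: every neuron sees every output of the previous layer
    weights, biases = layer
    return [act(sum(w * v for w, v in zip(row, inputs)) + b) for row, b in zip(weights, biases)]


def dropout(values: List[float], rate: float, training: bool, rng: random.Random) -> List[float]:
    # Inverted dropout: while training, zero each value with probability `rate`
    # and rescale the survivors; at inference the values pass through unchanged
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot-uniform-style initialisation
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params, dropout_rate=0.2, training=False, rng=None) -> List[float]:
    hidden_layer, output_layer = params
    h = dense(x, hidden_layer, elu)
    h = dropout(h, dropout_rate, training, rng or random.Random(0))
    return dense(h, output_layer, sigmoid)  # one probability per output neuron


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    return [int(p >= threshold) for p in probs]  # 1 = "occurred"


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-7) -> float:
    # Mean BCE over all outputs; probabilities are clipped to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


N_INPUT, N_HIDDEN, N_OUTPUT = 178, 64, 23  # N_HIDDEN is a hypothetical choice
rng = random.Random(42)
params = (init_layer(N_INPUT, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUTPUT, rng))
x = [float(rng.random() < 0.3) for _ in range(N_INPUT)]  # synthetic 0/1 input vector
probs = forward(x, params)
labels = classify(probs)
```

In practice a deep learning framework would provide these layers, the Adam optimiser and back-propagation; the sketch only makes the data flow between the layers explicit.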

Models Evaluation
The models were evaluated by validating the quality of the binary classification of HDD installations ("occurred"/"did not occur") with respect to the selected risk events. Table 4 presents the typical confusion matrix that was the basis for calculating the metrics. True Positive (TP) is the number of HDD installations for which a certain risk event was correctly predicted to occur. False Positive (FP) is the number of HDD installations for which a certain risk event was predicted but did not occur. False Negative (FN) is the number of HDD installations for which a certain risk event occurred although it was not predicted. True Negative (TN) is the number of HDD installations for which a certain risk event was not predicted and did not occur. In this work, recall was defined as the ratio of the number of HDD projects for which a certain event was correctly predicted as "occurred" to the total number of projects in which that event actually occurred:

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

It describes the ability of the system to correctly classify the occurrence of a certain event in HDD projects, but does not take into account the projects in which the event did not occur. Precision was calculated as the ratio of the number of HDD projects for which a certain event was correctly predicted as "occurred" to the total number of HDD projects for which the event was predicted as "occurred":

$$\mathrm{precision} = \frac{TP}{TP + FP}$$

It describes the ability of the system to correctly predict the occurrence of a certain event, but does not include the cases classified as "did not occur". Accuracy was defined as the ratio of the number of HDD installations for which a certain event was correctly predicted by the system to the total number of HDD projects:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

f 1 is defined as the harmonic mean of precision and recall for the risk event being analysed:

$$f_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

December 30, 2020 submitted to Energies 11 of 29
Energies 2021, 14, 289 13 of 28
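The definitions above translate directly into code. The sketch below computes the four metrics from a confusion matrix; the function names are illustrative. It writes f 1 in the equivalent count form 2TP/(2TP + FP + FN) and returns NaN when a denominator is zero, as happens for recall when an event never occurs in the test set.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionMatrix:
    tp: int  # event occurred and was predicted as "occurred"
    fp: int  # predicted as "occurred" but did not occur
    fn: int  # occurred but was not predicted
    tn: int  # not predicted and did not occur


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined when the denominator is zero


def metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    return {
        "recall": ratio(cm.tp, cm.tp + cm.fn),
        "precision": ratio(cm.tp, cm.tp + cm.fp),
        "accuracy": ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn),
        # harmonic mean of precision and recall, written in terms of counts
        "f1": ratio(2 * cm.tp, 2 * cm.tp + cm.fp + cm.fn),
    }


cm = confusion_matrix([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(cm, metrics(cm))  # recall 0.75, precision 0.75, accuracy 0.8, f1 0.75
```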
The Receiver Operating Characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate, FP/(FP + TN), over the full range of possible classification thresholds (t). The AUC score is the area under this curve. It summarizes the correctness of the classification independently of the adopted threshold.
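A short sketch of how a ROC curve and its AUC score can be obtained from predicted probabilities: the threshold t is swept over every distinct score, each threshold yields one (false positive rate, true positive rate) point, and the area is accumulated with the trapezoidal rule. The labels and scores in the usage line are made up; both classes must be present for the rates to be defined.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) for every distinct threshold t,
    swept from the highest score down; "occurred" is predicted when score >= t."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are needed to define the ROC curve")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted as "occurred"
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    # Area under the piecewise-linear curve (trapezoidal rule)
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_points([0, 1, 0, 1], [0.3, 0.4, 0.5, 0.9])))  # 0.75
```

An AUC of 1 means every installation in which the event occurred received a higher score than every installation in which it did not; values below 1 mean some negatives outrank some positives, so the classification outcome depends on where t is placed.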

Correlations
Pearson's correlation coefficient was applied to identify correlated dimensions. The correlated dimensions (between unwanted events and features, as well as between the features themselves) and their Pearson's correlation coefficients are presented in Table 5. Careful analysis of the correlated dimensions made it possible to find recurring regularities in the analysed HDD projects and to identify the causes of the correlations. In good HDD contracts, attention was paid to engaging both a certified supervisor (x 103 = 1) and a superintendent engineer (x 101 = 1), as suggested in [38]. When a mud motor was applied, contractors were aware of the need to carry out periodical inspections (x 43 = 1) and knew the good practice of replacing its elastomeric elements after each downhole trip (x 44 = 1). Borehole collapse often occurred (e 15 = 1) if the drilling crossed a sand layer with a homogeneous grain size distribution (x 81 = 1), as stated in [39]. Improper calculations for the investment often occurred (e 22 = 1) when the cheapest contractor was chosen without particular attention to the quality of the services offered (x 133 = 1). If the HDD designer did not have suitable knowledge and experience (x 7 = 0), problems related to improper calculation of loads and stresses during the installation (e 1 = 1), as well as failure to consider the allowable bending radius of the drill or product pipe (e 2 = 1), occurred. Problems with supply and quality usually did not occur (e 19 = 0) if the supplier was certified (x 116 = 1) and all materials had the corresponding quality certificates (x 118 = 1). The preliminary estimated risk level of the HDD project (x 136) increased with a decrease in the number of site investigation methods applied (x 71) and in the number of working shifts (x 106), with the proximity of backfills that could act as a drainage (x 121), if works were carried out in spring (x 128) or winter (x 131), which posed many risks (floods, low temperatures, strong winds), if no drilling fluid additives reducing collapse risk were used (x 144), and if steering systems other than a gyroscope were used (x 15 ≠ GYRO). Some dimensions were correlated by chance.
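Pearson's coefficient for two dimensions is their covariance divided by the product of their standard deviations. The sketch below computes it and lists the pairs whose absolute coefficient reaches a cut-off; the 0.7 cut-off and the toy columns are hypothetical, since the selection threshold used for Table 5 is not stated here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant dimension
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns: Dict[str, Sequence[float]], cutoff: float = 0.7) -> List[Tuple[str, str, float]]:
    """All pairs of dimensions with |r| >= cutoff (NaN values never pass the test)."""
    names = sorted(columns)
    result = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= cutoff:
                result.append((a, b, r))
    return result


# Hypothetical 0/1 columns for six installations
toy = {"x81": [1, 0, 1, 0, 1, 0], "e15": [1, 0, 1, 0, 0, 0], "x10": [1, 1, 1, 1, 1, 1]}
print(correlated_pairs(toy))
```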
Owing to the small amount of data, the correlated dimensions were not eliminated; with such a small data set there was no need to optimize computation speed. Figure 6 presents the results of applying principal component analysis (PCA) to the whole data set, showing its three dominant principal components. The analysis showed that failure-free installations were clustered close to each other, which indicates systematic differences between them and the remaining installations. This made it possible to discriminate between these two subsets using machine learning models. The analysis also identified four significant outliers. These installations differed from the others mainly because of a very large borehole diameter or borehole length. It should be noted that long, large-diameter HDD installations are usually more complex and technologically demanding than MINI and MIDI HDD installations.
Table 5 (excerpt): x 121 (Proximity to existing utilities acting as a drainage) and x 136 (Risk level), correlation coefficient −0.88
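PCA projects the centred data onto the eigenvectors of its covariance matrix with the largest eigenvalues. The dependency-free sketch below finds the leading components by power iteration with deflation; it illustrates the technique and is not the procedure used to produce Figure 6. For a data set of this dimensionality, a linear-algebra routine such as NumPy's `numpy.linalg.eigh` would normally be used instead.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def center(rows: Sequence[Sequence[float]]) -> Matrix:
    """Subtract each column's mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def top_components(cov: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Leading eigenvectors of a symmetric covariance matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = normalize([rng.uniform(-1.0, 1.0) for _ in m])
        for _ in range(iters):
            w = matvec(m, v)
            if not any(w):  # no variance left in this direction
                break
            v = normalize(w)
        eigval = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        comps.append(v)
        # Deflation: remove the found component so the next pass finds the next one
        m = [[m[i][j] - eigval * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return comps


def project(rows: Sequence[Sequence[float]], comps: Matrix) -> Matrix:
    """Coordinates of each centred row along the principal components."""
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in center(rows)]


# Toy data lying on a line: the first component should be +/-(1, 2)/sqrt(5)
rows = [[t, 2.0 * t] for t in range(5)]
pcs = top_components(covariance(center(rows)), k=1)
coords = project(rows, pcs)
```

With `k=3`, the three columns of `coords` correspond to the kind of three-component view shown in Figure 6.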

PCA

Logistic Regression
Because the correlation analysis showed that some events were closely correlated with certain features, it was possible to estimate the occurrence of some unwanted events using logistic regression. Table 6 shows the evaluation of the logistic regression model using recall, precision, accuracy, f 1 and AUC score. The proposed logistic regression approach shows fairly good recall, precision, accuracy, f 1 and AUC score values on the test dataset. For the events e 9, e 10, e 11 and e 19, full compliance with the test set was achieved. The evaluation results for e 5 and e 15 are poor, and for the remaining events they are satisfactory. The poor recall values for e 5 and e 15 indicate that the system classifies these events properly when they did not occur, but poorly when they occurred. For these events, other, more effective classification methods should therefore be sought. Table 7 shows the evaluation of the random forests model using recall, precision, accuracy, f 1 and AUC score.
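Logistic regression models the probability of an event as the sigmoid of a weighted sum of features and assigns "occurred" when that probability reaches a threshold. The sketch below fits such a model by batch gradient descent on the binary cross-entropy. The toy data (two binary features standing in for a certified supplier, x 116, and certified materials, x 118, with e 19 as the label) are invented for illustration, as are the learning rate and the number of epochs.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean binary cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of BCE w.r.t. the linear score is (predicted probability - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical toy data: columns (x116, x118), label e19 (1 = occurred)
X_toy = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]]
y_toy = [0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
print([predict(w, b, x) for x in X_toy])
```

The negative weights learned here mirror the direction of the correlation reported above: certification makes the "occurred" class less likely.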

Random Forests Model
The proposed random forests approach shows fairly good recall, precision, accuracy, f 1 and AUC score values. For the events e 7, e 9, e 10, e 11, e 16, e 19, e 22 and the final outcome of the HDD project (depicted as OK), full compliance with the test set was achieved. The event e 12 occurred only twice in the input file, which made it impossible to train the forest correctly; in addition, the test set did not contain any case in which this event occurred. It should be added that the frequency of this event, as assessed in the survey described in the author's previous work [16], was only 2% in the 5940 analysed HDD projects. The worst results were obtained for the event e 21. This event is related to weather conditions, a highly specific factor, so the random forest was unable to learn to predict it correctly. The frequency of this event assessed in the same survey [16] was 7% (for severe weather conditions) and 2% (for floods) in the 5940 analysed HDD projects. For the events e 1, e 2, e 3, e 4 and e 6, recall ranging from 0.600 to 0.667 was obtained, which leaves room for improvement. For the remaining events, satisfactory compliance with the test set was obtained, at the level of 0.750-1.000. These evaluation results show that the proposed method is satisfactory, but could be significantly improved using more advanced classification methods. Figure 7 shows the learning history of the proposed ANN. The chart shows that the loss decreases as the number of epochs increases, while the applied metrics (precision, recall, accuracy and AUC score) improve systematically. Ultimately, the number of epochs was set to 100 to prevent overfitting the network. Table 8 presents the performance of the proposed ANN model on the test dataset.
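A random forest trains many decision trees on bootstrap samples of the training data, each considering a random subset of features, and classifies by majority vote. The sketch below shows this bagging-and-voting mechanism with depth-1 trees (decision stumps) on binary features. The stumps, the number of trees and the toy data are simplifying assumptions; the trees described above for e 19 and e 22 are deeper.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Stump = Tuple[int, Dict[int, int], int]  # (feature index, value -> label, fallback label)


def fit_stump(X: Sequence[Sequence[int]], y: Sequence[int], features: Sequence[int]) -> Stump:
    """Pick the single-feature split with the fewest training errors.
    Each feature value forms a leaf that predicts its majority label."""
    best = None
    for f in features:
        leaves: Dict[int, List[int]] = {}
        for xi, yi in zip(X, y):
            leaves.setdefault(xi[f], []).append(yi)
        rule = {v: Counter(labels).most_common(1)[0][0] for v, labels in leaves.items()}
        errors = sum(rule[xi[f]] != yi for xi, yi in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    fallback = Counter(y).most_common(1)[0][0]  # used for values unseen in the sample
    return best[1], best[2], fallback


def predict_stump(stump: Stump, x: Sequence[int]) -> int:
    f, rule, fallback = stump
    return rule.get(x[f], fallback)


def fit_forest(X, y, n_trees: int = 25, max_features: int = 2, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        feats = rng.sample(range(d), min(max_features, d))  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest: List[Stump], x: Sequence[int]) -> int:
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]  # majority vote over the trees


# Toy data: feature 0 determines the label, feature 1 is noise
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]] * 2
y_toy = [0, 0, 1, 1] * 2
forest = fit_forest(X_toy, y_toy)
print([predict_forest(forest, x) for x in [[0, 0], [0, 1], [1, 0], [1, 1]]])
```

The same mechanism explains the difficulty with e 12: if the event is (almost) absent from the training data, almost every bootstrap sample contains no positive case, so no tree can vote "occurred".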

ANN Model
The proposed ANN approach shows very good recall, precision, accuracy, f 1 and AUC score values. For the events e 1, e 2, e 7, e 9, e 10, e 11 and the final outcome of the HDD project (depicted as OK), full compliance with the test set was achieved. As with random forests, the event e 12 occurred only twice in the input file, which made it impossible to train the network correctly; in addition, the test set did not contain any case in which this event occurred. The worst results were produced for the event e 21. This event is related to weather conditions, a highly specific factor, so the network was unable to learn to predict it correctly. For the event e 3, a recall of 0.667 was obtained, and for e 16, a precision of 0.667. For the events e 4, e 14 and e 20, a recall of around 0.8 was achieved, although accuracy in these cases was high (≥ 0.926). This means that for these events the system is worse at detecting that the event occurred than that it did not. For the remaining events, very high compliance with the test set was obtained, at the level of 0.857 to 1.000. These evaluation results show that the proposed method is effective in predicting the overall HDD project outcome, as well as the occurrence ("occurred"/"did not occur") of all 21 identified sub-risks. Figures 8-11 present ROC curves for selected unwanted events. For the final outcome of the project (depicted as OK), a very good result was achieved: regardless of the selected classification threshold, the proposed ANN model correctly classifies HDD installations. Figures 9-11 show the most interesting ROC curves, for which the AUC score is less than 1, which means that the classification depends to some extent on the selected classification threshold.
Although the curves presented in Figures 9-11 indicate the poorest results among all those obtained, they are satisfactory, because the AUC scores obtained for them are greater than 0.9. The worst results were obtained for the event e 21, which is related to unfavourable weather conditions. The irregularity visible in this curve (Figure 11) shows that in certain situations the proposed model may incorrectly classify the event e 21 as "occurred".

Discussion of the Results
In risk assessment, it is most important to minimize the cases in which a particular event was not predicted as "occurred" but actually occurred. Such cases belong to the "false negative" group in Table 4. If a risk assessment system predicts that a given event will occur and in fact it does not, the consequences, such as introducing risk mitigation strategies, are less serious than those of failing to predict the event. Failing to predict the occurrence of an event that is in fact likely to occur may lead to no risk mitigation strategies being introduced, and ultimately to the actual occurrence of this event in the HDD project. Therefore, from the point of view of risk assessment in HDD projects, the most important indicators are those that take the "false negative" group into account, so recall and accuracy should be analysed first.
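One way to act on this priority is to choose, for each event, the highest classification threshold whose recall on validation data still meets a target, so that false negatives are limited first and false positives second. The sketch below illustrates that rule. It is a hypothetical procedure, not one described for the proposed models, and the labels, scores and targets are invented.

```python
from typing import Sequence, Tuple


def recall_precision_at(y_true: Sequence[int], scores: Sequence[float], t: float) -> Tuple[float, float]:
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
    recall = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return recall, precision


def highest_threshold_for_recall(y_true: Sequence[int], scores: Sequence[float], target_recall: float) -> float:
    """Largest candidate threshold whose recall reaches the target.
    Recall can only grow as t decreases, so the first hit while descending is the
    highest such threshold, which keeps false positives as low as the target allows."""
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision_at(y_true, scores, t)
        if recall >= target_recall:
            return t
    return min(scores)


y = [0, 1, 0, 1, 1]
s = [0.2, 0.35, 0.6, 0.7, 0.9]
print(highest_threshold_for_recall(y, s, 1.0))  # 0.35
print(recall_precision_at(y, s, 0.35))  # (1.0, 0.75)
```

Raising the recall target lowers the threshold and trades precision for fewer missed events, which matches the asymmetry of consequences described above.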
To further discuss the results and compare the effectiveness of the proposed models in predicting the occurrence of unwanted events in HDD projects, the authors carried out a comparative analysis of their predictive performance. Figures 12-16 compare the results of the three applied machine learning models in terms of recall, precision, accuracy, f 1 and AUC score. Based on Figures 12-16, it can be concluded that the best prediction method for HDD projects is the proposed ANN model. Random forests are second, and logistic regression, with which only eight events could be predicted, is third. The proposed ANN model outperforms the other models in terms of recall for seven events, precision for one event, accuracy for six events, f 1 for seven events and AUC score for 14 events. The proposed ANN model produced the best results (or no better result was obtained by other methods) in terms of recall for the prediction of 21 events (e 1, e 2, e 3, e 4, e 5, e 6, e 7, e 8, e 9, e 10, e 11, e 13, e 15, e 16, …).
Analysing the performance of the three proposed machine learning models in predicting the occurrence of particular unwanted events, the following conclusions can be drawn. For the events e 1, e 2, e 4 and e 17, the best results for all metrics were obtained with the proposed ANN model (only in terms of precision did random forests give equally satisfactory results). For the final outcome of the project (depicted as OK), the best results for all metrics were obtained with both the proposed random forest model and the ANN. For the events e 9, e 10 and e 11, all three proposed models gave the same results in terms of all metrics.
For the event e 8, the best results in terms of recall were obtained with the proposed random forest model and the ANN, while in terms of precision all three proposed models gave the same results. For the event e 19, all three methods gave equally good results in terms of recall and precision. For the events e 5 and e 6, the best results in terms of recall were obtained with the proposed ANN model, and in terms of precision with the proposed random forest model. This means that for these events the proposed random forests model is better at ensuring that a positive identification of an event is actually correct. For the event e 22, all three proposed models gave the same recall, while the logistic regression model was best in terms of precision and the random forest model was best in terms of accuracy. For the remaining events, the best results were obtained with the ANN model and random forests, depending on the metric. In the only previous work found in the literature on the application of machine learning in HDD technology [24], a surface heave prediction model was proposed. It can be considered a risk assessment model for one unwanted event in HDD technology. The models proposed in this paper are novel, as they enable efficient, fast and robust risk predictions for the 21 most important unwanted events in HDD technology and for the overall project outcome.
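The comparison of the three models uses two different counts: events for which the ANN strictly outperforms the other models, and events for which it gives the best result or no better result was obtained by other methods (ties included). The helper below makes the difference explicit; the event keys and metric values are hypothetical and not taken from Tables 6-8. A model missing for an event (as when logistic regression could not predict it) is simply left out of that event's comparison.

```python
from typing import Dict, List, Tuple


def compare_models(scores: Dict[str, Dict[str, float]], reference: str = "ANN") -> Tuple[List[str], List[str]]:
    """Split events by whether `reference` is strictly best or best-or-tied (higher is better)."""
    strictly_best: List[str] = []
    best_or_tied: List[str] = []
    for event, by_model in scores.items():
        ref = by_model[reference]
        others = [value for model, value in by_model.items() if model != reference]
        if all(ref > value for value in others):
            strictly_best.append(event)  # "outperforms the other models"
        if all(ref >= value for value in others):
            best_or_tied.append(event)  # "best result, or no better result by other methods"
    return strictly_best, best_or_tied


# Hypothetical recall values for three events
recall = {
    "eA": {"ANN": 1.0, "RF": 0.667, "LR": 0.5},
    "eB": {"ANN": 1.0, "RF": 1.0, "LR": 1.0},
    "eC": {"ANN": 0.5, "RF": 0.75},  # no logistic regression model for this event
}
strict, tied = compare_models(recall)
print(strict, tied)  # ['eA'] ['eA', 'eB']
```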

Conclusions
This study proposes three new models for predicting risks in HDD projects. To develop them, data from 133 HDD projects from eight countries were gathered, profiled and preprocessed. Three models based on the following machine learning methods were developed and their performance assessed: logistic regression, random forests and an Artificial Neural Network. The developed ANN model performs strongly in predicting the HDD project outcome, as well as the occurrence of 21 identified unwanted events, despite the relatively small training dataset. It outperforms random forests. The proposed logistic regression model could be applied to properly predict only 8 events.
The results show that the proposed ANN model is the most efficient, fastest and most robust of the three models in predicting risks in HDD projects. Moreover, running the proposed ANN model takes much less time than a traditional risk assessment, in which a group of HDD experts must be involved. The proposed approach is an accurate prediction model: it makes efficient predictions of the occurrence of unwanted events, with minor deviations between the real and predicted values. Since the results of risk assessment are critical not only for assessing project feasibility and costing, but also as a starting point for introducing a risk management strategy, this model is very useful for accurately determining risk levels for important unwanted events. It is expected that the practical application of the proposed model will improve the quality of the installed energy transmission infrastructure and reduce the number of unsuccessful installations of gas, oil and electricity pipelines. Moreover, it can help avoid strikes on existing elements of the underground infrastructure of cities, which may lead to fatal accidents (e.g., hitting a gas or oil pipeline).
This paper is the first to propose machine learning models dedicated to assessing the overall risk of an HDD project, as well as the occurrence of 21 unwanted events in this technology. The main contributions to the body of knowledge are: collecting essential data from 133 HDD projects from eight countries; identifying 145 attributes relevant for risk analysis in HDD projects (the dimensions of the data set); developing three machine learning models for predicting risk in HDD projects, which remove the drawbacks and limitations of the risk assessment models previously described in the literature; identifying recall and accuracy as the most important metrics for risk assessment in HDD projects; and selecting the ANN model as the best-performing of the three proposed models.
Additionally, applying machine learning in the proposed models eliminated the need to engage a group of HDD experts. It also reduces the imperfections with which traditional expert-based risk assessment systems have struggled, such as: reliance on the opinions and knowledge of individual experts (not always properly matched to the project size and specificity), a lack of the required experience and knowledge among experts, limited project budgets that do not allow well-qualified experts to be employed, and difficulties in engaging high-quality industry specialists. All in all, this work developed widely available models for which the costs and problems of involving experts are not a barrier. This work is helpful in making the right decision about starting an HDD project. Moreover, it supports HDD owners, engineers and contractors in creating realistic project delivery and performance options.
Future research will focus on integrating the proposed approach into a comprehensive, holistic risk management system. Such a system will additionally include risk response options for individual unwanted events. It will enable dynamic risk assessment that takes into account various combinations of planned risk responses. A similar approach is also planned for risk assessment in various trenchless construction methods and for choosing the one with the lowest risk.

Pipeline diameter (mm)
x 3 Bore length (m)
Pipeline material
x 6 Crossed obstacle
x 7 Does the designer have a minimum of 3 years of experience in HDD projects of a certain size (MINI, MIDI, MAXI)?
x 8 Does the designer have positive references from similar projects (project size, ground conditions, natural environment specificity, season)?
x 9 Was the correctness of the designer's calculations checked using an appropriate computer program (e.g., Horizon, HDD Designer, D-Geo Pipeline)?
x 10 Urban area
x 11 Post-industrial area
x 12 Was an assessment of the expected geological conditions used to determine the most appropriate coating for the pipe?
x 13 Were erosion and corrosion protection coatings designed for steel pipes?
x 14 In the case of HDPE pipes: was a crack and gouge allowance considered?
x 15 Steering system type (gyro, wireline, walkover)
x 16 Was the identification of interferences carried out?
x 17 If yes, did it reveal any interferences?
Table A1. Cont.

Symbol Features
x 18 If yes, were the interferences temporarily disabled?
x 19 Wireline length (m)
x 20 Are spiders planned to be applied in the case of long wireline systems?
x 21 If yes, is the distance between spiders at most 150 m?
x 22 Is a quality wireline coating designed, e.g., XHHW (Cross-Linked High Heat Water Resistant Insulated Wire)?
x 23 If drilling in rock formations: are any tools chosen to damp transmitter vibrations?
x 24 Is any drilling planned in abrasive soils, rocks or cobbles, in which a large amount of heat is transferred from the drill head to the transmitter housing?
x 25 Is drilling planned in gravel and ground containing boulders, where steering problems or unresponsive steering may occur?
x 26 Drilling depth in the case of applying walkover systems (m)
x 27 Does the manufacturer of the steering system have an ISO 9001 quality certificate or an equivalent one given by a third party?
x 28 In the case of applying the walkover system, were the drilling time and the battery capacity considered in the plans?
x 29 In the case of applying the walkover system, is it a problematic crossing (big rivers, rivers with a strong current, highway or railway crossings) where the need to position the receiver directly over the transmitter is usually problematic?
x 30 Were stress limits defined based on the type and material of the drill pipe, establishing tension and torque limits and defining drilling radii and deviations?
x 31 Were the geotechnical investigations carried out at the planning stage taken into consideration in determining the type of equipment needed?
x 32 Presence of salty water or acidic soil
x 33 Is excessive wear anticipated after analysis of the geotechnical conditions (differences in the strength of soil layers or rock, resulting in the force being applied only to part of the reamer)?
x 34 Were the drilling tools repaired previously?
x 35 Does the manufacturer of the drill tools have an ISO 9001 or API quality certificate?
x 36 Are OHS procedures for drill tool failure prepared?
x 37 Are the periodical drill rig inspections carried out according to schedule?
x 38 Was the drill rig previously repaired?
x 39 If yes, were original parts used for the repair?
x 40 Does the drill rig have a protection system against failures (e.g., automatic supervision during standard operation)?
x 41 Does the manufacturer of the drill rig have an ISO 9001 quality certificate or an equivalent one given by a third party, and is the rig in conformity with the National Machine Guidelines derived from the European Machine Guidelines?
x 42 Are OHS procedures for the case of a drill rig breakdown prepared?
x 43 Are the periodical mud motor inspections carried out according to schedule?
x 44 Will the mud motor components that have elastomeric elements be new?
x 45 Is a high solids or sand content in the drilling fluid expected?
x 46 Was the mud motor previously repaired?
x 47 If yes, were original spare parts used for the mud motor repair?
x 48 Does the manufacturer of the mud motor have an ISO 9001 quality certificate or an equivalent one given by a third party, and is the mud motor in conformity with the National Machine Guidelines derived from the European Machine Guidelines?
x 49 Are OHS procedures for the case of a mud cleaning system breakdown prepared?
x 50 Was the mud cleaning system previously repaired?
x 51 If yes, were original parts used for the mud cleaning system repair?
x 52 Does the manufacturer of the mud cleaning system have an ISO 9001 quality certificate or an equivalent one given by a third party?
x 53 Are OHS procedures for the case of a mud cleaning system breakdown prepared?
x 54 Are the periodical inspections of roller blocks carried out according to schedule?
x 55 Was any of the roller blocks planned for use repaired, and were original spare parts used? (no, original, not original)
x 56 Are OHS procedures for the case of a roller block breakdown prepared?
x 57 Does the manufacturer of the roller blocks have an ISO 9001 quality certificate or an equivalent one given by a third party?
x 58 Are the periodical inspections of roller cradles carried out according to schedule?
x 59 Does the manufacturer of the roller cradles have an ISO 9001 quality certificate or an equivalent one given by a third party?
x 60 Are OHS procedures for the case of a roller cradle breakdown prepared?
x 61 Are the periodical inspections of side cranes carried out according to schedule?
x 62 Does the manufacturer of the side cranes have an ISO 9001 quality certificate or an equivalent one given by a third party?
x 63 Are OHS procedures for the case of a side crane breakdown prepared?
x 64 Are the periodical inspections of the ballasting system carried out according to schedule?
x 65 Are OHS procedures for the case of a ballasting system breakdown prepared?
The size of the previously realized installation (1-3 pts.: 1 MINI, 2 MIDI, 3 MAXI)
x 68 The complexity and challenges connected with the previously realized installation (0: typical installation; 1: length or diameter challenging for the contractor, or challenging ground)
x 69 Were any delays indicated in the contractor's references from similar projects carried out so far?
x 70 Does the geotechnical surveying company have references from similar projects (project size, ground conditions, natural environment specificity)?
x 71 In the case of urban or post-industrial areas: the number of site investigation methods used
x 72 No. of test holes
x 73 Are geotechnical investigations carried out at least to the maximum depth of the drilling?
x 74 Are the geotechnical tests only archival, or prepared for the purposes of other projects?
x 75 Were literature research, historical data analysis and interviews with residents carried out?
x 76 Is an experienced geotechnician (with certificates and references from similar projects) employed to properly interpret the results of the geotechnical survey?
x 77 Is a trial drilling planned (for MAXI and complex HDD)?
x 78 For urban and post-industrial areas in which underground infrastructure was identified: is the existing underground infrastructure located close to the planned alignment exposed and monitored?
x 79 Are emergency procedures for a utility strike prepared?
x 80 For urban or post-industrial areas: are plans with the locations of underground utilities available, and was the inspector asked whether all changes in urban infrastructure were put on the map?
x 81 Is there any sand layer with a homogeneous grain size distribution that will be crossed?
x 82 Is there any layer that consists of pure sands, gravel or loose rock?
x 83 Does the ground contain oversize materials (cobbles and boulders), i.e., heavy, large grains that gravitationally fall to the borehole bottom?
x 84 If yes: not many 0, many 1, very many 2
x 85 Percentage of clay-sized particles (smaller than 0.075 mm) (for clay and silt layers) for the layer with the maximum plasticity index
x 86 Soil plasticity index of a soil sample (for clays and silts)
x 87 Are there considerable elevation differences between the entry and exit points or points along the alignment?
x 88 Is there any area along the alignment with a depth of cover less than 12 m or 8.5 borehole diameters, or 2.5 borehole diameters under rivers?
x 89 Is there an area with significant changes in the density or composition of ground conditions that will be drilled through?
x 90 Is there any drilling in clean, coarse-grained, permeable soils (e.g., in sands or gravels containing less than 12% fines, or in fractured rocks)?
x 91 Is there any area where the HDD alignment is close to existing utilities located in backfills filled with trench backfill materials, which could act as a drainage for the drilling fluid?
x 92 Is strong groundwater inflow indicated in the geotechnical survey?
x 93 Were drilling fluid pressure calculations carried out?
x 94 If yes, do they indicate drilling fluid seepage?
x 95 If the first section is problematic: is a casing pipe designed to protect the first hole section?
x 96 Does the driller have a minimum of 3 years of professional experience with drill rigs of the designed pulling force?
x 97 Is the driller certified by a third party for the designed drill rig force (e.g., Drilling Contractors Association, International Society for Drilling Contractors)?
x 98 Does the contractor's company have references from similar projects (size, ground conditions, specificity)?
x 99 No. of working hours
x 100 Does the chief superintendent engineer have a minimum of 3 years of professional experience?
x 101 Is the chief superintendent engineer certified by a third party for drilling operations with the designed pulling force?
x 102 Does the supervisor have a minimum of 3 years of professional experience in drilling operations with the designed pulling force?
x 103 Does the supervisor have a certificate given by a third party for drilling operations with the designed pulling force?
x 104 Is augmented reality planned to be used to increase the drill rig operator's awareness of underground utilities?