Selection of Optimized Retaining Wall Technique Using Self-Organizing Maps

Abstract: Construction projects in urban areas tend to involve high-rise buildings and are very large in scale; hence, a project's underground construction work is highly significant. In this study, a rational model based on machine learning (ML) was developed. ML algorithms are programs that can learn from data and improve from experience without human intervention. In this study, self-organizing maps (SOMs) were utilized. An SOM is an alternative to existing ML methods and involves a subjective decision-making process because a developed model is used for data training to classify and effectively recognize patterns embedded in the input data space. In addition, unlike existing methods, the SOM can easily create a feature map by mapping multidimensional data to simple two-dimensional data. The objective of this study is to develop an SOM model as a decision-making approach for selecting a retaining wall technique. N-fold cross-validation was adopted to validate the accuracy of the SOM model and evaluate its reliability. The findings are useful for decision-making in selecting a retaining wall method, as demonstrated in this study. The maximum accuracy of the SOM was 81.5%, and the average accuracy was 79.8%.


Introduction
The selection of a suitable construction method is necessary because of site conditions and the unique characteristics of each construction project. An increasing number of high-rise structures have been constructed in recent years, and the excavation depths of underground constructions are becoming larger. An essential task in underground construction is selecting suitable retaining wall techniques according to the excavation depth and other working parameters. However, the factors that need to be considered when selecting retaining wall techniques are extensive, and it is impossible to assess the different factors of numerous types of data accurately during the decision-making stage. Therefore, machine learning (ML) and artificial intelligence (AI) techniques are used to analyze and predict big data. In most cases, existing AI techniques require a significant amount of data to produce reliable results. Another limitation of previous studies was that the technique was chosen without following a reasonable validation process between the variables and excavation methods. The selection of a retaining wall technique usually depends on the subjective opinion of experienced practitioners [1]. Choi and Lee [2] carried out retaining wall technique forecasting using a statistical approach, i.e., a decision tree, for a few cases. The selection of a suitable method of retaining wall construction is usually performed by experienced engineers using their historical and empirical knowledge. Generally, the selection of the retaining wall technique at an early construction stage can involve uncertainties, such as those pertaining to the type and distribution of the soil and the groundwater in the geotechnical investigation data. Choi and Lee [2] also reported that the inappropriate forecasting of a retaining wall technique has led to significant schedule delays and increased costs, owing to unexpected changes in the construction method during the actual construction.
In addition, several factors are considered when selecting the earth-retaining construction method; because the selection is based on uncertain information, design changes are frequently made, resulting in frequent claims for additional construction costs and in schedule delays. When the construction process commences, conflicts between owners and contractors occur because of construction costs. In particular, the most significant problem is the incurrence of additional construction costs (in addition to the initial estimated amount). The problem is often related to the construction of a retaining wall, which is the most expensive and the most laborious item in terms of additional construction work. Thus, there is a significant scope for changes in the design to minimize construction costs and duration, which generally influence the construction work outcome.
In this study, a self-organizing map (SOM) model was used to select a retaining wall construction method during the design phase of a construction project. The sequence of the work can be described as follows: (1) the retaining wall technique is selected, (2) the practicability of the SOM model is evaluated, and (3) a case study involving the determination of the retaining wall technique is conducted. The subsequent sections introduce the concept of the SOM as well as its working process and applicability in construction. The case-study section describes the development of the SOM model. The application is discussed in the next section. Finally, the results of this study, conclusions, and suggestions for future research are discussed in the concluding section.

Literature Review
AI techniques include case-based reasoning (CBR), expert systems (ESs), neural networks (NNs), adaptive boosting, and support vector machines (SVMs). The application of AI techniques for solving complex decision-making problems in the construction industry has attracted considerable interest within the last decade. The challenge of selecting the retaining wall technique is a typical example. Shin et al. [1] established an AdaBoost model to select a suitable retaining wall technique and compared it with an SVM model. The process relies on weak classifiers, whose performance depends on the type and characteristics of the data. Yang [3] applied an ES to organize a formal decision process while considering a rule-based knowledge system in selecting retaining walls from previous cases. Kim et al. [4] developed an NN model to select the most suitable retaining wall technique. A limitation of rule-based systems is that they cannot present new assertions or solve problems outside their expertise domain. Yau and Yang [5] developed a CBR approach to overcome the limitations of ESs and NNs. An ES only covers limited knowledge and does not solve ambiguous problems precisely. An NN can only express problems numerically and is difficult to understand because of the unknown training process; that is, it is a black box. Park and Kim [6] established an SVM model for forecasting the retaining wall technique. However, the SVM approach requires a high level of algorithmic complexity and an adequate kernel function [7].
Several studies on the use of AI for selecting the retaining wall technique have been conducted, but limitations remain: these techniques all require a large amount of data and complex analysis. SOM analysis is a method of visualizing data by mapping to low-dimensional maps, thus reducing the dimensions of multidimensional data. The maps can be analyzed using a relatively small amount of data. Chen, Yang, and Su [8] compared the SOM model with particle swarm optimization in reducing the construction time of a secant pile wall. It has been proven that the SOM is suitable for providing the optimal construction sequence. Chen, Su, and Huang [9] also examined a new SOM algorithm to solve optimization problems related to the construction time required. Thus, the practicability of the SOM algorithm was verified. Recently, Pan, Sun, Turrin, Louter, and Sariyildiz [10] developed a novel design method combining the SOM with a multilayer perceptron neural network (SOM-MLPNN). Although not for use in the field of construction, Gardner et al. [11] proposed an algorithm that was more advanced than the existing SOM to produce visualizations of hyperspectral data with improved accuracy. They developed a useful approach to save computational time and improve accuracy in data approximation and proposed a series of data visualizations to interpret the results and support design explorations in different ways. Rana et al. [12] proposed a clustering methodology using the SOM for network clustering of drinking-water distribution systems and showed that the method's performance in representing the observed measurements and demand estimate uncertainty improved. Rodríguez-Alarcón and Lozano [13] developed an SOM-based decision support system with which reservoir managers can model and visualize complex relationships between variables, and they used a vector quantization and clustering approach to achieve the research goal.
Previously, it was impossible to analyze the inherent correlation between attributes. However, it is now possible to gain further insight based on the findings of recent studies. Therefore, in this study, we applied the SOM as a decision support method to forecast retaining wall techniques and examine the SOM applicability.

Self-Organizing Maps
The SOM (Kohonen 1990) is a machine-learning technique inspired by the functioning of the human brain, mainly the behavior of neurons. The SOM creates a feature map by mapping multidimensional data to two-dimensional (2D) data. Because of its simple applicability, the SOM has been widely applied to optimization problems, robotics and control, function approximation, estimation, and evaluation [10,14,15].
The SOM network consists of two layers of neurons: an input layer (design variables) and an output layer (the model response), as shown in Figure 1. Figure 1 also shows the concept of the subdivision of a 2D SOM into simplexes. The network composed of a 3 × 5 permutation matrix is divided into 16 triangular simplexes. The main difference between the SOM and a multilayer feedforward network (MLP) is that the SOM is based on an unsupervised learning procedure, indicating that it does not require any target values. The input neurons are fully connected to the output-layer neurons, and through unsupervised clustering training, similar patterns are kept close together. The output layer is arranged in two dimensions for visual expression. Each neuron j on the output layer connects to the input neurons through a weight vector m_j(t) that has the same dimension as the input layer. The number of neurons can be adjusted by considering a specific problem. The neurons can also be arranged in a hexagonal form. The performance of an SOM with a hexagonal lattice is expected to be higher than that of other shapes because more neighbors are modified in the hexagonal lattice than in a rectangular lattice [10].
SOM learning is performed in a winner-take-all manner, and it uses the Euclidean distance $\lVert x - m_j \rVert$. A winning neuron $j^*$ that has the minimum distance is found to perform SOM learning:

$$j^* = \arg\min_j \lVert x - m_j \rVert \quad (1)$$

The connection weights of the winning neuron that satisfies Equation (1) and of its neighboring neurons can be modified as follows:

$$m_j(t+1) = m_j(t) + \alpha(t)\, h_{j^*j}(t)\, [x - m_j(t)] \quad (2)$$

where $h_{j^*j}(t)$ is a topological neighborhood function used to determine a neighboring distance within which learning is designed to take place, and $\alpha(t)$ is a learning rate that represents the degree to which the weights are changed when two neurons are excited. The learning rate is high when learning commences, but as the learning progresses, it becomes smaller. Generally, as expressed in Equation (3), a Gaussian function is used for the neighborhood, and its width $\delta(t)$ decreases with time. When the entire process starts, neuron information is obtained in a broad area, but it is found in a more specific area as time increases:

$$h_{j^*j}(t) = \exp\left\{ -\frac{d_{j^*j}}{2\delta^{2}(t)} \right\} \quad (3)$$

In Equation (3), $d_{j^*j}$ is a topological distance from the winning neuron $j^*$.
In an SOM, the learning process is repeated if the winning neuron and the neighboring neuron do not satisfy the specified termination criteria. The learning process can be repeated by predetermining the number of iterations.
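The winner search, the weight update, and the repetition of learning described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the toolbox implementation used in this study: the linear decay schedules for the learning rate and the neighborhood width, the tolerance `tol`, the choice of the squared grid distance as the topological distance, and the toy map and data are all assumptions made for the example.

```python
import math

def best_matching_unit(x, weights):
    """Equation (1): index of the neuron whose weight vector is closest to x."""
    distances = [math.dist(x, m) for m in weights]
    return distances.index(min(distances))

def neighborhood(grid, winner, j, delta):
    """Equation (3), with d taken as the squared grid distance (an assumption)."""
    d = sum((a - b) ** 2 for a, b in zip(grid[winner], grid[j]))
    return math.exp(-d / (2.0 * delta ** 2))

def som_step(x, weights, grid, alpha, delta):
    """Equation (2) applied to every neuron; returns the largest weight change."""
    winner = best_matching_unit(x, weights)
    largest = 0.0
    for j, m in enumerate(weights):
        h = neighborhood(grid, winner, j, delta)
        updated = [mi + alpha * h * (xi - mi) for mi, xi in zip(m, x)]
        largest = max(largest, max(abs(u - mi) for u, mi in zip(updated, m)))
        weights[j] = updated
    return largest

def train(data, weights, grid, max_iter=200, tol=1e-4):
    """Repeat the update until the weights stabilize or max_iter is reached."""
    for t in range(max_iter):
        remaining = 1.0 - t / max_iter
        alpha = 0.5 * remaining          # learning rate: high at first, then smaller
        delta = 1.5 * remaining + 0.1    # neighborhood width: broad at first, then narrow
        change = max(som_step(x, weights, grid, alpha, delta) for x in data)
        if change < tol:
            return t + 1
    return max_iter

# Toy 1 x 3 map and four two-dimensional samples (hypothetical values).
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4]]
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
iterations = train(data, weights, grid)
```

The two stopping conditions in `train` mirror the two options named in the text: a termination criterion on the weights and a predetermined number of iterations.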

Description of Research Questions
What types of data are used to forecast the most suitable retaining wall techniques? What process was selected for the variables used in developing the model? How are the data used for model development and validation?


Variables Selection
Apart from identifying all alternatives (in this case, the types of walls), applying the SOM model correctly requires identifying all characteristics of the project, the construction process, and the site conditions that can influence the selection of the most suitable technique. The variables used to select retaining wall techniques were determined in two steps (Figure 2). In the first step, the variable parameters that influence the selection of the retaining wall techniques were selected by reviewing previous studies. In the second step, the selected variables were verified and evaluated by interviewing five experienced practitioners in the South Korean construction field. This resulted in choosing ten variables for selecting the retaining wall techniques determined by engineers. The survey targeted experts in geotechnical engineering, and their average work experience was approximately 13.4 years. The respondents of the survey assessed the impact of each independent variable on the forecasting of the retaining wall technique.

Data Description
Data from 129 excavation project cases without missing values were collected from building construction companies in large South Korean cities. Based on previous research [5] (see also Yau et al. 1999), a classification of the retaining wall technique in South Korea is presented (Table 1). The retaining wall techniques are divided into six groups: SW, SCW, CIP, JetGr, LWGr, and HSCW, which are generally used on site. Ten factors were determined to select a retaining wall technique designed by engineers before the SOM model was developed. Table 2 lists the outcome of the determinants, and Table 3 lists some parts of the raw data. The shape of the site, the only nominal factor, was divided into three types: tetragonal, polygonal, and indeterminate.

Data Construction
Data for 129 building projects were provided by well-known companies in South Korea. The H_Pile (H-section steel pile and laggings) data were excluded because this technique had the fewest cases. The data used to develop the SOM were divided randomly into two parts in the ratio of 80% (training part) to 20% (test part). This ratio was used for the five-fold cross-validation and evaluation of the performance of the SOM model. For example, the training set comprised SW (12 cases), SCW (8 cases), CIP (7 cases), JetGr (35 cases), LWGr (9 cases), and HSCW (31 cases), and all cases were allocated according to the number of data descriptions (see Table 4).
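A random 80%/20% division of the kind described above can be sketched as follows. This is a minimal illustration only: the function name `split_cases`, the fixed seed, and the 100 placeholder case identifiers are assumptions for the example and do not reproduce the study's records or its per-technique allocation in Table 4.

```python
import random

def split_cases(cases, train_fraction=0.8, seed=0):
    """Shuffle a copy of the cases and cut it into training and test parts."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

case_ids = list(range(100))   # placeholder identifiers, not the study's data
train_cases, test_cases = split_cases(case_ids)
```

Fixing the seed makes the split repeatable, which helps when the same partition has to be reused to compare models.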

Case Study
MathWorks software (MATLAB®) and SOM Toolbox version 2.0 were used in this study to develop the SOM model because this approach has been validated in previous studies [16]. The application procedure of the SOM algorithm is depicted in Figure 3. The parameters, such as the number of output layer neurons and the neighborhood determination function radius, were determined to apply the SOM. In this study, the 2D plane map structure of the output layer had a hexagonal shape, and the number of output layer neurons was 300. A Gaussian function was applied to determine the neighborhood radius. Moreover, applying the SOM to attributes with different value ranges can affect the result. The attribute values of the factors of the retaining wall technique example had different ranges (Table 2). Therefore, each variable was normalized before the SOM was applied, so that its maximum and minimum values map to 1 and 0, respectively. Training was repeated until the optimization function value converged within a particular range or was forcibly terminated after a predetermined number of iterations. The SOM used in this study was developed by Kohonen [17].
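The normalization of each variable to the range between 0 and 1 can be illustrated with a short min-max scaling sketch. The function name, the handling of a constant column, and the sample depth values are assumptions for the example; they are not taken from Table 2 or from the toolbox used in this study.

```python
def min_max_normalize(column):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]   # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in column]

excavation_depth_m = [5.0, 12.5, 20.0]   # hypothetical values
normalized = min_max_normalize(excavation_depth_m)
```

Scaling every attribute to the same range keeps attributes with large numerical values from dominating the Euclidean distance used to find the winning neuron.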
The SOM was used to map the attribute value from a multidimensional vector to a 2D plane, and the relationship between attributes was confirmed visually. Figure 4 shows a mapping state of the 2D plane for each attribute after training for all 129 training data cases. High attribute values are displayed in red, and lower values are displayed in blue. The map at the bottom-right corner of Figure 4 shows the position of the retaining wall technique on the map of the 2D plane. The SW method was located on the top-left corner of the map. In each feature map, the relationship between attributes can be grasped visually from the color of the portion corresponding to this area. In the feature map of the underground excavation depth (i.e., the map located in the middle of the left side, labeled as "level"), this area corresponds to the red area, indicating that the method is applied at large excavation depths. This area is also indicated in red on the feature map of the groundwater level (the map located in the middle of the right side, labeled as "water level"). Thus, the SOM, unlike other algorithms, has the advantage of visually representing the relationship among attributes.
Figure 5 shows the positions that the respective data occupy on the feature map when the test data are applied after training the SOM using training data. The SOM indicates the positions at which the test data are associated with the areas corresponding to different retaining wall techniques. The SOM confirmed that the size of the area mapped to the 2D plane of the output layer changes with the relative number of collected data (Figure 5). In other words, a class that has a larger number of cases is assigned a relatively wider area.
N-fold cross-validation was adopted to evaluate the performance of the SOM model and validate the reliability of the results. Five-fold cross-validation was used because an n-value of five appears to be the optimal number of folds. The SOM model was trained and tested five times; in each trial, one fold remained for testing purposes. Table 5 lists the number of data errors for each retaining wall technique used. When the number of data elements was small, such as in the SCW, CIP, and LWGr cases, we observed relatively few errors. The amount of data collected for the SOM did not significantly influence the accuracy of the results. The results showed a maximum accuracy of 81.5% and a minimum accuracy of 66.7%, with an average value of 79.8%. The average accuracy values were not very high in the respective trials. Figure 6 shows the mapping state of the 2D plane for each attribute after testing for one-, two-, and five-fold validations using all data cases. Each hexagon unit at a specific position on a plane is the same as that at the unit map position. The other components are represented in colors using the scale to the right of the clustering map that displays the values of each component.
Our application of the SOM to 129 cases resulted in the following findings. First, the SOM showed that the mapped area size on a 2D plane of the output layer changes with the relative number of collected data in terms of accuracy. This study confirmed that the error factor is not high, even when the amount of data is relatively small. Therefore, the application of the SOM is useful when the collected data are insufficient. Second, the SOM can be used to identify the correlation not only between classes and properties but also between different properties visually. Therefore, the change in the land area, excavation depth, and groundwater level with the retaining wall technique can be confirmed visually. Additionally, the results can be easily interpreted to understand the features of the retaining wall technique based on different cases. Third, the determination of the best model was difficult because the SOM was not directly compared with other existing models, such as the NN, CBR, and SVM. However, the results can be used for decision-making while selecting a retaining wall technique when the SOM is applied after the strengths and weaknesses of the model are identified, as analyzed in this study; in this case, the maximum accuracy of the SOM is 81.5%, and the average accuracy is 79.8%.
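The five-fold procedure used to obtain the maximum, minimum, and average accuracy can be sketched generically as below. This is a hedged illustration rather than the study's code: the helper names, the fixed seed, the toy inputs and labels, and the stand-in classifier that always predicts the most common training label are all assumptions; in practice the `fit` and `predict` callables would wrap a trained SOM.

```python
import random
from collections import Counter

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def cross_validate(samples, labels, fit, predict, k=5):
    """Train on k-1 folds, test on the held-out fold; one accuracy per fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies

# Stand-in classifier (hypothetical): always predicts the most common label.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, sample):
    return model

samples = [[i / 19.0] for i in range(20)]     # hypothetical inputs
labels = ["JetGr"] * 12 + ["HSCW"] * 8        # hypothetical labels
scores = cross_validate(samples, labels, fit_majority, predict_majority)
summary = (max(scores), min(scores), sum(scores) / len(scores))
```

Reporting the maximum, minimum, and mean of the per-fold accuracies, as in `summary`, corresponds to the three figures quoted for the SOM model.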

Summary and Contributions
In this study, an SOM model was reviewed and developed as a decision-making tool to select a retaining wall technique. This approach is currently gaining acceptance for pattern classification. After literature reviews, the accuracy of the method selection and the characteristics of each classifier were analyzed using the collected data. According to the analysis results, the SOM model showed high classification accuracy, even when the data amount was relatively small. Identifying the visual relationship between properties was easy because the model showed a feature map that mapped the attribute values of a multidimensional vector to a 2D plane.
The main contribution of this study is the development of a novel forecasting method based on the SOM for selecting retaining wall techniques, which facilitates the visualization of multidimensional data on a two-dimensional plane. In the proposed method, as demonstrated in this study (Figure 3), the SOM involves three stages of workflows: data construction, SOM optimization, and classification. Based on the case study results, the inherent correlation among various variables can be considered for forecasting retaining wall techniques. Data visualization is effective in intuitive understanding and analysis. Hence, unsupervised SOM produced a multicolor similarity map of the analysis area, in which pixels with similar mass spectra are assigned similar colors. This assignment indicates that the result is a color scheme that accurately reflects local spectral distances between pixels in the data. In addition, the classification accuracy is verified to be relatively high. Therefore, the SOM technique can be fully utilized in decision-making for forecasting retaining wall techniques.

Limitations and Future Work
In this study, the SOM was not compared directly with existing models, such as the NN, CBR, and SVM. In particular, the SOM is distinct from ANNs, which are considered reliable and low-cost techniques for data interpretation and prediction. Therefore, it is necessary to make comparisons using the same data to overcome the current limitation of the results. However, the SOM can be fully utilized as a decision support tool for selecting a retaining wall technique because its accuracy does not differ significantly from values reported previously. In addition, a comprehensive comparative analysis of each model is required in future studies using sufficient data. The solution is conditioned by a series of criteria, projects, and construction requirements, although not all criteria have the same importance in influencing the decision. Therefore, future studies must focus on analyzing each case, depending on the weight of each criterion. Furthermore, the optimum values of the many SOM parameters should be determined to obtain better SOM results.