Implementation of Machine Learning Algorithms in Spectral Analysis of Surface Waves (SASW) Inversion

Mitu, Sadia Mannan; Rahman, Norinah Abd.; Nayan, Khairul Anuar Mohd; Zulkifley, Mohd Asyraf; Rosyidi, Sri Atmaja P.

doi:10.3390/app11062557


by Sadia Mannan Mitu ¹, Norinah Abd. Rahman ¹,*, Khairul Anuar Mohd Nayan ², Mohd Asyraf Zulkifley ¹ and Sri Atmaja P. Rosyidi ³

¹ Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi UKM 43600, Malaysia
² Virtual Instrument & System Innovation Sdn Bhd, Petaling Jaya 47301, Malaysia
³ Department of Civil Engineering, Universitas Muhammadiyah Yogyakarta, Jalan Brawijaya (Lingkar Selatan), Bantul, Yogyakarta 55183, Indonesia
* Author to whom correspondence should be addressed.

Appl. Sci. 2021, 11(6), 2557; https://doi.org/10.3390/app11062557

Submission received: 22 February 2021 / Revised: 6 March 2021 / Accepted: 7 March 2021 / Published: 12 March 2021

(This article belongs to the Special Issue Artificial Neural Networks Applied in Civil Engineering)


Abstract

One of the complex processes in spectral analysis of surface waves (SASW) data analysis is the inversion procedure. An initial soil profile needs to be assumed at the beginning of the inversion analysis, which involves calculating the theoretical dispersion curve. If the assumed starting soil profile is not reasonably close, the iteration process might not converge or might take too long to converge. Automating the inversion procedure would allow soil stiffness properties to be evaluated conveniently and rapidly by means of the SASW method. Multilayer perceptron (MLP), random forest (RF), support vector regression (SVR), and linear regression (LR) algorithms were implemented to automate the inversion. For this purpose, the dispersion curves obtained from 50 field tests were used as input data for all of the algorithms. The results illustrated that SVR algorithms could potentially be used to estimate the shear wave velocity of soil.

Keywords:

spectral analysis of surface wave; inversion; automation; machine learning

1. Introduction

The spectral analysis of surface waves (SASW) method is a nondestructive technique used to determine the shear wave velocity of layered material by using the theory of stress wave propagation. This method employs the dispersive characteristics of Rayleigh waves to determine the variation of stiffness with depth. Rayleigh waves propagate along a cylindrical wavefront near the surface of a half-space, and the amplitude of particle motion decays exponentially with depth [1]. For a vertical load on a homogeneous isotropic half-space, Rayleigh waves carry two thirds of the radiated energy [2]. Rayleigh wave phase velocity depends primarily on the material properties, namely shear wave velocity, compression wave velocity, Poisson’s ratio, and mass density, to a depth of one wavelength, as shown in Figure 1. In geotechnical engineering, the SASW method has been used extensively to determine various soil parameters, such as bearing capacity [3,4,5], small-strain stiffness [6], and pavement subgrade stiffness [7].

One of the complex processes in SASW data analysis is the inversion procedure. Because of the procedure’s complexity, studies of inversion have gone in many directions, including the method of generating the theoretical dispersion curve, simplification of the procedure, multiple modes in the dispersion curve, and nonuniqueness of the solution. Nazarian [9] used a modification of the Haskell–Thomson technique [10,11] to generate the theoretical dispersion curve. He employed the INVERT program, computer software designed for assuming the initial stiffness profile reported by Ballard [12], to compare the shear wave velocities directly. The differences between the shear wave velocity profiles from Ballard’s results and those from Nazarian’s study were quite drastic. Therefore, the simplified inversion process based on scaling the dispersion curve, employed by many researchers, is not suitable. Hossain and Drnevich [13] found some limitations in Nazarian’s technique. First, the method used to find the roots of the characteristic determinant for a layered system is quite complicated. Second, matching the theoretical and experimental dispersion curves is a trial-and-error process. To overcome these difficulties, Hossain and Drnevich applied two methods, implemented in FORTRAN 77, to determine pavement moduli and thicknesses. The first was a finite difference technique that they developed to analyze generalized Rayleigh waves in multilayered elastic media. This method is very useful for determining the theoretical dispersion curve of a pavement system; it leads to a quadratic eigenvalue problem for the pavement system and a linear eigenvalue problem for the geologic model. The second technique used a modification of Knopoff’s [14] algorithm to determine the theoretical dispersion curve.
Later on, Yuan and Nazarian [15] developed an algorithm based on the energy integral equation to overcome this problem. Moreover, Addo and Robertson [16] developed a unique SASW system requiring no spectrum analyzer. The entire procedure was run on a microcomputer, which could determine the in situ dispersion curve. A program called “SASWFM” was written to automate the inversion process, so that the shear wave velocity can be obtained directly from the experimental dispersion curve. The results were compared with those of seismic cone penetration tests (SCPT) and showed a relatively poor match between the experimental and theoretical dispersion curves at some of the sites. Overall, the microcomputer-based system reduces the testing time and the cost of equipment. To overcome the difficulties associated with the presence of multiple modes in SASW signals, Zomorodian and Hunaidi [17] introduced a new inversion method based on the maximum vertical flexibility. Meier and Rix [18] proposed a back-calculation neural network to replace the computationally intensive trial-and-error and least-squares inversion methods. Studies related to SASW inversion are summarized in Table 1.

The complexity of data reduction in the SASW method has encouraged researchers to find a much more straightforward alternative without sacrificing reliability. Another motivation for simpler analysis is to make SASW accessible to all users, whether experienced or beginners. Recently, Foti et al. [19] published a guideline on surface wave acquisition and analysis for nonexpert users. The guideline addresses a wide range of concerns related to the surface wave technique, including the nonuniqueness of SASW solutions.

In conventional SASW inversion analysis, a soil profile must be assumed for the inversion process, which involves calculating the theoretical dispersion curve by means of forward modeling [24]. The misfit between the experimental and theoretical dispersion curves is then iteratively and automatically minimized to a predefined small value. If the assumed starting soil profile is not reasonably close, the iteration process might not converge or might take too long to converge. Therefore, each step in the inversion process requires careful execution. Expert judgment is often necessary to resolve uncertain interpretations of the shear wave velocity profile and to define the criteria for the final profile. To reduce the complexity of the inversion process, several machine learning (ML) algorithms were proposed in this study. The main objective of the present study is to accelerate and simplify the inversion process by training the selected ML algorithms to ‘understand’ how conventional SASW inversion arrives at a shear wave velocity profile from the corresponding experimental dispersion curve, as illustrated in Figure 2.

An artificial neural network (ANN) is a popular ML model that has been adopted in recent years to automate the SASW inversion method [23,25,26,27]. Williams and Gucunski [23] determined the moduli and thicknesses of a four-layered pavement system using an ANN. Later, Gucunski et al. [27] improved this approach by developing five ANN models, each determining a single property of the pavement system. Shirazi et al. [25] developed a number of different ANN models to automate the SASW method in pavements, using several points of the dispersion curve as input and the thicknesses and elastic moduli of the layers as output. Three ANN software packages (STATISTICA, the ANN toolbox of MATLAB, and NeuralSIM) were employed, and the models generated reasonably close elastic moduli for the upper layers. Alimoradi et al. [26] used the results of SASW tests to train an ANN to classify shear wave velocity. They conducted nine SASW tests and downhole tests (DHT) and determined the unknown nonlinear relationships between the SASW results and those obtained from DHT, the latter being treated as the true values. The results showed that the backpropagation neural network could accurately predict the shear wave velocity between wells. Given the performance of ANNs in SASW inversion reported by previous researchers, an ANN model was adopted in this study and compared with other ML algorithms.

2. Inversion Analysis

The conventional inversion method used in the present study is described in this section. The major steps in the inversion procedure are determination of the Starting Model Parameter (SMP), depth resolution analysis, layer sensitivity analysis, and final inversion analysis, as shown in Figure 2 (on the right-hand side of the flow diagram). First, the SMP was determined based on the theoretical dispersion curve and the assumed parameters. The number of layers and the thickness of each layer were assumed for the iterative procedure [15]. Depth resolution analysis and layer sensitivity analysis then refined the starting model from the first step; several iterations are required for these refinement steps. These three steps constitute the preliminary inversion analysis. After the preliminary inversion analysis is complete, the final inversion analysis, denoted as Inversion Engine in the flow diagram, can be performed.

The starting model is called a preliminary shear wave velocity profile, since shear wave velocity is used as a model parameter for the inversion procedure. There are two steps involved in constructing the preliminary shear wave velocity profile. A preliminary shear wave velocity profile is determined as the first step. Based on the experimental dispersion data distribution, layer thicknesses and the number of layers are calculated for the preliminary profile. In the second step, another profile is determined from the preliminary profile based on the layering determined in the first step. The second profile is used for the inversion analysis.

The measurement of depth resolution evaluates the optimum resolvable depth for a given experimental dispersion curve [24]. The deepest resolvable layer is dependent on the penetration depth or the zone stressed by the surface wave. The penetration depth is influenced by the frequencies generated by the source on the surface and the stiffness structure of the subsurface. Inversion analysis is very reliant on the layering of the starting model. Layering of the subsurface structure is assumed initially for the inversion technique. Consequently, a measure must be provided so as to indicate how good the assumed layering is. Layer sensitivity analysis decides whether or not the assumed layering is suitable. The assumed layering could be improved to provide a proper resolution, depending on the layer sensitivity analysis [24].

The inversion engine, as illustrated in Figure 2, represents the principal flow of the inversion analysis. This engine has two components: one calculates the root mean square error (RMSE), and the other updates the model parameters. The parameters include shear wave velocity, Poisson’s ratio, compression wave velocity, mass density, and the material damping ratio [28]. Wave velocities are updated at every iteration; however, Poisson’s ratio, mass density, and the material damping ratio are reasonably assumed, since their effect on phase velocity is considerably small. Compression wave velocity is explicitly related to shear wave velocity and Poisson’s ratio, so it can be calculated from the assumed Poisson’s ratio and the updated shear wave velocity. Therefore, only shear wave velocity was used as the model parameter for the inversion analysis. The sensitivity matrix, G, and the change in the model parameters, ∆m, are determined to update the model parameters.
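One update of the inversion engine can be sketched as follows. This is a minimal sketch under stated assumptions: the section does not give the update formula, so the code assumes a damped (Levenberg–Marquardt-style) linearized least-squares step, ∆m = (GᵀG + λI)⁻¹Gᵀ(d_obs − d_pred), where d holds phase velocities and m holds layer shear wave velocities. The damping factor `lam` and all function names are hypothetical. The compression wave velocity is then recomputed from the updated V_S and the assumed Poisson’s ratio ν with the standard elastic relation V_P = V_S·sqrt(2(1 − ν)/(1 − 2ν)).

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def damped_update(G, d_obs, d_pred, lam=0.0):
    """Assumed update: delta_m = (G^T G + lam*I)^-1 G^T (d_obs - d_pred).

    G[k][i] is the sensitivity of data point k (phase velocity) to
    parameter i (shear wave velocity of layer i).
    """
    n_data, n_par = len(G), len(G[0])
    r = [o - p for o, p in zip(d_obs, d_pred)]  # dispersion-curve misfit
    GtG = [[sum(G[k][i] * G[k][j] for k in range(n_data)) + (lam if i == j else 0.0)
            for j in range(n_par)] for i in range(n_par)]
    Gtr = [sum(G[k][i] * r[k] for k in range(n_data)) for i in range(n_par)]
    return solve_linear(GtG, Gtr)


def p_wave_velocity(vs, nu):
    """Compression wave velocity from shear wave velocity and Poisson's ratio."""
    return vs * math.sqrt(2.0 * (1.0 - nu) / (1.0 - 2.0 * nu))
```

For instance, with G = [[1, 0], [0.5, 0.5], [0, 1]], d_obs = [150, 200, 250] and d_pred = [100, 100, 100], one undamped step returns ∆m = [50, 150]; a positive `lam` shortens the step, which helps when the starting model is far from the solution. With ν = 1/3, `p_wave_velocity` gives V_P = 2V_S.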

After completing the preliminary inversion analysis, the final inversion analysis can be performed. The inversion is iterated until the theoretical dispersion curve is sufficiently close to the experimental dispersion curve [29]. RMSE_S measures the misfit between the theoretical and experimental dispersion curves and thereby evaluates the goodness of the shear wave velocity profile. The subscript ‘S’ distinguishes this dispersion-curve misfit from the RMSE used in ML training. A lower RMSE_S indicates a better match between the theoretical and experimental dispersion curves [30]. Since the inversion analysis is iterative, a measure is needed to quantify the goodness of the estimated model parameters and the predicted shear wave velocity. The prediction error can be defined in several ways, but the maximum likelihood approach to inversion analysis (assuming normally distributed data errors) justifies the use of the L₂ norm, a loss function corresponding to least-squares errors, as defined in Equation (1). The maximum likelihood approach is one of the procedures used to find the optimum shear wave velocity profile for an experimental dispersion curve:

L_{2}\ \text{norm}: \|e\|_{2} = \sqrt{\sum_{i} |e_{i}|^{2}}

(1)

where e_{i} is the difference between the observed (experimental) and the predicted (theoretical) phase velocity at the i-th point of the dispersion curve. RMSE_S in Equation (2) is a normalized form of the L₂ norm. Because the sum is divided by the number of data points, RMSE_S is independent of the number of data points and provides an absolute measure of the average misfit:

RMSE_{S} = \sqrt{\frac{1}{n} \sum_{i} |e_{i}|^{2}}

(2)

where n is the number of data points and i is the index of each data point.
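Equations (1) and (2) can be evaluated directly. The sketch below assumes that the experimental and theoretical dispersion curves are sampled at the same n points (for example, the same frequencies) and that e_i is their phase-velocity difference; the function names are illustrative. It also makes explicit that RMSE_S is the L₂ norm divided by √n.

```python
import math


def l2_norm(errors):
    """Equation (1): square root of the sum of squared misfits."""
    return math.sqrt(sum(abs(e) ** 2 for e in errors))


def rmse_s(v_exp, v_theo):
    """Equation (2): RMS misfit between experimental and theoretical phase velocities."""
    if len(v_exp) != len(v_theo) or not v_exp:
        raise ValueError("curves must be non-empty and sampled at the same points")
    errors = [a - b for a, b in zip(v_exp, v_theo)]
    return l2_norm(errors) / math.sqrt(len(errors))
```

For example, phase velocities of [180, 200, 230] m/s (experimental) and [182, 197, 230] m/s (theoretical) give misfits of −2, 3 and 0, so the L₂ norm is √13 and RMSE_S is √(13/3) ≈ 2.08 m/s.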

Evaluating a stiffness profile requires an experimentally determined phase velocity dispersion curve, which is then interpreted through iterative forward modeling or inversion analysis [15,31]. This inversion inherently requires either several trial initial guesses or a reasonable initial guess. To simplify the inversion process, the experimental dispersion curves and the final shear wave velocity profiles were used to train the ML algorithms, as indicated in Figure 2.

3. Machine Learning (ML) Algorithms

Machine learning (ML) is an application of artificial intelligence that learns to perform tasks from previous examples and information without being explicitly programmed. To predict the shear wave velocity profile from the experimental dispersion curve, an ANN known as the MLP was adopted in this study. The predictive performance of the ANN is then compared with that of other ML algorithms, namely RF, the support vector machine (SVM) in its regression form (SVR), and LR. This section gives an overview of each algorithm, including the features and advantages that motivated its use in this study. Since the dataset from the SASW inversion consists of a set of inputs and corresponding outputs, all of the chosen algorithms are supervised ML algorithms. The workflow of a supervised ML model is shown in Figure 3.

3.1. Multilayer Perceptron (MLP)

Multilayer perceptron (MLP) is a class of feed-forward ANN. The structure of an MLP includes an input layer, one or more hidden layers, and an output layer. This algorithm learns a function f(\cdot): R^{n} \to R^{o} by training on a dataset, where n is the number of inputs and o is the number of outputs. Figure 4 shows an MLP network with one hidden layer.

The set of neurons {x_{i} | x_{1}, x_{2}, \dots, x_{n}} represents the input layer, which is the raw data collected from the field. Each neuron in the hidden layer transforms the values from the input layer with a weighted linear summation w_{1} x_{1} + w_{2} x_{2} + \dots + w_{n} x_{n}, followed by a nonlinear activation function g(\cdot): R \to R, such as the hyperbolic tangent function. The output layer then receives the values from the last hidden layer.
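A minimal sketch of the forward pass described above, for one hidden layer with the hyperbolic tangent activation and a single linear output (o = 1, for example the V_S of one layer). The weights and biases are hypothetical placeholders: in practice they are learned by backpropagation, which is not shown, and the bias terms are a standard addition not written in the weighted sum above.

```python
import math


def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer MLP: tanh hidden units followed by a linear output unit.

    x        -- n input values
    W_hidden -- one list of n weights per hidden neuron
    b_hidden -- one bias per hidden neuron
    w_out    -- one output weight per hidden neuron
    b_out    -- scalar output bias
    """
    hidden = []
    for weights, bias in zip(W_hidden, b_hidden):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias  # w1*x1 + ... + wn*xn + bias
        hidden.append(math.tanh(z))                           # activation g(.)
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

With all hidden weights set to zero, every hidden unit outputs tanh(0) = 0, so the prediction reduces to the output bias; this is a quick way to check the wiring of the layers.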

3.2. Random Forest (RF)

Random forest (RF) is an effective model for predictive analysis, as it uses ensemble learning methods for classification and regression [32]. However, excellent RF models generally require their hyperparameters to be fine-tuned, for example by optimization algorithms. This algorithm has been employed to solve geotechnical engineering problems such as assessing pile drivability [33], the undrained shear strength of soft clays [34], and liquefaction potential [35]. RF regression builds predictive models from decision trees. Each tree is trained on a randomly sampled subset of the input data and, given the features and targets of its training data, formulates a set of splitting rules. The trees work independently, with no interaction between them. When splitting the nodes of each tree, RF considers only a limited number of features, and the model combines hundreds or thousands of decision trees. The final prediction is the average of the predictions of the individual trees. Figure 5 is a representation of the RF decision trees.
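To make the bagging-and-averaging idea concrete, the sketch below grows a small forest of regression trees in pure Python: each tree is fitted to a bootstrap sample, each split considers only `max_features` randomly chosen features, and the forest prediction is the average over trees. It is an illustrative simplification (depth-limited trees, exhaustive threshold search), not the implementation or hyperparameters used in this study; all names and default values are hypothetical.

```python
import random
from statistics import mean


def build_tree(X, y, depth, max_features, rng):
    """Grow a regression tree; leaves store the mean target of their samples."""
    if depth == 0 or len(set(y)) == 1:
        return mean(y)
    n_features = len(X[0])
    features = rng.sample(range(n_features), min(max_features, n_features))
    best = None  # (sum of squared errors, feature, threshold)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:  # all rows identical in the sampled features
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, max_features, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, max_features, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, depth=3, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, max_features, rng))
    return forest


def predict_forest(forest, x):
    return mean(predict_tree(tree, x) for tree in forest)  # average over trees
```

Because every tree sees a different bootstrap sample, individual trees disagree slightly, and averaging them reduces the variance of the prediction compared with a single deep tree.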

3.3. Support Vector Machine (SVM)

The support vector machine (SVM) is a supervised ML algorithm that can build accurate models with relatively little computational power [36]. It can be used for both regression and classification tasks [37]. Support vector regression (SVR) is derived from the SVM; it is a prediction method that aims to optimize prediction accuracy while automatically guarding against overfitting the data. Rather than the conventional empirical risk minimization (ERM) principle, the SVM is trained using the structural risk minimization (SRM) principle [38]. ERM minimizes only the errors on the training dataset, whereas SRM simultaneously minimizes the empirical risk and the model complexity. In other words, the SRM principle implies a tradeoff between how closely the model approximates the training data and the complexity of the approximating function. SVR thus prefers smooth models that do not overfit the training data, a requirement for good generalization to unseen (testing) data.

The principles of SVR are similar to those of the SVM for classification, but there are a few differences between them. Both require a set of inputs and corresponding outputs to train the model. For classification, the SVM finds a hyperplane in the feature space that separates the data points into two classes [39]. Many hyperplanes could separate the data points; the SVM chooses the one with the maximum margin between the data points of the two classes. Samples on the margin are called support vectors. SVR instead finds a function f(x) that deviates from the actual target values y by at most the margin of tolerance ε for all of the training data, while being as flat as possible. The linear function is defined as Equation (3):

f(x) = \langle w, x \rangle + b

(3)

where x is the input vector used to estimate the scalar target y, w is the n-dimensional weight vector, and b is the constant (bias) coefficient. The flatness of f is maximized, while deviations larger than ε are penalized, by solving:

\text{minimize} \quad \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} ξ_{i}

(4)

It is assumed that the function f approximates all of the pairs (x_{i}, y_{i}) with precision ε. If some data points cannot be placed within the margin, the slack variables ξ_{i} relax the constraint as in Equation (5):

|y_{i} - (\langle w, x_{i} \rangle + b)| \leq ε + ξ_{i}, \quad ξ_{i} \geq 0

(5)
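The role of ε, the slack variables and C can be checked numerically. The sketch below evaluates the primal objective of a linear SVR, ½‖w‖² + CΣξ_i, where each ξ_i is the smallest slack that satisfies the ε-tube constraint, that is, the ε-insensitive loss max(0, |y_i − f(x_i)| − ε). It only evaluates the objective for a given w and b; solving for the optimal w and b (normally done in the dual form with a kernel, as with the RBF kernel in this study) is not shown, and the function names are illustrative.

```python
def epsilon_insensitive(residual, eps):
    """Smallest slack xi satisfying |residual| <= eps + xi with xi >= 0."""
    return max(0.0, abs(residual) - eps)


def svr_primal_objective(w, b, X, y, eps, C):
    """0.5 * ||w||^2 + C * sum(xi_i) for a linear model f(x) = <w, x> + b."""
    reg = 0.5 * sum(wj * wj for wj in w)  # flatness term
    slack = 0.0
    for x_i, y_i in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, x_i)) + b
        slack += epsilon_insensitive(y_i - f, eps)  # zero inside the tube
    return reg + C * slack
```

With w = [2], b = 1, ε = 0.1 and C = 10, the points (0, 1.05), (1, 3.5) and (2, 5.0) have residuals of 0.05, 0.5 and 0; only the second lies outside the tube (ξ = 0.4), so the objective is 0.5·4 + 10·0.4 = 6. A larger C makes such violations more costly relative to flatness.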

Figure 6 graphically depicts the linear SVR. Generally, four types of kernels are used in the SVR models, namely linear, polynomial, radial basis function (RBF), and sigmoid. The mathematical representation of each kernel is given in Equation (6) [40]:

K(X_{i}, X_{j}) = \begin{cases} X_{i} \cdot X_{j} & \text{Linear} \\ (γ X_{i} \cdot X_{j} + C)^{d} & \text{Polynomial} \\ \exp(-γ \|X_{i} - X_{j}\|^{2}) & \text{RBF} \\ \tanh(γ X_{i} \cdot X_{j} + C) & \text{Sigmoid} \end{cases}

(6)

where K(X_{i}, X_{j}) represents the kernel function, and γ and d denote the kernel coefficient (which controls the kernel width) and the power exponent, respectively. Here, C is a constant kernel parameter, distinct from the regularization constant C in Equation (4). In this study, the RBF kernel was used. The RBF is the most popular kernel type in SVR because of its localized and finite responses across the entire range of the real x-axis [41].
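Equation (6) translates directly into code. In this sketch, `c` is the constant term written C in Equation (6), kept lowercase to avoid confusion with the SVR regularization constant; the default parameter values are arbitrary placeholders, not the values used in this study.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, c=1.0, d=3):
    return (gamma * dot(xi, xj) + c) ** d


def rbf_kernel(xi, xj, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))  # squared Euclidean distance
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(xi, xj, gamma=1.0, c=0.0):
    return math.tanh(gamma * dot(xi, xj) + c)
```

The RBF kernel equals 1 for identical inputs and decays towards 0 with distance; a larger γ makes it decay faster, so each support vector influences a narrower neighbourhood, which is the localized response mentioned above.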

3.4. Linear Regression (LR)

Linear regression (LR) is one of the simplest ML algorithms and performs only regression tasks. LR predicts the value of a dependent variable from given independent variables by creating a linear relationship between the input and the output. The model finds the line of best fit by minimizing the error between the predicted and observed values. The hypothesis function for LR is given in Equation (7):

Y = θ_{0} + θ_{1} x_{1} + θ_{2} x_{2} + \dots + θ_{n} x_{n}

(7)

where Y is the predicted value, θ₀ is the bias term, θ₁, …, θ_n are the model parameters, and x₁, x₂, …, x_n are the feature values. The aforementioned hypothesis can also be represented by Equations (8) and (9):

Y = θ^{T} x

(8)

f_{i}(θ) = θ^{T} x_{i} + ε_{i}

(9)

where f_{i}(θ) is the hypothesis function and ε_{i} is noise. An intercept term is added by appending a column of 1s to the features. Regularization is often required to prevent overfitting by penalizing models with extreme parameter values. The LR model supports L₁ and L₂ regularization, which are added to the loss function [42]. The L₁ norm is the sum of the absolute values of the parameter vector, and the L₂ norm is the square root of the sum of its squared values (L₂ regularization usually adds the squared L₂ norm to the loss being minimized). One of the main mysteries revealed by deep learning is the concept of benign overfitting: deep neural networks tend to predict well even when they fit noisy training data perfectly. LR has been used to predict the shear wave velocity at 30 m with a 95% confidence interval [43]. Therefore, LR is considered in this study, although it is a simple ML algorithm.
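A minimal sketch of Equations (7) to (9) fitted by least squares, including the column of 1s for the intercept and an optional L₂ (ridge) penalty. The closed-form normal-equation solution and the choice not to penalize θ₀ are common conventions assumed here, not details given in this section; L₁ regularization has no closed form and is omitted. The function names are illustrative.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_linear_regression(X, y, l2=0.0):
    """Return [theta_0, ..., theta_n] minimizing sum of squared errors + l2 * sum(theta_j^2), j >= 1."""
    Xb = [[1.0] + list(row) for row in X]  # column of 1s for the intercept theta_0
    p = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]  # X^T X
    for i in range(1, p):  # ridge penalty on the slopes only
        A[i][i] += l2
    rhs = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(p)]  # X^T y
    return solve_linear(A, rhs)


def predict(theta, x):
    """Equation (7): Y = theta_0 + theta_1 x_1 + ... + theta_n x_n."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
```

On the noise-free points (0, 1), (1, 3), (2, 5), (3, 7) the unregularized fit recovers θ₀ = 1 and θ₁ = 2 exactly; with `l2=10` the slope shrinks to 2/3 and the intercept rises to 3, illustrating how the penalty trades training fit for smaller parameters.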

3.5. K-Fold Cross-Validation

The primary goal of cross-validation is to avoid bias arising from the selection of the training and testing data. The procedure, called k-fold cross-validation, estimates the performance of ML algorithms on unseen data, that is, data not used during training. The parameter k is the number of groups (folds) into which a given data sample is split, as shown in Figure 7. Each split uses 80% of the data for training and 20% for testing.
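A sketch of how the folds can be generated. With k = 5, each fold holds out 20% of the samples for testing and trains on the remaining 80%, which is consistent with the split quoted above; the value k = 5, the shuffling and the seed are assumptions for illustration, since the number of folds is not stated in this section.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # avoid bias from the original data order
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

The model is trained and scored once per fold, and the k scores (for example RMSE and R²) are averaged, so every sample contributes to the performance estimate exactly once as unseen data.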

4. Methods and Materials

In the SASW method, Rayleigh waves are generated by an impact source and detected by two receivers before being recorded by a spectrum analyzer, as shown in Figure 8. The Rayleigh wave data were collected and transformed into the frequency domain with a dynamic signal analyzer [44].

The seismic (SASW) experiments were carried out at 50 different locations in Peninsular Malaysia. The SASW data cover a broad range of geological and geotechnical site conditions, from recent alluvium to granitic and metasedimentary formations, with soil consistencies from soft/loose to hard/dense. The dispersion curves of all 50 sites can be found in the appendices. During the tests, each geophone was placed 0.5 m from the test point, giving a spacing of 1 m between the two geophones.

A rubber hammer, a geological hammer, a sledgehammer, and a 10 kg weight were used to strike the soil surface to produce a transient vertical impact. Sources were placed at 1, 2, 4, 8, and 10 m from the first geophone, as shown in Figure 9. The field data were then analyzed using WinSASW 3.2.12. A depth of up to 6 m was used when preparing the dataset for ML training and testing. The wavelengths of the dispersion curve need to include short wavelengths that sample shallow material and long wavelengths that penetrate deep layers. A single source–receiver setup is not sufficient to determine phase velocities over the wide range of wavelengths required to evaluate a soil site to a depth of, for example, 30 ft (9 m) to 60 ft (18.3 m). Therefore, several measurement setups with different receiver spacings are used for this purpose.

Figure 10 summarizes the SASW data analysis and the implementation of ML with the SASW method. The first task in analyzing the field measurements is to construct an experimental dispersion curve, which requires unwrapping the phase spectrum. Unwrapping in turn requires an interpretation of the phase spectrum; the technique used to interpret it is called masking. The unwrapped phase spectrum and the receiver spacing then determine the experimental dispersion curve by means of Equation (10):

V_{PH} = f \cdot λ

V_{PH} = f \cdot \frac{d}{φ / 360°}

(10)

where φ is the phase difference (in degrees) between the two receivers for the wave traveling at frequency f, and d is the distance between the two receivers.
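Equation (10) can be sketched as below. Masking is an interactive, judgment-based step, so the automatic `unwrap_degrees` helper is only a simplified stand-in: it assumes the wrapped phase lies within ±180° and is ordered by increasing frequency, and it adds 360° at each downward jump larger than 180°. Both function names are hypothetical.

```python
def unwrap_degrees(phases):
    """Remove 360-degree jumps from a wrapped phase spectrum ordered by frequency."""
    out, offset = [phases[0]], 0.0
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        if step < -180.0:    # wrapped from +180 to -180: add a full cycle
            offset += 360.0
        elif step > 180.0:   # wrapped the other way: remove a full cycle
            offset -= 360.0
        out.append(cur + offset)
    return out


def dispersion_point(f, phi_deg, d):
    """Equation (10): wavelength and phase velocity from the unwrapped phase (degrees)."""
    wavelength = d / (phi_deg / 360.0)
    return wavelength, f * wavelength
```

For example, with d = 1 m, an unwrapped phase difference of 180° at 50 Hz means the receivers are half a wavelength apart, so λ = 2 m and V_PH = 100 m/s. Each (λ, V_PH) pair is one point of the experimental dispersion curve.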

The material properties and thicknesses of the soil profiles were assumed in order to obtain the theoretical dispersion curve. The soil profile was assumed to comprise 12 layers of 0.5 m thickness, with a bulk density of 1800 kg m⁻³, a Poisson’s ratio of 0.3333, and a damping ratio of 0.02. The theoretical dispersion curve was generated from these assumed parameters and then compared with the experimental dispersion curve obtained from the field.

The triangular part of Figure 10 is explained in detail in Figure 11. The parameters considered in the development of the ML models were the shear wave velocity (V_S), thickness, wavelength, and phase velocity of each layer. The theoretical dispersion curves from 50 tests were taken as input, giving a total of 4782 data points. All of the measurements obtained from the dispersion curves were then divided into 12 layers over a depth of 6 m at 0.5 m intervals, as shown in Figure 11. The wavelengths were converted to an equivalent depth of \frac{1}{3} λ_{R}: for every 0.5 m of depth, D, the corresponding wavelength, λ_{R}, was calculated using Equation (11):

D = \frac{1}{3} λ_{R}

(11)

As shown in Figure 11, wavelengths of 1.5–3 m belong to the 0.5–1 m depth layer. Similarly, wavelengths of 3–4.5 m belong to the 1–1.5 m layer, and wavelengths of 16.5–18 m belong to the 5.5–6 m layer. All of the dispersion curves from the 50 SASW tests were then combined and divided into these 12 layers at 0.5 m intervals. For training, the wavelength and the phase velocity were taken as the input for each layer. For instance, the wavelengths of 0–1.5 m were taken as input, together with the shear wave velocities at 0–0.5 m depth. The theoretical dispersion curves, along with the soil profiles from the database, were used to train the ML algorithms. The training and testing datasets accounted for 80% and 20% of the data, respectively. The intention was to cover the widest theoretically possible input range.
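The layer assignment can be sketched as follows. Each (wavelength, phase velocity) point is mapped to an equivalent depth D = λ_R/3 and then to one of the 12 layers of 0.5 m. The half-open intervals [lower, upper), which decide where boundary wavelengths such as λ_R = 3 m fall, and the function names are assumptions, since the section does not specify how boundaries are treated.

```python
import math


def depth_layer_index(wavelength, layer_thickness=0.5, n_layers=12):
    """Map a wavelength to a layer index via D = wavelength / 3 (Equation (11)).

    Layer 0 is 0-0.5 m, layer 1 is 0.5-1 m, and so on; returns None when the
    equivalent depth falls outside the 6 m profile.
    """
    depth = wavelength / 3.0
    idx = math.floor(depth / layer_thickness)
    return idx if 0 <= idx < n_layers else None


def bin_dispersion_curve(points, layer_thickness=0.5, n_layers=12):
    """Group (wavelength, phase velocity) pairs into depth layers."""
    layers = {i: [] for i in range(n_layers)}
    for wavelength, v_ph in points:
        idx = depth_layer_index(wavelength, layer_thickness, n_layers)
        if idx is not None:
            layers[idx].append((wavelength, v_ph))
    return layers
```

For example, λ_R = 2 m gives D ≈ 0.67 m and falls in the 0.5–1 m layer, λ_R = 16.8 m gives D = 5.6 m and falls in the 5.5–6 m layer, and λ_R = 20 m lies beyond 6 m and is discarded. Counting the points per layer in this way is also what produces fewer observations in the deeper layers.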

The number of observations obtained from the dispersion curves at the different layers is shown in Table 2. The deepest resolvable layer is related to the penetration depth of the surface wave, which is controlled by the frequencies generated by the source on the surface and by the stiffness structure of the subsurface. To capture a deep layer, the surface wave must contain sufficiently low frequencies to propagate deep enough. Hammer impact sources generally generate low frequencies less effectively than high frequencies. Therefore, the number of data points obtained from the field decreases with depth. Dispersion curves must be determined in order to evaluate the V_S profile from the SASW measurements.

The coefficient of determination (R²) and RMSE were used to measure the performance of the models. Equation (12) is used to determine the RMSE of the ML algorithms:

RMSE = \sqrt{\frac{\sum_{i=1}^{N} (P_{i} - A_{i})^{2}}{N}}

(12)

where P_{i} is the predicted shear wave velocity, A_{i} is the shear wave velocity from conventional SASW inversion, and N is the sample size.
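A sketch of both metrics, with P_i the ML prediction and A_i the conventional SASW value. The section does not give a formula for R², so the common definition 1 − SS_res/SS_tot is assumed; note that under this definition R² can be negative for a model that is worse than always predicting the mean.

```python
import math


def rmse(predicted, actual):
    """Equation (12): root mean square error of the predicted V_S."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def r_squared(predicted, actual):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

For predictions [1, 2, 3] against reference values [1, 2, 4], the RMSE is √(1/3) ≈ 0.577 and R² = 1 − 1/(14/3) = 11/14 ≈ 0.786.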

The parameters listed in Table 3 were used for the ML algorithms.

5. Results and Discussions

5.1. Distribution of Datasets

A crucial first step before applying any ML approach is to determine the underlying distribution of the data. In an ML context, the input data for processing comprise only numerical values. A histogram is a visual representation of the distribution of a dataset that reveals any outliers or gaps in the data. ML performance depends on the distribution of the datasets. A distribution that is not normal may be skewed to the left or right or may be completely random. Whether the shear wave velocities obtained from the field are normally distributed is discussed below.

Figure 12a–l shows that all of the datasets below 3 m depth are positively skewed and contain outliers. Table 2 also shows that the number of observations decreases with increasing depth, whereas the histograms show that the range of values widens with depth. Figure 12a illustrates the histogram of V_S at 0–0.5 m depth. The histogram has a maximum value of 116.79 m s⁻¹ and a minimum value of 0 m s⁻¹. The frequency is zero within 81.36–90.21 m s⁻¹ and 214.22–240.79 m s⁻¹. Figure 12b illustrates the histogram of V_S at 0.5–1 m depth; the majority of the shear wave velocities lie in the range of 120.47–250.67 m s⁻¹, and the frequency is zero between 258.81 and 275.09 m s⁻¹. The maximum value of the third histogram (Figure 12c) is 173.71 m s⁻¹, with the lowest being 0 m s⁻¹. Two outliers, at 12.06 and 402.7 m s⁻¹, can be observed in this dataset. In addition, there are five ranges with zero frequency: 12.06–92.89, 106.35–133.29, 200.65–214.12, 254.53–267.99, and 294.93–389.23 m s⁻¹. Figure 12d is the histogram of V_S at 1.5–2 m depth. These data can be described as normally distributed, with the majority of the shear wave velocities in the range of 155.66–285.45 m s⁻¹ and no observable outliers. The histogram of V_S at 2–2.5 m depth is depicted in Figure 12e. It has one peak at 219.93 m s⁻¹ and two gaps (zero frequency) within 86.47–136.52 and 336.7–353.38 m s⁻¹, following a similar pattern to the fourth histogram. Figure 12f shows the histogram of V_S at 2.5–3 m depth, which has one outlier at 459.79 m s⁻¹.

The histogram of V_S at 3–3.5 m depth is shown in Figure 12g. These data are right (positively) skewed and contain a significant number of outliers. The majority of the data lie in the 206.65–251.24 m s⁻¹ range, and the outliers range from 429.6 to 697.14 m s⁻¹. The histogram of V_S at 3.5–4 m depth shows a similar pattern: the distribution is also right skewed, as shown in Figure 12h, and contains outliers from 322.59 to 697.14 m s⁻¹. Figure 12i,j shows that the histograms of V_S for the 4–4.5 m and 4.5–5 m layers are also right skewed, with outliers detected from 421.46 to 799.44 m s⁻¹ at 4–4.5 m and from 455.36 to 875.71 m s⁻¹ at 4.5–5 m. Figure 12k,l illustrates that the V_S histograms for the 5–5.5 and 5.5–6 m layers are right (positively) skewed as well. Both layers are highly contaminated with outliers, detected from 857.49 to 1541.83 m s⁻¹ at 5–5.5 m depth and from 512.79 to 1361.85 m s⁻¹ at 5.5–6 m depth. The range of V_S therefore starts to increase from 3 m: it spans 72.88–697.14 m s⁻¹ in the seventh (3–3.5 m) and eighth (3.5–4 m) layers, 119.07–799.44 m s⁻¹ in the ninth (4–4.5 m) layer, 100–900 m s⁻¹ in the tenth (4.5–5 m) layer, and 173.17–1541.83 m s⁻¹ in the last two layers (5–6 m), even though these two layers contain only 77 and 60 observations, respectively (see Table 2). Real-world data are rarely normally distributed. In skewed data, the tail region may act as an outlier for a statistical model, and outliers have an adverse effect on model performance, especially for regression-based models.
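Skewness and outliers can also be quantified rather than only read from the histograms. The sketch below computes the Fisher–Pearson skewness coefficient (positive for right-skewed data such as the layers below 3 m) and flags outliers with Tukey’s 1.5 × IQR rule. The section does not state which outlier criterion was used, so the IQR rule, the quartile method and the function names are assumptions.

```python
from statistics import quantiles


def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population moments.

    Undefined (division by zero) when all values are equal.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def iqr_outliers(values, factor=1.5):
    """Values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]
```

A symmetric sample such as [1, 2, 3] has zero skewness, whereas [1, 2, 3, 4, 100] is strongly right skewed and the value 100 lies above Q3 + 1.5 × IQR = 7, so it is flagged; applied per layer, such checks make the increase in skewness and outliers with depth explicit.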

5.2. Comparative Analysis of Shear Wave Velocity for ML Algorithms

Figures 13–17 compare the shear wave velocity profiles predicted by all ML algorithms with those from conventional SASW inversion for tests 1–10, with two tests per figure. For test 1, the V_S of the first layer (0–0.5 m) obtained from conventional SASW is 206.7 m s⁻¹, whereas the V_S values obtained from MLP, RF, SVR, and LR are 213.01, 226.65, 213.25, and 213.74 m s⁻¹, respectively. Similarly, for the second layer (0.5–1 m), the V_S values obtained from SASW, MLP, RF, SVR, and LR are 240.73, 227.89, 211, 234.84, and 224.18 m s⁻¹, respectively. In test 1, the ML-predicted V_S deviates from the SASW value at 4–5 m, as shown in Figure 13a. Figure 13b (test 2) shows high similarity between SASW and SVR, although a difference of up to 100 m s⁻¹ can be noticed over the 1–2 m depth interval.
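The layer values quoted above for test 1 can be turned into percentage deviations from the SASW reference. The sketch below does this simple arithmetic for the two layers reported in the text; it is not the authors' evaluation code, and the helper names are illustrative.

```python
# V_S (m/s) for test 1 as quoted in the text. The RF value for the second
# layer is printed without decimals in the text and is used here as 211.0.
SASW = {"0-0.5 m": 206.7, "0.5-1 m": 240.73}
PREDICTED = {
    "0-0.5 m": {"MLP": 213.01, "RF": 226.65, "SVR": 213.25, "LR": 213.74},
    "0.5-1 m": {"MLP": 227.89, "RF": 211.0, "SVR": 234.84, "LR": 224.18},
}


def percent_deviation(predicted, reference):
    """Absolute deviation from the SASW reference, as a percentage of it."""
    return abs(predicted - reference) / reference * 100.0


def closest_algorithm(layer):
    """Algorithm whose prediction deviates least from SASW in the given layer."""
    ref = SASW[layer]
    return min(PREDICTED[layer], key=lambda alg: percent_deviation(PREDICTED[layer][alg], ref))


for layer, ref in SASW.items():
    summary = ", ".join(
        f"{alg} {percent_deviation(v, ref):.2f}%" for alg, v in PREDICTED[layer].items()
    )
    print(f"{layer}: {summary} -> closest: {closest_algorithm(layer)}")
```

For these two layers, MLP is marginally closer to SASW than SVR at 0–0.5 m (about 3.05% versus 3.17%), whereas SVR is clearly the closest at 0.5–1 m (about 2.45%), which illustrates why the comparison is made over whole profiles rather than single layers.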

Figure 14a (test 3) shows that the V_S predicted by SVR stays close to the SASW values and remains consistent throughout the profile down to 6 m. Figure 14b (test 4) also shows close agreement up to 6 m, although the SVR prediction drifts from the SASW V_S between 2 and 3 m.

SVR also agrees closely with SASW up to 6 m for tests 5–8 (Figures 15a,b and 16a,b), although its predictions deviate in the first layer for tests 5 and 6.

SVR predicts the profile of test 9 well, as shown in Figure 17a. For test 10, SVR also predicts the V_S profile well, except in the 1–1.5 m and 3.5–4 m layers, as illustrated in Figure 17b.

These figures indicate that SVR outperforms the other algorithms in predicting shear wave velocities. The few deviations occurred because there were insufficient data points to train on in those layers. The appendices compare the experimental dispersion data with the theoretical dispersion curves corresponding to the SASW and ML shear wave velocity profiles, showing how the differences between the profiles affect the effective theoretical dispersion curves and how well each curve matches the experimental data.

To evaluate the ML algorithms more thoroughly, the confidence limit, percentage error, RMSE, and R² were computed for every layer from 0 to 6 m depth.

Table 4 shows the confidence limit and the percentage error associated with the RMSE of the composite values for the 10 test cases. Below 3 m depth, the confidence limit of MLP falls below 80%, which is not reliable for design purposes [45]. For SVR, the limit exceeds 85% throughout the 0–6 m depth range. A detailed guideline on target confidence levels is provided by Lorig et al. [46]. Among all of the algorithms, SVR gives the best prediction, as its confidence limit is greater than 80% in every layer. RF also performs well, except in the last layer (5.5–6 m). The highest percentage error is less than 15% for SVR, whereas it exceeds 20% for all of the other algorithms.
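The threshold statements above can be checked directly against Table 4. The sketch below encodes the tabulated confidence limits, recovers the percentage error as 100 minus the confidence limit (which is how the two halves of Table 4 relate), and lists the layers that fall below the 80% design threshold. The function names are illustrative only.

```python
# Confidence limits (%) from Table 4, from the 0-0.5 m layer down to 5.5-6 m.
DEPTHS = ["0-0.5", "0.5-1", "1-1.5", "1.5-2", "2-2.5", "2.5-3",
          "3-3.5", "3.5-4", "4-4.5", "4.5-5", "5-5.5", "5.5-6"]
CONFIDENCE = {
    "MLP": [97.41, 92.99, 93.04, 91.53, 90.83, 89.47, 78.01, 79.69, 77.18, 77.76, 61.27, 76.10],
    "RF":  [97.10, 93.18, 91.63, 92.61, 92.17, 92.88, 88.71, 83.46, 84.49, 80.78, 81.54, 76.71],
    "SVR": [98.06, 93.59, 93.50, 91.90, 90.16, 91.54, 86.81, 86.85, 86.78, 85.46, 85.50, 85.38],
    "LR":  [96.95, 92.11, 91.57, 91.08, 91.09, 91.28, 80.60, 85.92, 84.21, 80.42, 76.37, 74.87],
}


def layers_below(algorithm, threshold=80.0):
    """Depth intervals (m) whose confidence limit is below the threshold."""
    return [d for d, cl in zip(DEPTHS, CONFIDENCE[algorithm]) if cl < threshold]


def percentage_error(algorithm):
    """Percentage error per layer; in Table 4 it equals 100 minus the confidence limit."""
    return [round(100.0 - cl, 2) for cl in CONFIDENCE[algorithm]]


for alg in CONFIDENCE:
    print(alg, "below 80%:", layers_below(alg) or "none",
          "| max error:", max(percentage_error(alg)), "%")
```

Run against the tabulated values, this flags MLP in every layer below 3 m, RF only in the 5.5–6 m layer, LR in the two deepest layers, and SVR in none.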

Figure 18 shows the RMSE of the composite values for the 10 test cases for each ML algorithm relative to conventional SASW inversion. Down to 3 m, the RMSE ranges from 6.79 to 34.44 m s⁻¹ for SVR, from 9.05 to 36.84 m s⁻¹ for MLP, from 10.13 to 29.29 m s⁻¹ for RF, and from 10.66 to 31.20 m s⁻¹ for LR. Below 3 m, the RMSE increases markedly and nonlinearly with depth: it ranges from 46.03 to 51.15 m s⁻¹ for SVR, 76.96 to 135.56 m s⁻¹ for MLP, 39.98 to 81.51 m s⁻¹ for RF, and 49.28 to 87.95 m s⁻¹ for LR. The shear wave velocity resolution is therefore poor below 3 m, and the distribution of the datasets at these depths (Figure 12) may explain the sudden increase in the RMSE of the ML models.
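Here, "composite" RMSE is read as pooling the predictions of all test cases within each 0.5 m layer before computing RMSE = sqrt(mean((V_pred − V_SASW)²)). That reading, the record layout, and the helper name are assumptions for illustration; the numbers are hypothetical and chosen only to show how a few large errors in a deep layer inflate the layer RMSE.

```python
import math
from collections import defaultdict


def rmse_by_layer(records):
    """Composite RMSE (m/s) per layer.

    records: iterable of (layer, v_sasw, v_pred) tuples pooled over all test
    cases, velocities in m/s. Returns {layer: RMSE}.
    """
    squared_errors = defaultdict(list)
    for layer, v_sasw, v_pred in records:
        squared_errors[layer].append((v_pred - v_sasw) ** 2)
    return {
        layer: math.sqrt(sum(errs) / len(errs)) for layer, errs in squared_errors.items()
    }


# Hypothetical (layer, SASW V_S, predicted V_S) triples from two test cases;
# illustrative only, NOT values from Table 5.
records = [
    ("0-0.5", 200.0, 206.0), ("0-0.5", 180.0, 172.0),
    ("3-3.5", 250.0, 310.0), ("3-3.5", 420.0, 340.0),
]

for layer, value in rmse_by_layer(records).items():
    print(f"{layer} m: RMSE = {value:.2f} m/s")
```

Because errors are squared before averaging, one poorly predicted point in a sparsely sampled deep layer can dominate that layer's RMSE, which is consistent with the jump in RMSE below 3 m.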

R² measures the strength of the linear relationship between the V_S values from conventional SASW inversion and those predicted by the ML algorithms. Figure 19 shows the R² of the composite values for the 10 test cases. The highest R² occurs in the 0–0.5 m layer, with values of 0.98, 0.95, 0.97, and 0.95 for RF, MLP, SVR, and LR, respectively. The lowest values all occur at 2–2.5 m depth: 0.72 for RF, 0.52 for SVR, 0.56 for MLP, and 0.55 for LR (Table 5). The R² of LR also tends to decrease with depth, but it rises to 0.90 at 4–4.5 m and then declines again.

The values of mean square error (MSE), RMSE, and R² are given in Table 5.

Figure 20a–l plots, for each layer, the R² of the composite values for the 10 test cases between SVR and SASW. The R² between SVR and SASW is high in the first layer (0–0.5 m), and the second layer (0.5–1 m) also shows a strong association. The association remains fairly high for the third (1–1.5 m) and fourth (1.5–2 m) layers. At 3–3.5 m depth, R² is 0.85 despite a few outliers, as seen in Figure 20g. Below 4 m, R² rises to 0.92 and 0.93 (Figure 20i,j), indicating good agreement between the shear wave velocities obtained from SVR and SASW. However, these layers contain very few data points, and the distribution of the datasets can produce a greater RMSE than in the other layers.

Although an R² close to 1 indicates a good match, R² alone cannot show whether the predictions are biased. The performance also depends on the outliers and on how closely the points cluster around the regression line.
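The caveat above can be illustrated numerically. "R²" is commonly reported either as the squared Pearson correlation (equivalently, the R² of a least-squares line fitted through the scatter) or as the coefficient of determination 1 − SS_res/SS_tot scored against the 1:1 line; which form underlies the figures here is not stated, so the sketch computes both, together with RMSE, for hypothetical predictions offset by a constant +40 m s⁻¹.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def r2_score(y_true, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot (scored against the 1:1 line)."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r2(x, y):
    """Squared Pearson correlation, i.e. R² of a least-squares line fitted to (x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def rmse(y_true, y_pred):
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


# Hypothetical SASW V_S values (m/s) and predictions biased by a constant +40 m/s.
sasw = [150.0, 200.0, 250.0, 300.0, 350.0]
biased = [v + 40.0 for v in sasw]

print(f"squared Pearson r: {pearson_r2(sasw, biased):.2f}")
print(f"1 - SS_res/SS_tot: {r2_score(sasw, biased):.2f}")
print(f"RMSE: {rmse(sasw, biased):.1f} m/s")
```

The squared correlation is a perfect 1.0 even though every prediction is 40 m s⁻¹ too high, whereas the coefficient of determination (0.68) and the RMSE (40 m s⁻¹) expose the bias. This is why R² is best read alongside RMSE and the position of the points relative to the 1:1 line.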

In summary, SVR predicts shear wave velocity better than the other algorithms, which have high RMSE below 3 m depth. Some outliers are indeed present below 3 m: the range of the datasets is broad, but few data points are obtained from SASW at these depths, which may affect the ML models.

6. Conclusions

This study proposed automating SASW inversion by adopting four ML algorithms, namely MLP, RF, SVR, and LR. Automating the inversion procedure can allow soil stiffness properties to be evaluated conveniently and rapidly. The SVR algorithm shows the lowest range of RMSE, indicating better model performance, while LR shows the highest RMSE among all of the algorithms. Below a depth of 3 m, the distribution of the data causes the RMSE to increase abruptly. The R² for SVR is found to be higher than for the other algorithms. The confidence limit for MLP and LR below 3 m is lower than 80%, so these algorithms are not considered reliable for design purposes. SVR and RF are better predictors than the other algorithms, as their confidence limits are greater than 80% down to 6 m depth. RF produces output instantaneously, as no hyperparameter tuning is required. MLP performs better below 3 m for only a few tests. Among all of the algorithms, SVR shows the potential to serve as an alternative for estimating the shear wave velocity profile of soil from a given experimental dispersion curve.

Future studies should train the ML algorithms with more data. Moreover, a log transformation of the datasets could be applied beforehand to reduce the skewness of the data. In this study, the equivalent depth used in dataset preparation was taken as one-third of the wavelength; other empirical relationships, such as one-half of the wavelength, could be considered in the future. Finally, ML-based inversion of SASW data could be compared with inversion using global search algorithms.
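The one-third-wavelength rule mentioned above maps each point of a dispersion curve to an equivalent depth through λ = V_ph/f, where V_ph is the phase velocity and f the frequency. The sketch below shows that mapping and the half-wavelength alternative. The dispersion points are hypothetical, and assigning each depth to a 0.5 m layer (matching the layering in Table 2) through the hypothetical `layer_label` helper is an assumed way of building the dataset, not necessarily the procedure of Figure 11.

```python
import math


def wavelength(phase_velocity, frequency):
    """Wavelength (m) from phase velocity (m/s) and frequency (Hz)."""
    return phase_velocity / frequency


def equivalent_depth(phase_velocity, frequency, fraction=1.0 / 3.0):
    """Equivalent depth (m) taken as a fixed fraction of the wavelength."""
    return fraction * wavelength(phase_velocity, frequency)


def layer_label(depth, thickness=0.5):
    """Hypothetical helper: name of the 0.5 m layer containing `depth` (e.g. '3-3.5')."""
    top = math.floor(depth / thickness) * thickness
    return f"{top:g}-{top + thickness:g}"


# Hypothetical dispersion-curve points: (frequency in Hz, phase velocity in m/s).
dispersion = [(100.0, 180.0), (50.0, 210.0), (25.0, 240.0), (15.0, 255.0)]

for f, v in dispersion:
    d_third = equivalent_depth(v, f)               # lambda/3, as in this study
    d_half = equivalent_depth(v, f, fraction=0.5)  # lambda/2, suggested alternative
    print(f"{f:5.1f} Hz: lambda = {wavelength(v, f):5.2f} m, "
          f"lambda/3 = {d_third:4.2f} m ({layer_label(d_third)} m), "
          f"lambda/2 = {d_half:4.2f} m")
```

Switching from λ/3 to λ/2 moves every dispersion point 1.5 times deeper, so the same experimental curve would populate different layers and change how many training samples each layer receives.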

Author Contributions

Conceptualization, S.M.M., N.A.R., K.A.M.N. and M.A.Z.; methodology, S.M.M., N.A.R., K.A.M.N.; software, S.M.M. and M.A.Z.; validation, N.A.R.; formal analysis, S.M.M.; investigation, S.M.M., N.A.R. and K.A.M.N.; resources, S.M.M., N.A.R. and K.A.M.N.; writing—original draft preparation, S.M.M.; writing—review and editing, S.M.M., N.A.R., K.A.M.N., M.A.Z. and S.A.P.R.; visualization, S.M.M., N.A.R.; supervision, N.A.R., K.A.M.N. and M.A.Z.; project administration, S.M.M., N.A.R. and K.A.M.N.; funding acquisition, S.A.P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universiti Kebangsaan Malaysia under the grant DIP-2020-003.

Acknowledgments

The authors thank the people involved in the research project and acknowledge Virtual Instrument and System Innovation Sdn Bhd and Universiti Kebangsaan Malaysia for the technical and financial support under the grant DIP-2020-003.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SASW	Spectral analysis of surface waves
DHT	Downhole tests
SCPT	Seismic cone penetration testing
ML	Machine learning
ANN	Artificial neural network
MLP	Multilayer perceptron
RF	Random forest
SVM	Support vector machine
SVR	Support vector regression
LR	Linear regression
RMSE	Root mean square error
SMP	Starting Model Parameter

References

1. Sutherland, B.R.; Balmforth, N.J. Damping of surface waves by floating particles. Phys. Rev. Fluids 2019, 4, 14804.
2. Al-Hunaidi, M.O. Insights on the SASW nondestructive testing method. Can. J. Civ. Eng. 1993, 20, 940–950.
3. Rosyidi, S.A.; Taha, M.R.; Nayan, K.A.M. Empirical model evaluation of sedimentary residual soil bearing capacity from surface wave method. J. Eng. 2010, 22, 75–88.
4. Mitu, S.M.; Rahman, N.A.; Taib, A.M.; Nayan, K.A.M. Determination of soil bearing capacity from spectral analysis of surface wave test, standard penetration test and mackintosh probe test. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 340–346.
5. Nayan, K.A.M.; Taha, M.R.; Omar, N.A.; Bawadi, N.F.; Joh, S.-H.; Omar, M.N. Determination of ultimate pile bearing capacity from a seismic method of shear wave velocity in comparison with conventional methods. J. Teknol. 2015, 74.
6. Bawadi, N.F.; Nayan, K.A.M.; Taha, M.R.; Omar, N.A. Estimate of small stiffness and damping ratio in residual soil using spectral analysis of surface wave method. In Proceedings of the MATEC Web of Conferences, Amsterdam, The Netherlands, 23–25 March 2016; Volume 47, p. 3017.
7. Widodo, W.; Rosyidi, S.A.P. Experimental investigation of seismic parameters and bearing capacity of pavement subgrade using surface wave method. Semesta Tek. 2016, 12, 68–77.
8. Foti, S. Multistation Methods for Geotechnical Characterization Using Surface Waves; Politecnico di Torino: Torino, Italy, 2000.
9. Nazarian, S. In Situ Determination of Elastic Moduli of Soil Deposits and Pavement Systems by Spectral-Analysis-of-Surface-Waves Method; The National Academies of Sciences, Engineering, and Medicine: Washington, DC, USA, 1984.
10. Haskell, N.A. The dispersion of surface waves on multilayered media. Bull. Seismol. Soc. Am. 1953, 43, 17–34.
11. Thomson, W.T. Transmission of elastic waves through a stratified solid medium. J. Appl. Phys. 1950, 21, 89–93.
12. Ballard, R.F. Determination of Soil Shear Moduli at Depths by In-Situ Vibratory Techniques; U.S. Army Engineer Waterways Experiment Station, Engineer Research and Development Center: Vicksburg, MS, USA, 1964.
13. Hossain, M.M.; Drnevich, V.P. Numerical and optimization techniques applied to surface waves for backcalculation of layer moduli. In Nondestructive Testing of Pavements and Backcalculation of Moduli; ASTM International: West Conshohocken, PA, USA, 1989; ISBN 978-0-8031-5087-4.
14. Schwab, F.; Knopoff, L. Surface-wave dispersion computations. Bull. Seismol. Soc. Am. 1970, 60, 321–344.
15. Yuan, D.; Nazarian, S. Automated surface wave method: Inversion technique. J. Geotech. Eng. 1993, 119, 1112–1126.
16. Addo, K.O.; Robertson, P.K. Shear-wave velocity measurement of soils using Rayleigh waves. Can. Geotech. J. 1992, 29, 558–568.
17. Zomorodian, S.M.A.; Hunaidi, O. Inversion of SASW dispersion curves based on maximum flexibility coefficients in the wave number domain. Soil Dyn. Earthq. Eng. 2006, 26, 735–752.
18. Meier, R.W.; Rix, G.J. An initial study of surface wave inversion using artificial neural networks. Geotech. Test. J. 1993, 16, 425–431.
19. Foti, S.; Hollender, F.; Garofalo, F.; Albarello, D.; Asten, M.; Bard, P.-Y.; Comina, C.; Cornou, C.; Cox, B.; Di Giulio, G. Guidelines for the good practice of surface wave analysis: A product of the InterPACIFIC project. Bull. Earthq. Eng. 2018, 16, 2367–2420.
20. Lysmer, J. Lumped mass method for Rayleigh waves. Bull. Seismol. Soc. Am. 1970, 60, 89–104.
21. Lysmer, J.; Waas, G. Shear waves in plane infinite structures. J. Eng. Mech. 1972.
22. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313.
23. Williams, T.P.; Gucunski, N. Neural networks for backcalculation of moduli from SASW test. J. Comput. Civ. Eng. 1995, 9, 1–8.
24. Joh, S.-H.; Stokoe, K.H. Advances in Interpretation and Analysis Techniques for Spectral-Analysis-of-Surface-Waves (SASW) Measurements; Offshore Technology Research Center: College Station, TX, USA, 1997.
25. Shirazi, H.; Abdallah, I.; Nazarian, S. Developing artificial neural network models to automate spectral analysis of surface wave method in pavements. J. Mater. Civ. Eng. 2009, 21, 722–729.
26. Alimoradi, A.; Shahsavani, H.; Rouhani, A.K. Prediction of shear wave velocity in underground layers using SASW and artificial neural networks. Engineering 2011, 3, 266.
27. Gucunski, N.; Abdallah, I.N.; Nazarian, S. ANN backcalculation of pavement profiles from the SASW test. In Pavement Subgrade, Unbound Materials, and Nondestructive Testing; American Society of Civil Engineers: Reston, VA, USA, 2000; pp. 31–50.
28. Omar, M.N.; Abbiss, C.P.; Taha, M.R.; Nayan, K.A.M. Prediction of long-term settlement on soft clay using shear wave velocity and damping characteristics. Eng. Geol. 2011, 123, 259–270.
29. Zhang, S.X.; Chan, L.S. Possible effects of misidentified mode number on Rayleigh wave inversion. J. Appl. Geophys. 2003, 53, 17–29.
30. Teague, D.P.; Cox, B.R. Site response implications associated with using non-unique V_S profiles from surface wave inversion in comparison with other commonly used methods of accounting for V_S uncertainty. Soil Dyn. Earthq. Eng. 2016, 91, 87–103.
31. Marosi, K.T.; Hiltunen, D.R. Characterization of spectral analysis of surface waves shear wave velocity measurement uncertainty. J. Geotech. Geoenvironmental Eng. 2004, 130, 1034–1041.
32. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160.
33. Zhang, W.; Wu, C.; Li, Y.; Wang, L.; Samui, P. Assessment of pile drivability using random forest regression and multivariate adaptive regression splines. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2019, 1–14.
34. Zhang, W.; Wu, C.; Zhong, H.; Li, Y.; Wang, L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 2021.
35. Kohestani, V.R.; Hassanlourad, M.; Ardakani, A. Evaluation of liquefaction potential based on CPT data using random forest. Nat. Hazards 2015, 79, 1079–1089.
36. Singh, A.; Thakur, N.; Sharma, A. A review of supervised machine learning algorithms. In Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 16–18 March 2016; pp. 1310–1315.
37. Shao, Y.-H.; Chen, W.-J.; Deng, N.-Y. Nonparallel hyperplane support vector machine for binary classification problems. Inf. Sci. (N.Y.) 2014, 263, 22–35.
38. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
39. Chau, A.L.; Li, X.; Yu, W. Support vector machine classification for large datasets using decision tree and Fisher linear discriminant. Futur. Gener. Comput. Syst. 2014, 36, 57–65.
40. Pourghasemi, H.R.; Jirandeh, A.G.; Pradhan, B.; Xu, C.; Gokceoglu, C. Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. J. Earth Syst. Sci. 2013, 122, 349–369.
41. Wu, D.; Yang, H.; Chen, X.; He, Y.; Li, X. Application of image texture for the sorting of tea categories using multi-spectral imaging technique and support vector machine. J. Food Eng. 2008, 88, 474–483.
42. Wang, B.; Gong, N.Z. Stealing hyperparameters in machine learning. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018; pp. 36–52.
43. Brown, L.T.; Diehl, J.G.; Nigbor, R.L. A simplified procedure to measure average shear-wave velocity to a depth of 30 meters (V_S30). In Proceedings of the 12th World Conference on Earthquake Engineering, Auckland, New Zealand, 30 January–4 February 2000.
44. Lin, Y.-C.; Joh, S.-H.; Stokoe, K.H. Analyst J: Analysis of the UTexas 1 surface wave dataset using the SASW methodology. In Proceedings of the Geo-Congress 2014: Geo-Characterization and Modeling for Sustainability, Atlanta, GA, USA, 23–26 February 2014; pp. 830–839.
45. Dunn, M.J. How reliable are your design inputs? In Proceedings of the International Seminar on Design Methods in Underground Mining, Perth, Australia, 17–19 November 2015; Australian Centre for Geomechanics; pp. 367–381.
46. Lorig, L.; Stacey, P.; Read, J. Slope design methods. Guidel. Open Pit Slope Des. 2009, 1, 237–264.

Figure 1. Variation of Rayleigh wave particle motion with depth for a half-space with different values of Poisson’s ratio, ν [8].

Figure 2. Flow diagram describing the conventional inversion procedure and how the ML algorithm would simplify the procedure.

Figure 3. A supervised ML model.

Figure 4. Multilayer perceptron network with one hidden layer.

Figure 5. RF algorithm with n decision trees.

Figure 6. Linear SVR.

Figure 7. K-fold cross-validation with k dataset.

Figure 8. SASW method.

Figure 9. Configuration of equipment for SASW test.

Figure 10. Implementation of ML in the SASW inversion method.

Figure 11. Dataset preparation for ML algorithm.

Figure 12. Histograms of the shear wave velocity for each layer.

Figure 13. V_S profiles from SASW inversion and ML algorithms for (a) test 1 and (b) test 2.

Figure 14. V_S profiles from SASW inversion and ML algorithms for (a) test 3 and (b) test 4.

Figure 15. V_S profiles from SASW inversion and ML algorithms for (a) test 5 and (b) test 6.

Figure 16. V_S profiles from SASW inversion and ML algorithms for (a) test 7 and (b) test 8.

Figure 17. V_S profiles from SASW and ML algorithms for (a) test 9 and (b) test 10.

Figure 18. RMSE vs. depth for all algorithms.

Figure 19. Coefficient of determination for all of the algorithms.

Figure 20. R² between the SASW and SVR for (a) 0–0.5 m, (b) 0.5–1 m, (c) 1–1.5 m, (d) 1.5–2 m, (e) 2–2.5 m, (f) 2.5–3 m, (g) 3–3.5 m, (h) 3.5–4 m, (i) 4–4.5 m, (j) 4.5–5 m, (k) 5–5.5 m, and (l) 5.5–6 m of depth.

Table 1. Studies related to SASW inversion analysis.

Author	Developed Inversion Process	Generation of Dispersion Curves	Drawbacks
Nazarian [9]	Based on forward modeling	A modified version of the Haskell–Thomson matrix solution [10,11]	Quite tedious and time-consuming
Hossain and Drnevich [13]	Powell’s conjugate directions method for optimization	The discrete layer stiffness matrix method initially developed by Lysmer [20] and Lysmer and Waas [21]	The nontranscendental quadratic eigenvalue problem
Addo and Robertson [16]	Nelder and Mead’s simplex method [22]	Automated using optimization techniques with a least-squares criterion	The number of iterations needs to be increased
Yuan and Nazarian [15]	Linearized least-squares approximation	-	-
Meier and Rix [18] and Williams and Gucunski [23]	Back-calculation neural networks	-	The network required a greater number of more complex mappings for training
Zomorodian and Hunaidi [17]	The SASW-INVERT program	The maximum vertical flexibility coefficient of the layered soil system	-

Table 2. No. of observations with the depth.

Layer Number	Depth	No. of Observations
1	0–0.5	756
2	0.5–1	1242
3	1–1.5	874
4	1.5–2	532
5	2–2.5	395
6	2.5–3	340
7	3–3.5	199
8	3.5–4	118
9	4–4.5	97
10	4.5–5	92
11	5–5.5	77
12	5.5–6	60

Table 3. ML algorithm parameter setting.

Algorithm	Parameter	Setting
MLP	Hidden layer sizes	3
	Maximum iteration	2000
	Activation	ReLU
	Validation threshold	54
RF	N estimators	50,000
	Criterion	MSE
	Minimum sample splits	67
	Maximum features	Auto
	Verbose	0
SVR	C	1
	Kernel	RBF
	Epsilon	0.2
	Maximum iteration	−1
LR	Fit intercept	True
	n jobs	None
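The parameter names in Table 3 closely resemble scikit-learn estimator arguments, so the sketch below assumes that library (the text does not name it) and shows one plausible mapping of the table onto estimator constructors. Assumed translations are marked in the comments: "Hidden layer sizes = 3" is read as one hidden layer of three neurons (consistent with Figure 4), "MSE" and "Auto" are written with their current scikit-learn spellings, and the MLP "validation threshold" has no obvious counterpart and is left out.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def build_models():
    """Estimators configured from Table 3 (scikit-learn mapping is an assumption)."""
    return {
        # One hidden layer of 3 neurons (assumed reading of "hidden layer sizes = 3").
        # Table 3's "validation threshold = 54" is not mapped to any argument here.
        "MLP": MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000, activation="relu"),
        # "MSE" is spelled "squared_error" in scikit-learn >= 1.0; the old "auto"
        # max_features for regressors meant all features, i.e. 1.0.
        "RF": RandomForestRegressor(n_estimators=50_000, criterion="squared_error",
                                    min_samples_split=67, max_features=1.0, verbose=0),
        "SVR": SVR(C=1.0, kernel="rbf", epsilon=0.2, max_iter=-1),
        "LR": LinearRegression(fit_intercept=True, n_jobs=None),
    }


models = build_models()
for name, model in models.items():
    print(name, model.get_params())
```

Each estimator would then be fitted with `model.fit(X, y)`, where, under this assumed setup, `X` holds dispersion-curve features and `y` the SASW-derived V_S per layer. Note that 50,000 trees make RF training considerably slower than the other three models.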

Table 4. Confidence limit and percentage error of the four algorithms.

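Depth (m)	Confidence Limit, MLP (%)	Confidence Limit, RF (%)	Confidence Limit, SVR (%)	Confidence Limit, LR (%)	Percentage Error, MLP (%)	Percentage Error, RF (%)	Percentage Error, SVR (%)	Percentage Error, LR (%)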
0–0.5	97.41	97.10	98.06	96.95	2.59	2.90	1.94	3.05
0.5–1	92.99	93.18	93.59	92.11	7.01	6.82	6.41	7.89
1–1.5	93.04	91.63	93.50	91.57	6.96	8.37	6.50	8.43
1.5–2	91.53	92.61	91.90	91.08	8.47	7.39	8.10	8.92
2–2.5	90.83	92.17	90.16	91.09	9.17	7.83	9.84	8.91
2.5–3	89.47	92.88	91.54	91.28	10.53	7.12	8.46	8.72
3–3.5	78.01	88.71	86.81	80.60	21.99	11.29	13.19	19.40
3.5–4	79.69	83.46	86.85	85.92	20.31	16.54	13.15	14.08
4–4.5	77.18	84.49	86.78	84.21	22.82	15.51	13.22	15.79
4.5–5	77.76	80.78	85.46	80.42	22.24	19.22	14.54	19.58
5–5.5	61.27	81.54	85.50	76.37	38.73	18.46	14.50	23.63
5.5–6	76.10	76.71	85.38	74.87	23.90	23.29	14.62	25.13

Table 5. MSE, RMSE, and R² for ML.

Depth (m)	MLP R²	MLP MSE	MLP RMSE (m s⁻¹)	RF R²	RF MSE	RF RMSE (m s⁻¹)	SVR R²	SVR MSE	SVR RMSE (m s⁻¹)	LR R²	LR MSE	LR RMSE (m s⁻¹)
0–0.5	0.95	6.00	9.05	0.98	4.07	10.13	0.97	3.13	6.79	0.95	6.40	10.66
0.5–1	0.81	19.72	24.54	0.82	18.11	23.86	0.83	16.67	22.42	0.78	21.58	27.60
1–1.5	0.72	18.22	24.35	0.73	18.81	29.30	0.75	16.02	22.74	0.57	22.20	29.51
1.5–2	0.72	23.23	29.64	0.76	18.07	25.87	0.78	20.73	28.35	0.66	24.66	31.20
2–2.5	0.56	26.48	32.08	0.72	20.29	27.40	0.52	23.67	34.44	0.55	26.17	31.19
2.5–3	0.66	29.08	36.84	0.81	18.08	24.91	0.69	22.30	29.60	0.64	24.64	30.51
3–3.5	0.71	47.36	76.95	0.74	26.67	39.50	0.85	28.06	46.17	0.69	52.37	67.92
3.5–4	0.73	58.22	71.09	0.89	45.70	57.88	0.86	33.82	46.03	0.74	40.86	49.29
4–4.5	0.81	68.27	79.87	0.88	42.77	54.29	0.92	31.27	46.28	0.90	41.69	55.25
4.5–5	0.78	67.71	77.82	0.91	52.09	67.27	0.93	33.29	50.91	0.84	58.46	68.55
5–5.5	0.70	125.51	135.56	0.82	49.35	64.60	0.89	33.40	50.73	0.77	62.13	82.69
5.5–6	0.74	71.22	83.65	0.79	68.31	81.52	0.9	34.35	51.16	0.67	71.91	87.95

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mitu, S.M.; Rahman, N.A.; Nayan, K.A.M.; Zulkifley, M.A.; Rosyidi, S.A.P. Implementation of Machine Learning Algorithms in Spectral Analysis of Surface Waves (SASW) Inversion. Appl. Sci. 2021, 11, 2557. https://doi.org/10.3390/app11062557

AMA Style

Mitu SM, Rahman NA, Nayan KAM, Zulkifley MA, Rosyidi SAP. Implementation of Machine Learning Algorithms in Spectral Analysis of Surface Waves (SASW) Inversion. Applied Sciences. 2021; 11(6):2557. https://doi.org/10.3390/app11062557

Chicago/Turabian Style

Mitu, Sadia Mannan, Norinah Abd. Rahman, Khairul Anuar Mohd Nayan, Mohd Asyraf Zulkifley, and Sri Atmaja P. Rosyidi. 2021. "Implementation of Machine Learning Algorithms in Spectral Analysis of Surface Waves (SASW) Inversion" Applied Sciences 11, no. 6: 2557. https://doi.org/10.3390/app11062557

APA Style

Mitu, S. M., Rahman, N. A., Nayan, K. A. M., Zulkifley, M. A., & Rosyidi, S. A. P. (2021). Implementation of Machine Learning Algorithms in Spectral Analysis of Surface Waves (SASW) Inversion. Applied Sciences, 11(6), 2557. https://doi.org/10.3390/app11062557
