ORIGINAL ARTICLE

Application of genetic algorithm-based support vector machines for prediction of soil liquefaction

Xinhua Xue (1, 2) • Ming Xiao (2)

Received: 2 February 2016 / Accepted: 20 April 2016 / Published online: 11 May 2016

© Springer-Verlag Berlin Heidelberg 2016

Abstract This paper presents a hybrid of genetic algorithm (GA) and support vector machine (SVM) techniques to predict the potential of soil liquefaction. The GA is employed to select the optimal values of the kernel function parameter and the penalty parameter of the SVM model in order to improve the forecasting accuracy. The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983. Several important parameters, including the cone resistance, total vertical stress, effective vertical stress, mean grain size, normalized peak horizontal acceleration at ground surface, cyclic stress ratio, and earthquake magnitude, were used as the input parameters, while the potential of soil liquefaction was the output parameter. The predictions from the GA-SVM model were compared with those from three other methods: the grid search (GS) method, an artificial neural network (ANN) model, and the C4.5 decision tree approach. The overall classification success rates for the entire dataset predicted by the GA-SVM, ANN, C4.5 decision tree, and GS-SVM models are 97.25, 97.2, 96.3, and 92.66 %, respectively. The study concluded that the proposed GA-SVM model improves the classification accuracy and is a feasible method for predicting soil liquefaction.

Keywords Soil liquefaction · Support vector machine · Genetic algorithm · Grid search · Liquefaction · Classification accuracy

Introduction

Soil liquefaction and related ground failures are commonly associated with large earthquakes. Liquefaction is defined as the transformation of a granular material from a solid to a liquefied state as a consequence of increased pore-water pressure and reduced effective stress (Marcuson 1978). During ground shaking, shrinkage of the pore spaces of loose to medium-compact granular soils squeezes the pore water; when the pore water cannot easily drain, the pore-water pressure increases significantly, thus reducing the effective stress. If the shear resistance of the soil becomes less than the static shear stress, the soil can undergo large deformations and is said to liquefy (Seed and Idriss 1967, 1971). Liquefaction may cause the reduction or loss of bearing capacity, large settlement, and horizontal displacement due to lateral spreading of liquefied soils. Liquefaction can manifest as sand boils or lateral spreading of surficial soils. Examples of this type of damage were observed in historical earthquakes, such as the 1964 Niigata, the 1964 Alaska, the 1971 San Fernando, the 1985 Mexico City, the 1994 Northridge, the 1995 Kobe, the 1999 Taiwan, the 1999 Turkey, the 2010 Baja California, and the 2011 Tohoku earthquakes. Therefore, the assessment of liquefaction potential due to an earthquake at a site is an imperative task in geotechnical earthquake engineering.

There are different methods available for determining the liquefaction potential of soil (e.g., Seed et al. 1983; Skempton 1986; Robertson and Campanella 1985; Shibata and Teparaksa 1988; Juang et al. 2000; Zhang et al. 2004; Jha and Suzuki 2009; Samui 2013; Guettaya and Ouni 2014; Pour and Asakereh 2015). Most of these methods depend on limit states that separate the non-liquefaction region from the liquefaction region and were established empirically using in situ field observations from sites where test data are available (Mahesh 2006). Among them, the standard penetration test (SPT), cone penetration test (CPT), flat dilatometer test (DMT), shear wave velocity (SWV) technique, and self-boring pressuremeter test (SBPT) are the most commonly used in situ tests for liquefaction potential prediction. However, the high uncertainty in earthquake environments and soil characteristics makes it difficult to choose a suitable empirical equation for regression analysis (Samui 2013). Consequently, many scholars and engineers have attempted to develop scientifically derived analytical models that are simpler, easier to implement, and more accurate than traditional empirical equations for soil liquefaction analysis (Pal 2006; Lee and Chern 2013).

Correspondence: Xinhua Xue, [email protected]

1 State Key Laboratory of Hydraulics and Mountain River Engineering, College of Water Resource and Hydropower, Sichuan University, No. 24 South Section 1, Yihuan Road, Chengdu 610065, Sichuan, People's Republic of China

2 Department of Civil and Environmental Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Environ Earth Sci (2016) 75:874

DOI 10.1007/s12665-016-5673-7

In recent years, artificial neural network (ANN) models have been widely used to predict liquefaction potential as a classification problem (Sami and Patrick 2005; Ramakrishnan et al. 2008; Chern and Lee 2009; Mughieda et al. 2009; Samui and Sitharam 2011; Maria 2011; Farrokhzad et al. 2012; Mert 2013). Although ANNs were found to be more efficient than statistical methods, they have several inherent drawbacks, such as overfitting, slow convergence, poor generalization performance, and convergence to local minima. The support vector machine (SVM), developed by Vapnik (1995), is a machine-learning algorithm. An SVM classifies data with different class labels by determining a set of support vectors, which are members of the set of training inputs and outline a hyperplane in the feature space. The SVM provides a generic mechanism that fits the hyperplane surface to the training data using a kernel function. However, the SVM parameters, such as the penalty parameter C and the kernel function parameters, must be optimized in order to improve the SVM classification accuracy. The grid search (GS) method is commonly used for parameter selection in SVMs. However, this method is prone to being trapped at local optima because GS is limited to the parameter values set initially (Keerthi and Lin 2003). In contrast, the genetic algorithm (GA) is a search algorithm based upon the mechanics of natural selection (Kumar and Naresh 2007), and it outperforms GS in terms of both accuracy and time efficiency. Owing to its promising performance, GA has been successfully used in various fields (e.g., Cevik and Cabalar 2009; Ding et al. 2011; Karegowda et al. 2011; Marghany 2013, 2014, 2015; Alemdag et al. 2016). However, few publications are available on the performance comparison of GS and GA in optimizing the SVM model for the prediction of soil liquefaction potential. Therefore, the objective of this research is to develop a GA-SVM model for the prediction of soil liquefaction and to compare its performance with that of three other methods: the GS-SVM, ANN, and C4.5 decision tree approaches.

Introduction of support vector machines

Consider a training sample set $\{(\mathbf{x}_i, y_i)\}$, $i = 1$ to $N$, where $N$ is the total number of samples. It is assumed that the samples can be classified into two classes, namely a triangle class and a five-point star class (Fig. 1). Labels $y_i = -1$ and $y_i = +1$ are associated with the triangle class and the five-point star class, respectively. $H$ in Fig. 1 is a labeling line separating the two classes, and the dashed lines $H_1$ and $H_2$ pass through the samples nearest to the labeling line. The distance between $H_1$ and $H_2$ is called the margin, and the nearest data points used to define the margin are called support vectors. The separating hyperplane $H$ that has the maximum distance to the nearest data points is called the optimal separating hyperplane.

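As Fig. 1 shows, the hyperplane that separates the given data can be expressed as

$$\mathbf{w}^{T}\mathbf{x} + b = 0 \tag{1}$$

where $b$ is the bias, $\mathbf{w}$ is an adaptive weight vector, $\mathbf{x}$ is an input vector, and $\mathbf{w}^{T}\mathbf{x}$ is the inner product of $\mathbf{w}$ and $\mathbf{x}$. For linearly separable classes, a separating hyperplane for the two classes can be defined as

$$\begin{cases} \mathbf{w}^{T}\mathbf{x}_i + b \ge 1 & \text{if } y_i = 1 \\ \mathbf{w}^{T}\mathbf{x}_i + b \le -1 & \text{if } y_i = -1 \end{cases} \tag{2}$$

or, combined into a single inequality,

$$y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) - 1 \ge 0 \tag{3}$$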

The goal of the SVM is to find $b$ and $\mathbf{w}$ for the optimal separating hyperplane that maximizes the margin $2/\lVert\mathbf{w}\rVert$. Therefore, the hyperplane that optimally separates the data is the one that minimizes $\lVert\mathbf{w}\rVert$. The optimal separating hyperplane can be obtained by solving the following convex quadratic optimization problem (Vapnik 1995):

$$\text{Minimize} \quad \frac{1}{2}\mathbf{w}^{T}\mathbf{w} = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \tag{4}$$

$$\text{Subject to} \quad y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) - 1 \ge 0, \quad i = 1, 2, \ldots, N \tag{5}$$

Fig. 1 Hyper-plane classification

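The above problem can be transformed into the equivalent Lagrangian dual problem as (Lee and Chern 2013)

$$L(\mathbf{w}, b, \alpha) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} - \sum_{i=1}^{N} \alpha_i\left[y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) - 1\right] \tag{6}$$

where $\alpha_i \ge 0$ ($i = 1, 2, \ldots, N$) are the Lagrange multipliers. The main goal here is to find the $\mathbf{w}$ and $b$ that minimize Eq. (6) and the $\alpha$ that maximizes it. This can be done by differentiating $L$ with respect to $\mathbf{w}$ and $b$ and setting the derivatives to zero (Lee and Chern 2013):

$$\frac{\partial L(\mathbf{w}, b, \alpha)}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i \tag{7a}$$

$$\frac{\partial L(\mathbf{w}, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{7b}$$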

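Multiplying Eq. (7a) by $\mathbf{w}^{T}$ leads to

$$\mathbf{w}^{T}\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{w}^{T}\mathbf{x}_i = \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j \quad (i = 1, 2, \ldots, N;\; j = 1, 2, \ldots, N) \tag{8}$$

Based on Eqs. (7a), (7b), and (8), Eq. (6) can then be reformulated as follows:

$$L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j \tag{9}$$

$$\text{Subject to} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ \alpha_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{10}$$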
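The step from Eq. (6) to Eq. (9) can be written out explicitly. The following short expansion is a worked sketch that uses only Eqs. (6), (7a), (7b), and (8) and the same symbols; it introduces no further assumptions.

```latex
\[
\begin{aligned}
L(\mathbf{w}, b, \alpha)
  &= \frac{1}{2}\mathbf{w}^{T}\mathbf{w}
     - \sum_{i=1}^{N} \alpha_i y_i \mathbf{w}^{T}\mathbf{x}_i
     - b \sum_{i=1}^{N} \alpha_i y_i
     + \sum_{i=1}^{N} \alpha_i \\
  &= \frac{1}{2}\mathbf{w}^{T}\mathbf{w}
     - \mathbf{w}^{T}\mathbf{w}
     - 0
     + \sum_{i=1}^{N} \alpha_i \\
  &= \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\mathbf{w}^{T}\mathbf{w}
\end{aligned}
\]
```

The second line uses the first equality of Eq. (8) for the middle term and Eq. (7b) for the term in $b$. Replacing $\mathbf{w}^{T}\mathbf{w}$ with the double sum in Eq. (8) then gives Eq. (9).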

It can be seen that this is a convex quadratic optimization problem, and it requires a solution $\alpha_i^{*}$ to determine the parameters $\mathbf{w}^{*}$ and $b^{*}$ of the optimal hyperplane. Thus, the optimal hyperplane decision function can be written as (Lee and Chern 2013)

$$f(\mathbf{x}) = \mathrm{sign}\left(\mathbf{w}^{*T}\mathbf{x} + b^{*}\right) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i^{*} y_i \mathbf{x}_i^{T}\mathbf{x} + b^{*}\right) \tag{11}$$

where sign is the signum function. If the result is positive, $\mathbf{x}$ is assigned to class 1; otherwise, it is assigned to class 2.
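To make Eqs. (7a) and (11) concrete, the following minimal Python sketch evaluates the decision function for a two-point toy problem whose dual solution can be found by hand. The data, the multipliers, and the helper names `dot` and `decide` are illustrative assumptions and are not taken from the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy problem: one sample per class, solved by hand.
X = [(0.0, 0.0), (2.0, 2.0)]
y = [-1, +1]
alpha = [0.25, 0.25]  # satisfies sum(alpha_i * y_i) = 0, as in Eq. (10)

# Eq. (7a): w = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]

# Bias from a support vector s, using y_s * (w.x_s + b) = 1
b = y[1] - dot(w, X[1])

def decide(x):
    """Eq. (11): sign of sum_i alpha_i * y_i * (x_i . x) + b."""
    s = sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s > 0 else -1

# Margin 2 / ||w||; here it equals the distance between the two samples.
margin = 2.0 / math.sqrt(dot(w, w))
```

In this sketch both samples are support vectors, so the margin coincides with the distance between them. With real data only a subset of the multipliers would be nonzero.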

However, because most classification problems are not linearly separable, Cortes and Vapnik (1995) introduced positive slack variables $\xi_i$ and a penalty parameter $C$ to reduce the number of misclassifications. Thus, Eq. (4) can be changed as follows:

$$\text{Minimize} \quad \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N} \xi_i \tag{12}$$

$$\text{Subject to} \quad \begin{cases} y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{13}$$

where the slack variable $\xi_i$ accounts for the effect of misclassification and the penalty parameter $C$ defines the trade-off between the number of misclassifications in the training data and margin maximization.
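For a fixed hyperplane, the smallest slack that satisfies Eq. (13) is $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^{T}\mathbf{x}_i + b))$. The sketch below uses this to evaluate the objective of Eq. (12) on a hypothetical three-point dataset. The hyperplane, the data, and the value of `C` are illustrative assumptions only.

```python
def slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w.x + b) >= 1 - xi, as in Eq. (13)."""
    functional_margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - functional_margin)

def soft_margin_objective(w, b, X, Y, C):
    """Eq. (12): 0.5 * ||w||^2 + C * sum of slack variables."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))

# Hypothetical hyperplane and data (not from the paper).
w, b = (0.5, 0.5), -1.0
X = [(0.0, 0.0), (2.0, 2.0), (1.0, 1.0)]
Y = [-1, +1, +1]

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
objective = soft_margin_objective(w, b, X, Y, C=10.0)
```

The third sample lies exactly on the hyperplane, so it violates the margin with a slack of 1. A larger `C` makes such violations more expensive relative to a wide margin.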

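As above, the Lagrange function for this problem can be written as

$$L(\mathbf{w}, b, \xi, \alpha, \beta) = \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i\left[y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) - 1 + \xi_i\right] - \sum_{i=1}^{N} \beta_i \xi_i \tag{14}$$

$$\text{Subject to} \quad \alpha_i, \beta_i \ge 0 \quad (i = 1, 2, \ldots, N) \tag{15}$$

where $\alpha_i$ and $\beta_i$ are the Lagrange multipliers.

Differentiating $L$ with respect to $\mathbf{w}$, $b$, and $\xi_i$ and setting the derivatives to zero leads to

$$\frac{\partial L(\mathbf{w}, b, \xi, \alpha, \beta)}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i \tag{16a}$$

$$\frac{\partial L(\mathbf{w}, b, \xi, \alpha, \beta)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{16b}$$

$$\frac{\partial L(\mathbf{w}, b, \xi, \alpha, \beta)}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \beta_i = 0 \tag{16c}$$

Based on Eqs. (16a), (16b), and (16c), Eq. (14) can then be reformulated as follows:

$$L(\mathbf{w}, b, \xi, \alpha, \beta) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^{T}\mathbf{x}_j \tag{17}$$

$$\text{Subject to} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N \end{cases} \tag{18}$$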

This is almost the same dual problem as before, the only difference being that the multipliers $\alpha_i$ now have an upper bound $C$. For nonlinear problems, the optimization problem can be solved by mapping the data into a high-dimensional feature space. In the high-dimensional feature space, the inner product of the linear problem can be replaced by a kernel function, i.e., $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$. Therefore, the decision function can be written as

$$f(\mathbf{x}) = \sum_{i=1}^{N} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b \tag{19}$$

Several kernel functions are used in SVMs, such as the linear, polynomial, radial basis function (RBF), and sigmoid kernels. Dibike et al. (2001) demonstrated that the RBF outperformed other kernel functions after using different kernels in SVM for rainfall-runoff modeling. Therefore, the RBF is adopted in this study and is expressed as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}}{2\sigma^{2}}\right) \tag{20}$$

where $\sigma$ is the width of the radial basis function.
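The RBF kernel of Eq. (20) is simple to evaluate directly. The following is a minimal Python sketch; the function name `rbf_kernel` and the sample vectors are illustrative assumptions. Note that some SVM libraries parameterize the same kernel with a coefficient gamma, which corresponds to $1/(2\sigma^{2})$ in the notation used here.

```python
import math

def rbf_kernel(xi, xj, sigma):
    """Eq. (20): exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical two-dimensional inputs (not rows of the paper's dataset).
u, v = (0.0, 0.0), (1.0, 1.0)

k_same = rbf_kernel(u, u, sigma=1.0)   # identical points give 1.0
k_near = rbf_kernel(u, v, sigma=1.0)   # exp(-1) for squared distance 2
k_wide = rbf_kernel(u, v, sigma=2.0)   # larger sigma gives a value closer to 1
```

A larger width makes distant samples look more similar, which is why the width interacts with the penalty parameter and both are tuned together.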

GA-based parameter optimization method

The parameters of the kernel function and the penalty parameter (C) of the SVM model greatly influence the prediction results. However, SVM theory does not provide a method for selecting these parameters. In this study, a genetic algorithm (GA) was used to select the best-matched kernel function parameters and penalty parameter (C) for the SVM model.

Genetic algorithm (GA)

A genetic algorithm (GA) is a search algorithm based upon the mechanics of natural selection, derived from the theory of natural evolution. GA simulates mechanisms of population genetics and natural rules of survival in pursuit of the ideas of adaptation. A typical GA starts with an initial set of random solutions called a population, and each individual in the population is called a chromosome. A chromosome is usually, but not necessarily, a binary string and represents a solution to the problem. Chromosomes evolve through successive iterations, called generations. During each generation, the chromosomes are evaluated using some measure of fitness. To create the next generation, new chromosomes, called offspring, are formed by either (1) merging two chromosomes from the current generation using a crossover operator or (2) modifying a chromosome using a mutation operator. A new generation is formed by (1) selecting some of the parents and offspring according to their fitness values and (2) rejecting the rest so that the population size is kept constant. In the process, fitter chromosomes have a higher chance of being selected. After several generations, the algorithm converges to the best set of chromosomes, which hopefully represents the optimum or a near-optimal solution to the problem.
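The generational loop described above can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' MATLAB implementation: the function name `evolve`, the population size, the mutation rate, the 50/50 choice between crossover and mutation, and the deterministic "keep the fittest" survivor selection are all assumptions made for the example. The toy fitness simply counts the ones in a bit string.

```python
import random

def evolve(fitness, n_bits, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Minimal GA over bit strings with a constant population size."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            if rng.random() < 0.5:
                # (1) Crossover: merge two chromosomes at a random cut point.
                a, b = rng.sample(population, 2)
                cut = rng.randrange(1, n_bits)
                offspring.append(a[:cut] + b[cut:])
            else:
                # (2) Mutation: flip each bit with a small probability.
                parent = rng.choice(population)
                offspring.append([1 - g if rng.random() < mutation_rate else g
                                  for g in parent])
        # Select among parents and offspring by fitness; reject the rest.
        pool = sorted(population + offspring, key=fitness, reverse=True)
        population = pool[:pop_size]
        best_history.append(fitness(population[0]))
    return population, best_history

# Toy fitness: number of ones in the chromosome.
population, history = evolve(sum, n_bits=16)
```

Because the fittest individual always survives in this variant, the best fitness never decreases from one generation to the next. A probabilistic selection scheme, as the text suggests, would trade some of that guarantee for diversity.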

SVM optimized by GA (GA-SVM)

To design an effective classifier, the parameters of the SVM model have to be configured properly in advance. In this section, the proposed GA-based SVM parameter optimization approach is described.

Chromosome design

In this study, the RBF kernel function is adopted due to its promising performance, and thus only two parameters, $C$ and $\sigma$, need to be optimized by the proposed GA-based method. Therefore, the chromosome comprises two parameters, $C$ and $\sigma$. Figure 2 shows the chromosome model formed by two binary blocks: (1) the first block, which includes $C_i$ with $1 \le i \le n_c$, is the binary representation of the parameter $C$ in $n_c$ bits; (2) the second block, which includes $\sigma_j$ with $1 \le j \le n_\sigma$, is the binary representation of the parameter $\sigma$ in $n_\sigma$ bits.

All the SVM parameters are numerical values, and it is necessary to convert the $C$ and $\sigma$ parameter values from their binary representation to a decimal representation. This is done at each chromosome fitness evaluation in order to configure the classifier, and it was performed here using Eq. (21) (Huang and Wang 2006):

$$p = \mathrm{min}_p + \frac{\mathrm{max}_p - \mathrm{min}_p}{2^{l} - 1} \times d \tag{21}$$

where $p$ denotes the phenotype parameter, $\mathrm{min}_p$ and $\mathrm{max}_p$ are the minimum and maximum values of parameter $p$, respectively, $d$ is the decimal value of the bit string, and $l$ is the number of bits used in the binary representation of the parameter.
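Eq. (21) maps a bit string linearly onto the interval between the minimum and maximum parameter values. A minimal Python sketch follows. The function names, the most-significant-bit-first ordering, the block lengths, and the parameter ranges are assumptions for illustration, since the text above does not state them.

```python
def decode_bits(bits, min_p, max_p):
    """Eq. (21): map a bit string (most significant bit first) to [min_p, max_p]."""
    l = len(bits)
    d = int("".join(str(bit) for bit in bits), 2)  # decimal value of the string
    return min_p + (max_p - min_p) / (2 ** l - 1) * d

def decode_chromosome(chromosome, n_c, c_range, sigma_range):
    """Split a chromosome into its C block and sigma block and decode both."""
    c_bits, sigma_bits = chromosome[:n_c], chromosome[n_c:]
    return decode_bits(c_bits, *c_range), decode_bits(sigma_bits, *sigma_range)

# Hypothetical 6-bit chromosome: 4 bits for C, 2 bits for sigma.
chromosome = [1, 0, 1, 1, 0, 1]
C, sigma = decode_chromosome(chromosome, n_c=4,
                             c_range=(0.0, 15.0), sigma_range=(0.0, 3.0))
```

The all-zero string decodes to the minimum and the all-one string to the maximum, so the number of bits per block sets the resolution of the search.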

Fitness function

A fitness function is an objective function that estimates the quality of each chromosome. In the GA-based SVM parameter optimization process, the most difficult task is to design a fitness function that produces reliable and effective SVM parameters. K-fold cross-validation (CV) is a widely used technique for assessing the generalization ability of an SVM classifier. One idea is to use the cross-validation accuracy (CVA), instead of the classification accuracy, as the fitness function in the genetic algorithm. A higher CVA represents greater generalization ability of the SVM classifier on the given training data. Here, the fivefold cross-validation accuracy computed by the SVM classifier on the training set is used as the GA fitness function, and the average accuracy over the five folds is taken as the final fitness value. The corresponding function can be defined as follows:

$$f_i = \frac{1}{n}\sum_{i=1}^{n} \mathrm{SVM}_{\mathrm{CVA}}[i] \tag{22}$$

where $\mathrm{SVM}_{\mathrm{CVA}}[i]$ denotes the cross-validation accuracy of fold $i$ calculated by the SVM model and $n$ indicates the total number of folds used in cross-validation. Here, $n$ is equal to 5.

Fig. 2 Chromosome comprises two parameters, C and σ (after Marghany 2014)
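The averaging in Eq. (22) can be sketched as follows. To keep the example self-contained and dependency-free, a one-nearest-neighbour rule stands in for the SVM classifier; that substitution, the contiguous (unshuffled) folds, the function names, and the toy data are all assumptions for illustration. In the setting of the paper, the `fit_predict` argument would train an RBF SVM with the decoded $C$ and $\sigma$.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Split range(n_samples) into n_folds contiguous, nearly equal folds."""
    folds, start = [], 0
    for i in range(n_folds):
        size = n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_fitness(X, y, fit_predict, n_folds=5):
    """Eq. (22): mean accuracy over the held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), n_folds):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predictions = fit_predict([X[i] for i in train],
                                  [y[i] for i in train],
                                  [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / n_folds

def one_nn(X_train, y_train, X_test):
    """Stand-in classifier (NOT an SVM): label of the nearest training point."""
    predictions = []
    for x in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        predictions.append(y_train[dists.index(min(dists))])
    return predictions

# Hypothetical, cleanly separable one-dimensional data.
X = [[float(i)] for i in range(10)]
y = [0] * 5 + [1] * 5
fitness = cv_fitness(X, y, one_nn)
fold_sizes = [len(f) for f in k_fold_indices(74)]
```

With the 74 training cases used in this study, five folds would hold 15, 15, 15, 15, and 14 cases under this splitting rule. In practice the cases would usually be shuffled or stratified before splitting.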

Genetic operators

Figure 3 illustrates the genetic operators of crossover and mutation. Crossover is the critical genetic operator that allows new regions of the search space to be explored; it is performed by selecting a random gene along the length of the chromosomes and swapping all genes after that point (Marghany 2015). In mutation, genes may occasionally be altered, i.e., a binary-coded gene can change from 1 to 0 or vice versa.
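The two operators can be written compactly for binary chromosomes. The sketch below is illustrative: the function names, the mutation rate argument, and the use of a seeded random generator are assumptions, not details given in the paper.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Pick a random cut point and swap all genes after it."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def bit_flip_mutation(chromosome, rate, rng):
    """Flip each gene (1 to 0 or 0 to 1) independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

rng = random.Random(42)
zeros, ones = [0] * 8, [1] * 8
child_a, child_b = single_point_crossover(zeros, ones, rng)
unchanged = bit_flip_mutation(zeros, 0.0, rng)
flipped = bit_flip_mutation(zeros, 1.0, rng)
```

Crossing an all-zero parent with an all-one parent makes the cut point visible: each child is a run of one value followed by a run of the other, and the two children are complementary.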

GA-SVM approach

The main steps of the proposed GA-SVM approach are described as follows:

1. Data preprocessing. Normalize the datasets according to Eq. (23):

$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{23}$$

where $y$ is a normalized input parameter, $x$ is the original input parameter, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that parameter, respectively.

2. Initialization. Generate an initial population in which each individual comprises $C$ and $\sigma$, then convert each chromosome from its genotype into its phenotype.

3. Fitness evaluation. For each chromosome representing $C$ and $\sigma$, the training dataset is used to train the SVM classifier and calculate the cross-validation accuracy, while the testing dataset is used to validate the model. Once the cross-validation accuracy is obtained, each chromosome is evaluated by the fitness function, Eq. (22).

4. Termination criteria. When the termination criteria are satisfied, the process ends; otherwise, the process continues with the next generation.

5. Genetic operation. In this step, the system searches for better solutions by genetic operations, including selection, crossover, and mutation.

The proposed GA-SVM approach was implemented in MATLAB R2012b in this study. The flowchart of the GA-SVM algorithm is shown in Fig. 4. This approach is applied in the case study for soil liquefaction prediction.
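Step 1 of the procedure, the min-max scaling of Eq. (23), can be sketched as below. The function name and the handling of a constant column are assumptions; the sample values are the first five cone resistance entries of Table 1, used here only to show the arithmetic.

```python
def min_max_normalize(values):
    """Eq. (23): scale each value to [0, 1] using the column minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Cone resistance q_c (MPa) for cases 1 to 5 of Table 1.
qc = [3.20, 1.6, 7.2, 5.6, 5.45]
qc_scaled = min_max_normalize(qc)
```

The text does not say whether the minimum and maximum are taken over all 109 cases or over the training cases only. Taking them from the training set alone is a common choice, because it avoids passing information from the testing data into the model.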

GS-based parameter optimization method

To verify the effectiveness of the proposed GA-SVM model, this study also adopted the grid search (GS) method to optimize the parameters of the SVM model. GS is a common method for searching for the best $C$ and $\sigma$. Figure 5 shows the flowchart of the GS-SVM model. In the grid search algorithm, pairs of $(C, \sigma)$ are tried, and the one with the best cross-validation accuracy is chosen. After a better region on the grid has been identified, a finer grid search in that region can be conducted.

The GS-SVM approach was also implemented in MATLAB R2012b in this study.

Fig. 3 Genetic crossover and mutation operation

Fig. 4 Flowchart of the GA-SVM model
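The coarse-to-fine grid search described in this section can be sketched as follows. Everything specific in the example is an assumption: the power-of-two grids, the exponent ranges and step sizes, and in particular `toy_score`, which is a smooth stand-in for the cross-validation accuracy so that the example runs without an SVM library.

```python
import math

def exp2_grid(lo_exp, hi_exp, step):
    """Return 2**e for e = lo_exp, lo_exp + step, ..., hi_exp."""
    count = int(round((hi_exp - lo_exp) / step)) + 1
    return [2.0 ** (lo_exp + k * step) for k in range(count)]

def grid_search(score, c_values, sigma_values):
    """Try every (C, sigma) pair; return (best_score, best_C, best_sigma)."""
    best = None
    for c in c_values:
        for sigma in sigma_values:
            s = score(c, sigma)
            if best is None or s > best[0]:
                best = (s, c, sigma)
    return best

def toy_score(c, sigma):
    # Stand-in for cross-validation accuracy (hypothetical): a smooth
    # surface that peaks near C = 2**1.9 and sigma = 2**1.0.
    return -((math.log2(c) - 1.9) ** 2 + (math.log2(sigma) - 1.0) ** 2)

# Coarse pass over a wide, sparse grid.
coarse = grid_search(toy_score, exp2_grid(-5, 15, 2), exp2_grid(-5, 5, 2))

# Fine pass over a denser grid centred on the coarse optimum.
c_exp, s_exp = math.log2(coarse[1]), math.log2(coarse[2])
fine = grid_search(toy_score,
                   exp2_grid(c_exp - 2, c_exp + 2, 0.25),
                   exp2_grid(s_exp - 2, s_exp + 2, 0.25))
```

The fine pass can only return values that lie on its grid, so the result still depends on the ranges and step sizes chosen at the start. This is the limitation of GS that the Introduction points out.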

Case study

Database

It is generally known that the susceptibility of soil deposits to liquefaction is determined by a combination of various factors to which they may be subjected, such as soil properties, geological conditions, and ground motion characteristics. Therefore, it is widely recognized that the determination of liquefaction potential is a complex geotechnical engineering problem. The soil properties and geological conditions determine the resistance of the deposit to liquefaction, while earthquake characteristics control the seismic loading conditions. The general criteria for soil liquefaction are: (1) the subsoil is a loose sandy deposit, (2) the subsoil is saturated, and (3) the earthquake intensity is sufficiently large and the duration of shaking is sufficiently long. Accordingly, the following factors are chosen as the evaluating indices: cone resistance ($q_c$), total vertical stress ($\sigma_0$), effective vertical stress ($\sigma'_0$), mean grain size ($D_{50}$), normalized peak horizontal acceleration at ground surface ($a_{\max}$), cyclic stress ratio ($\tau/\sigma'_0$), and earthquake magnitude ($M_w$).

The database used in this study includes 109 CPT-based field observations from five major earthquakes between 1964 and 1983 (Goh 1996), and it covers a wide range of parameter values. The data consist of 16 case records from Japan, 79 from China, 9 from the USA, and 5 from Romania. They represent 74 sites that liquefied and 35 sites that did not liquefy (Table 1). To the best of the authors' knowledge, there is currently no established criterion for dividing data into training and testing sets, and the division is arbitrary. For instance, Kurup and Dudani (2002) used 63 % of the data for training, Tang et al. (2005) used 75 %, while Padmini et al. (2008) used 80 %. In this study, 74 cases are used as the training dataset, and the other 35 cases are used as the testing dataset. Here, for each case, "1" represents liquefaction and "0" denotes no liquefaction.

Results and discussion

Figure 6 shows that the best $(C, \sigma)$ is (12.5152, 1.0616), with a cross-validation rate of 97.29 %, after training the GA-SVM model. These optimal parameters are then used to build the final GA-SVM classifier, which is evaluated on the testing data. Of the 35 testing cases used for prediction, only one was misclassified, giving an overall classification accuracy of 97.25 % for the entire dataset.

To further verify the effectiveness of the proposed GA-SVM model, this study compares it with the GS-SVM model, the artificial neural network (ANN) model (Goh 1996), and the C4.5 decision tree model (Ardakani and Kohestni 2015). Since the bounds of the GS-SVM parameters are not known a priori and a complete grid search may be time-consuming, a coarse grid search is performed first to find the best region of the parameters, followed by a fine grid search within that region. The parameter ranges used for the coarse and fine grid searches are given in Table 2.

Fig. 5 Flowchart of the GS-SVM model

Table 1 Dataset used in case study (1: liquefaction; 0: no liquefaction) [data from Goh (1996)]

No.  Mw   σ0 (kPa)  σ'0 (kPa)  qc (MPa)  amax (g)  τ/σ'0  D50 (mm)  Actual  Predicted by GA-SVM
1    7.5  53        36         3.20      0.16      0.15   0.331     1       1
2    7.5  87        52         1.6       0.16      0.16   0.331     1       1
3    7.5  99        58         7.2       0.16      0.17   0.331     1       1
4    7.5  152       83         5.6       0.16      0.17   0.331     1       1
5    7.5  91        63         5.45      0.16      0.14   0.331     1       1
6    7.5  127       80         8.84      0.16      0.15   0.331     1       1
7    7.5  211       120        9.70      0.16      0.15   0.331     1       1
8    7.5  86        46         8.0       0.16      0.19   0.30      0       0
9    7.5  95        50         14.55     0.16      0.18   0.30      0       0
10   7.7  58        48         10.0      0.23      0.18   0.32      0       0
11   7.7  73        54         16.0      0.23      0.20   0.32      0       0
12   7.7  96        6          15.38     0.23      0.21   0.32      0       0
13   7.7  54        46         1.79      0.23      0.17   0.32      1       1
14   7.7  64        52         4.10      0.23      0.19   0.32      1       1
15   7.7  96        67         7.96      0.23      0.21   0.32      1       1
16   7.7  114       75         8.97      0.23      0.22   0.32      1       1
17   7.8  57        42         1.70      0.40      0.35   0.06      1       1
18   7.8  114       69         9.40      0.40      0.41   0.25      1       1
19   7.8  148       85         5.70      0.40      0.42   0.25      1       1
20   7.8  162       92         7.60      0.40      0.42   0.30      1       1
21   7.8  17        17         1.5       0.40      0.27   0.17      1       1
22   7.8  25        21         1.0       0.40      0.32   0.17      1       1
23   7.8  34        25         5.0       0.40      0.36   0.17      1       1
24   7.8  38        34         2.50      0.40      0.29   0.14      1       1
25   7.8  57        43         2.60      0.40      0.34   0.14      1       1
26   7.8  76        52         3.20      0.40      0.37   0.16      1       1
27   7.8  89        58         5.80      0.40      0.9    0.16      1       1
28   7.8  122       74         3.50      0.40      0.40   0.16      1       1
29   7.8  181       102        8.40      0.40      0.41   0.16      1       1
30   7.8  38        29         1.70      0.40      0.35   0.12      1       1
31   7.8  40        29         3.0       0.40      0.36   0.12      1       1
32   7.8  51        35         4.10      0.40      0.38   0.12      1       1
33   7.8  29        27         5.50      0.40      0.29   0.17      1       1
34   7.8  57        40         9.00      0.40      0.37   0.32      1       1
35   7.8  23        21         7.00      0.40      0.29   0.48      1       1
36   7.8  34        26         1.18      0.40      0.35   0.48      1       1
37   7.8  48        33         4.24      0.40      0.38   0.48      1       1
38   7.8  76        71         11.47     0.40      0.27   0.16      0       0
39   7.8  160       111        15.76     0.40      0.34   0.20      0       0
40   7.8  59        56         11.39     0.20      0.14   0.21      0       0
41   7.8  78        65         12.12     0.20      0.15   0.21      0       0
42   7.8  99        75         17.76     0.20      0.17   0.14      0       0
43   7.8  74        49         2.65      0.20      0.19   0.14      1       1
44   7.8  53        35         4.40      0.20      0.20   0.16      1       1
45   7.8  61        39         3.00      0.20      0.20   0.16      1       1
46   7.8  156       81         9.00      0.20      0.23   0.08      1       1
47   7.8  99        55         2.00      0.10      0.11   0.14      1       0
48   7.8  95        52         1.10      0.20      0.23   0.07      1       0
49   7.8  209       106        15.50     0.10      0.11   0.08      0       0
50   7.8  217       110        6.50      0.10      0.11   0.08      0       0
51   7.8  91        53         9.00      0.10      0.11   0.10      0       0
52   7.8  101       58         2.50      0.10      0.11   0.10      0       0
53   7.8  112       63         16.50     0.10      0.11   0.10      0       0
54   7.8  91        68         13.65     0.10      0.06   0.25      0       0
55   7.8  114       58         8.47      0.20      0.24   0.062     0       0
56   7.8  228       112        4.55      0.20      0.23   0.067     0       0
57   7.8  249       122        5.79      0.20      0.22   0.067     0       0
58   7.8  121       55         2.48      0.20      0.25   0.062     1       1
59   7.8  114       56         1.57      0.20      0.25   0.062     1       1
60   7.8  213       103        1.4       0.20      0.23   0.67      1       1
61   7.8  220       106        2.15      0.20      0.23   0.67      1       1
62   7.8  230       111        2.60      0.20      0.23   0.67      1       1
63   7.8  213       103        2.73      0.20      0.23   0.67      1       1
64   7.8  219       106        1.78      0.20      0.23   0.67      1       1
65   7.8  211       108        7.64      0.20      0.22   0.67      0       0
66   6.6  29        29         25.60     0.80      0.44   0.11      0       0
67   6.6  48        36         24.70     0.80      0.57   0.11      0       0
68   6.6  64        42         31.40     0.80      0.64   0.11      0       0
69   6.6  29        29         1.43      0.80      0.44   0.11      1       1
70   6.6  64        42         2.48      0.80      0.64   0.11      1       1
71   6.6  96        54         4.03      0.80      0.72   0.11      1       1
72   6.6  29        29         3.30      0.80      0.44   0.06      0       0
73   6.6  48        36         8.80      0.80      0.57   0.06      0       0
74   6.6  64        42         6.70      0.80      0.64   0.06      0       0
75   7.8  89        78         1.65      0.20      0.41   0.17      1       1
76   7.8  99        83         3.65      0.20      0.15   0.17      1       1
77   7.8  29        25         1.03      0.20      0.15   0.19      1       1
78   7.8  55        37         5.00      0.20      0.19   0.31      1       1
79   7.8  76        47         2.91      0.20      0.21   0.18      1       1
80   7.8  105       61         6.06      0.20      0.21   0.18      1       1
81   7.8  23        22         13.24     0.20      0.14   0.17      0       0
82   7.8  32        26         13.06     0.20      0.16   0.17      0       0
83   7.8  40        30         16.59     0.20      0.18   0.17      0       0
84   7.8  59        59         10.59     0.20      0.13   0.26      0       0
85   7.8  63        61         9.12      0.20      0.13   0.26      0       0
86   7.8  76        67         11.29     0.20      0.15   0.26      0       0
87   7.8  70        41         1.94      0.20      0.22   0.16      1       1
88   7.8  76        44         5.00      0.20      0.22   0.16      1       1
89   7.8  70        47         2.24      0.20      0.19   0.14      1       1
90   7.8  114       79         14.12     0.20      0.09   0.25      0       0
91   7.8  162       102        18.94     0.20      0.09   0.28      0       0
92   7.8  44        44         3.52      0.20      0.13   0.16      1       1
93   7.8  59        51         2.73      0.20      0.15   0.16      1       1
94   7.8  57        50         3.29      0.20      0.15   0.21      1       1
95   7.8  61        52         4.12      0.20      0.15   0.21      1       1
96   7.8  72        57         2.94      0.20      0.16   0.21      1       1


97 7.8 61 52 3.00 0.20 0.15 0.15 1 1

98 7.8 95 68 5.85 0.20 0.18 0.32 1 1

99 7.8 106 73 9.00 0.20 0.18 0.32 1 1

100 7.8 49 48 1.88 0.20 0.13 0.13 1 1

101 7.8 74 64 2.55 0.20 0.15 0.17 1 1

102 7.8 76 65 4.50 0.20 0.15 0.17 1 1

103 7.8 106 79 4.24 0.20 0.17 0.17 1 1

104 7.8 114 77 8.00 0.20 0.18 0.22 0 1

105 7.2 80 48 5.22 0.22 0.21 0.20 1 1

106 7.2 95 55 3.73 0.22 0.22 0.20 1 1

107 7.2 114 64 3.11 0.22 0.22 0.20 1 1

108 7.2 133 73 1.32 0.22 0.22 0.20 1 1

109 7.2 152 82 5.22 0.22 0.22 0.20 1 1

Fig. 6 Fitness curve for GA-SVM to find the best parameters

Table 2 Coarse and fine range partitions of GS-SVM parameters taken during two-step optimization

SVM parameters Coarse range partitions Fine range partitions

C [2^-10, 2^10] [2^-2, 2^2]

σ [2^-10, 2^10] [2^-2, 2^2]

After the GS-SVM procedure, the optimal parameters with maximal classification accuracy are selected. As shown in Figs. 7 and 8, the best (C, σ) is (3.7321, 2.0), with a cross-validation rate of 91.89 %. These optimal parameters are then used to generate the final GS-SVM classifier, which is evaluated on the testing data. Of the 35 testing cases, only two were misclassified (a testing success rate of 94.29 %; Table 3), and the overall classification accuracy rate for the entire dataset is 92.66 %.

The prediction results of GA-SVM are compared with those of GS-SVM, ANN (Goh 1996), and the C4.5 decision tree (Ardakani and Kohestni 2015) in Table 3. The classification success rate of the GA-SVM model for the training data is 97.29 %, which is higher than the rates of the C4.5 decision tree (95.9 %) and GS-SVM (91.89 %) models and slightly lower than the rate of the ANN (98.6 %) reported by Goh (1996). However, the overall classification success rate of GA-SVM for the entire dataset is 97.25 %, while the overall rates of the ANN, C4.5 decision tree, and GS-SVM models are 97.20, 96.30, and 92.66 %, respectively. The overall success rate of the GA-SVM model is therefore the highest of the four methods, although its margin over the ANN is only 0.05 percentage points. Moreover, the ANN uses many parameters, such as the number of hidden layers, number of hidden nodes, learning rate, momentum term, number of training epochs, and weight initialization method. The decision tree method is quite transparent and does not need optimization of internal parameters; however, it has a limited domain of applicability and is mostly case dependent. Its generalization is limited, and it is applicable only within the range of the training data. The case study on soil liquefaction prediction shows that the proposed GA-SVM model performs better than the GS-SVM, ANN, and C4.5 decision tree approaches.
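The two-step (coarse, then fine) grid search used for GS-SVM can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the ranges in Table 2 read as [2^-10, 2^10] (coarse) and [2^-2, 2^2] (fine), that the grid is spaced in powers of two, and that the exponent steps are 1.0 and 0.1; the step sizes are not given in the paper. The function `toy_cv_accuracy` is a hypothetical stand-in for a real cross-validation run, with its peak placed at C = 2^1.9 ≈ 3.7321 and σ = 2^1 = 2.0 purely for illustration.

```python
import itertools
import math


def frange(start, stop, step):
    """Inclusive list of evenly spaced floats from start to stop."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def grid_search(score_fn, exponents):
    """Score every (C, sigma) = (2**a, 2**b) pair and return the best one."""
    best = None
    for a, b in itertools.product(exponents, repeat=2):
        c, sigma = 2.0 ** a, 2.0 ** b
        score = score_fn(c, sigma)
        if best is None or score > best[0]:
            best = (score, c, sigma)
    return best


def two_step_grid_search(score_fn, coarse=(-10.0, 10.0, 1.0), fine=(-2.0, 2.0, 0.1)):
    """Coarse pass over [2^-10, 2^10], then a fine pass over [2^-2, 2^2].

    Each range is (min exponent, max exponent, exponent step); the steps
    are assumptions made for this sketch.
    """
    coarse_best = grid_search(score_fn, frange(*coarse))
    fine_best = grid_search(score_fn, frange(*fine))
    # Keep whichever pass produced the higher score.
    return max(coarse_best, fine_best, key=lambda item: item[0])


def toy_cv_accuracy(c, sigma):
    """Hypothetical stand-in for a cross-validation accuracy (%).

    A real run would train an SVM with (c, sigma) and return its
    cross-validated success rate on the training cases.
    """
    return 100.0 - (math.log2(c) - 1.9) ** 2 - (math.log2(sigma) - 1.0) ** 2


score, best_c, best_sigma = two_step_grid_search(toy_cv_accuracy)
print(f"best C = {best_c:.4f}, best sigma = {best_sigma:.4f}")
```

The search cost grows with the square of the number of grid points per parameter, which is one practical motivation for replacing an exhaustive grid with a GA-driven search.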

Conclusions

Determination of liquefaction potential is a complex geotechnical engineering problem due to the heterogeneous nature of soils and the large number of factors that affect the occurrence of liquefaction during earthquakes. This study presents a hybrid model that combines SVM and GA to improve the forecasting performance. GA is employed to select the optimal values of the kernel function parameter and the penalty parameter of the SVM model. The developed GA-SVM approach was compared with the GS-SVM, ANN, and C4.5 decision tree approaches. The prediction results demonstrate that the overall classification accuracy rate of the developed GA-SVM approach surpasses those of the other methods. The GA-SVM model can be used as a reliable approach in soil liquefaction prediction.
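The GA-based parameter selection summarized above can be sketched in a few lines. This is an illustrative real-coded GA under stated assumptions, not the configuration used in the study. The operators (tournament selection, arithmetic crossover, Gaussian mutation, one-elite survival), the population size, the rates, and the `toy_fitness` function are all hypothetical. In a real run, the fitness of an individual (log2 C, log2 σ) would be the cross-validation accuracy of an SVM trained with those parameters.

```python
import random


def genetic_search(fitness_fn, bounds, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize fitness_fn over box-bounded real vectors with a simple GA."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    def tournament(scored):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best_score, best_ind, history = float("-inf"), None, []

    for gen in range(generations):
        scored = [(fitness_fn(ind), ind) for ind in population]
        gen_score, gen_ind = max(scored, key=lambda pair: pair[0])
        if gen_score > best_score:
            best_score, best_ind = gen_score, list(gen_ind)
        history.append(best_score)
        if gen == generations - 1:
            break

        next_population = [list(best_ind)]  # elitism: best survives unchanged
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                w = rng.random()  # arithmetic crossover stays inside the box
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    mutated = child[i] + rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = max(lo, min(hi, mutated))  # clip to bounds
            next_population.append(child)
        population = next_population

    return best_ind, best_score, history


def toy_fitness(ind):
    """Hypothetical fitness peaked at log2 C = 1.9 and log2 sigma = 1.0."""
    return -((ind[0] - 1.9) ** 2 + (ind[1] - 1.0) ** 2)


best, score, history = genetic_search(toy_fitness, [(-10.0, 10.0), (-10.0, 10.0)])
print(f"C = {2.0 ** best[0]:.4f}, sigma = {2.0 ** best[1]:.4f}, fitness = {score:.4f}")
```

Because the best individual is carried over unchanged, the recorded best fitness never decreases from one generation to the next, which is the behavior a fitness curve such as Fig. 6 is meant to display.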

Fig. 7 Parameters C and σ versus the accuracy rate in two dimensions

Fig. 8 Parameters C and σ versus the accuracy rate in three dimensions

Table 3 Comparison of model predictions of GA-SVM with GS-SVM, ANN, and C4.5 decision tree (performance in terms of successful prediction, %)

Methods Training Testing Overall

ANN (Goh 1996) 98.6 94.3 97.20

C4.5 decision tree (Ardakani and Kohestni 2015) 95.9 97.1 96.30

GA-SVM 97.29 97.14 97.25

GS-SVM 91.89 94.29 92.66
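The three columns of Table 3 are linked by simple arithmetic: the overall rate is the case-weighted average of the training and testing rates. The short check below assumes the 109 cases are split into 35 testing cases (stated in the text) and therefore 74 training cases; the 74 is inferred here, not quoted from the paper. The helper names are illustrative.

```python
N_TEST = 35              # testing cases, as stated in the text
N_TRAIN = 109 - N_TEST   # assumption: the remaining 74 cases form the training set

# (training %, testing %, reported overall %) copied from Table 3
TABLE_3 = {
    "ANN (Goh 1996)": (98.6, 94.3, 97.20),
    "C4.5 decision tree": (95.9, 97.1, 96.30),
    "GA-SVM": (97.29, 97.14, 97.25),
    "GS-SVM": (91.89, 94.29, 92.66),
}


def overall_rate(train_pct, test_pct, n_train=N_TRAIN, n_test=N_TEST):
    """Case-weighted average of the training and testing success rates."""
    return (n_train * train_pct + n_test * test_pct) / (n_train + n_test)


def misclassified(pct, n_cases):
    """Number of misclassified cases implied by a success rate."""
    return round((1.0 - pct / 100.0) * n_cases)


for method, (train_pct, test_pct, reported) in TABLE_3.items():
    computed = overall_rate(train_pct, test_pct)
    print(f"{method}: computed {computed:.2f} %, reported {reported:.2f} %")
```

Under this split, the GS-SVM testing rate of 94.29 % corresponds to two misclassified cases out of 35, and every reported overall rate is reproduced to within rounding.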


References

Alemdag S, Gurocak Z, Cevik A, Cabalar AF, Gokceoglu C (2016) Modeling deformation modulus of a stratified sedimentary rock mass using neural network, fuzzy inference and genetic programming. Eng Geol 203:70–82

Ardakani A, Kohestni VR (2015) Evaluation of liquefaction potential based on CPT results using C4.5 decision tree. J AI Data Mining 3(1):85–92

Cevik A, Cabalar AF (2009) Modelling damping ratio and shear modulus of sand-mica mixtures using genetic programming. Expert Syst Appl 36(4):7749–7757

Chern SG, Lee CY (2009) CPT-based simplified liquefaction assessment by using fuzzy-neural network. J Mar Sci Technol 17(4):326–331

Cortes C, Vapnik V (1995) Support-vector network. Mach Learn 20:273–297

Dibike YB, Velickov S, Solomatine D, Abbott MB (2001) Model induction with support vector machines: introduction and applications. J Comput Civ Eng 15(3):208–216

Ding SF, Su CY, Yu JZ (2011) An optimizing BP neural network algorithm based on genetic algorithm. Artif Intell Rev 36:153–162

Farrokhzad F, Choobbasti AJ, Barari A (2012) Liquefaction microzonation of Babol city using artificial neural network. J King Saud Univ Sci 24(1):89–100

Goh AT (1996) Neural-network modeling of CPT seismic liquefaction data. J Geotech Eng 122(1):70–73

Guettaya I, Ouni MREL (2014) In situ-based assessment of soil liquefaction potential-case study of an earth dam in Tunisia. Front Struct Civ Eng 8(4):456–461

Huang CL, Wang CJ (2006) A GA-based feature selection and parameters optimization for support vector machines. Expert Syst Appl 31:231–240

Jha SK, Suzuki K (2009) Reliability analysis of soil liquefaction based on standard penetration test. Comput Geotech 36(4):589–596

Juang CH, Chen CJ, Jiang T, Andrus RD (2000) Risk-based liquefaction potential evaluation using standard penetration tests. Can Geotech J 37:1195–1208

Karegowda AG, Manjunath AS, Jayaram MA (2011) Application of genetic algorithm optimized neural network connection weights for medical diagnosis of Pima Indians diabetes. Int J Soft Comput 2(2):15–22

Keerthi SS, Lin CJ (2003) Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput 15(7):1667–1689

Kumar S, Naresh R (2007) Efficient real coded genetic algorithm to solve the non-convex hydrothermal scheduling problem. Int J Electr Power Energy Syst 29(10):738–747

Kurup PU, Dudani NK (2002) Neural networks for profiling stress history of clays from PCPT data. J Geotech Geoenviron 128(4):569–579

Lee CY, Chern SG (2013) Application of a support vector machine for liquefaction assessment. J Mar Sci Technol 21(3):318–324

Mahesh P (2006) Support vector machines-based modelling of seismic liquefaction potential. Int J Numer Anal Methods Geomech 30:983–996

Marcuson WF III (1978) Definition of terms related to liquefaction. J Geotech Eng Div ASCE 104(9):1197–1200

Marghany M (2013) Genetic algorithm for oil spill automatic detection from Envisat Satellite Data. In: Computational science and its applications – ICCSA 2013. Springer, Berlin, pp 587–598

Marghany M (2014) Multi-objective evolutionary algorithm for oil spill detection from COSMO-SkeyMed Satellite. In: Computational science and its applications – ICCSA 2014. Springer, Berlin, pp 355–371

Marghany M (2015) Automatic detection of oil spills in the Gulf of Mexico from RADARSAT-2 SAR satellite data. Environ Earth Sci 74:5935–5947

Maria JS (2011) Applying artificial neural networks for analysis of geotechnical problems. Comput Assist Mech Eng Sci 18:231–241

Mert T (2013) A comparative study on computer aided liquefaction analysis methods. Int J Housing Sci 37(2):121–135

Mughieda O, Bani HK, Safieh B (2009) Liquefaction assessment by artificial neural networks based on CPT. Int J Geotech Eng 2:289–302

Padmini D, Ilamparuthi K, Sudheer KP (2008) Ultimate bearing capacity prediction of shallow foundations on cohesionless soils using neurofuzzy models. Comput Geotech 35:33–46

Pal M (2006) Support vector machines-based modeling of seismic liquefaction potential. Int J Numer Anal Methods Geomech 30:983–996

Pour MN, Asakereh A (2015) A comparison between two field methods of evaluation of liquefaction potential in the Bandar Abbas city. Am J Civ Eng 3(2-2):1–5

Ramakrishnan D, Singh TN, Purwar N, Badre KS, Gulati A, Gupta S (2008) Artificial neural network and liquefaction susceptibility assessment: a case study using the 2001 Bhuj earthquake data, Gujarat, India. Comput Geosci 12:491–501

Robertson PK, Campanella RG (1985) Liquefaction potential of sands using the cone penetration test. J Geotech Eng Div ASCE 111(3):384–403

Sami M, de Patrick B (2005) Minimum principle and related numerical scheme for simulating initial flow and subsequent propagation of liquefied ground. Int J Numer Anal Methods Geomech 29:1065–1086

Samui P (2013) Liquefaction prediction using support vector machine model based on cone penetration data. Front Struct Civ Eng 7(1):72–82

Samui P, Sitharam TG (2011) Machine learning modelling for predicting soil liquefaction susceptibility. Nat Hazards Earth Syst Sci 11:1–9

Seed HB, Idriss IM (1967) Analysis of soil liquefaction: Niigata earthquake. J Soil Mech Foun Div ASCE 93(3):83–108

Seed HB, Idriss IM (1971) Simplified procedure for evaluating soil liquefaction potential. J Soil Mech Foun Div ASCE 97(9):1249–1273

Seed HB, Idriss IM, Arango I (1983) Evaluation of liquefaction potential using field performance data. J Geotech Eng Div ASCE 109(3):458–482

Shibata T, Teparaksa W (1988) Evaluation of liquefaction potentials of soils using cone penetration tests. Soils Found 28(2):49–60

Skempton AW (1986) Standard penetration test procedures and the effects in sands of overburden pressure, relative density, particle size, aging and overconsolidation. Geotechnique 36(3):425–447

Tang Y, Zang YQ, Huang G, Hu X (2005) Granular SVM-RFE gene selection algorithm for reliable prostate cancer classification on microarray expression data. In: Proceedings of the 5th IEEE symposium on bioinformatics and bioengineering (BIBE'05)

Vapnik V (1995) The nature of statistical learning theory. Springer, New York

Zhang G, Robertson PK, Brachman RWI (2004) Estimating liquefaction-induced lateral displacements using the standard penetration test or cone penetration test. J Geotech Geoenviron Eng 130(8):861–871
