IBP3093_10.pdf

7/29/2019 IBP3093_10.pdf


______________________________
1 Electrical Engineer - M.S. graduate student at UNICAMP/FEM/DEP
2 DSc, Mechanical Engineer - Researcher at UNICAMP/CEPETRO/UNISIM
3 PhD, Petroleum Engineering - Professor at UNICAMP/FEM/DEP

IBP3093_10

STUDY OF THE INFLUENCE OF TRAINING DATA SET IN ARTIFICIAL NEURAL NETWORK APPLIED TO THE HISTORY MATCHING PROCESS

Luis A. N. Costa1, Célio Maschio2, Denis J. Schiozer3

Copyright 2010, Instituto Brasileiro de Petróleo, Gás e Biocombustíveis - IBP

This Technical Paper was prepared for presentation at the Rio Oil & Gas Expo and Conference 2010, held from September 13 to 16, 2010, in Rio de Janeiro. This Technical Paper was selected for presentation by the Technical Committee of the event, based on the information contained in the abstract submitted by the author(s). The content of the Technical Paper, as presented, has not been reviewed by IBP. The organizers will not translate or correct the texts received. The material, as presented, does not necessarily reflect the opinions of the Instituto Brasileiro de Petróleo, Gás e Biocombustíveis, its Members and Representatives. The author(s) are aware of and approve the publication of this Technical Paper in the Proceedings of the Rio Oil & Gas Expo and Conference 2010.

Abstract

History matching is an inverse problem in which one seeks a combination of reservoir parameters that minimizes an objective function representing the matching quality, measured as the difference between the simulated and observed data. The manual process is usually time-consuming and tedious. An interesting alternative is assisted history matching (AHM), which consists of automating part of the process. However, AHM normally demands many simulations. Proxy models, generated from Artificial Neural Networks (ANN) for example, can be used in the process to reduce the number of simulations. Artificial Neural Networks are becoming increasingly popular in the oil and gas industry. One of the fundamental aspects in the study of ANNs is the definition of the training data set. This is a difficult task and it is strongly dependent on the problem. The difficulty increases for problems with a high number of variables and strong nonlinearities. The main idea of the present work is to analyze the influence of the training data set on the proxy models generated to represent the simulator in the history matching process. Backpropagation multilayer networks were trained using data sets generated through the Latin Hypercube and Box-Behnken design techniques. Two different sizes (numbers of points) of the data set were also tested and compared. The proxy models generated by the trained networks were used in an optimization process with a genetic algorithm (GA). The best solutions found by the GA were tested with the reservoir simulator to validate the results. A realistic reservoir model with eight producers, seven water injector wells and 16 uncertain parameters was used in this study.



Rio Oil & Gas Expo and Conference 2010


1. Introduction

History matching is very important in reservoir simulation. It consists of modifying some uncertain parameters of the reservoir, such as absolute and relative permeability, fault transmissibility, etc., in order to reproduce the past behavior of the reservoir. The calibrated model can be used to forecast future reservoir behavior, which makes the process crucial for decision making, economic analysis, production strategy, etc.

Some characteristics, such as the inverse nature of the problem, nonlinearities and imprecision of the data, among others, make the process difficult and time-consuming. Some automated processes were proposed in the literature to mitigate these problems, and optimization techniques have been proposed to improve the process. Maschio and Schiozer (2004) tested the use of an optimization algorithm based on a direct search method. Sousa et al. (2006) applied scatter search to history matching.

Using Artificial Neural Networks (ANN) to generate proxy models is promising because of their capacity to capture nonlinearities and their potential to reduce the number of simulations. ANNs try to imitate the behavior of the human brain, in which millions of neurons communicate with each other and learn behaviors through examples. A network is composed of an input layer, where examples are presented to the network; a hidden layer, where information is processed; and an output layer, where the network shows the results. In function approximation, input and desired output data sets are presented to the network, which is then trained to reproduce the same behavior for data it has never seen.

There are many works about ANNs in the petroleum area. Al-Thuwaini et al. (2006) used ANNs to search for similar regions in the reservoir and cluster them; Ramgulam et al. (2007) used ANNs to find initial parameters for the optimization with the simulator; Maschio et al. (2008) used ANNs in the history matching process, demonstrating their capacity to reduce the number of simulations; Sampaio et al. (2009) ran several tests varying some network training parameters and showed the difficulty of the training process. Zangl and Graf (2006) presented the use of neural networks as proxy models for production optimization using a genetic algorithm. Cullick et al. (2006) generated nonlinear proxies through neural networks and used them as an auxiliary tool in the history matching process. The authors demonstrated that the neural network is an excellent proxy for the numerical simulator over the trained parameter space. Saemi et al. (2007) designed neural networks using a genetic algorithm to predict permeability from well logs. The use of neural networks combined with experimental design in the integration of history matching and uncertainty analysis was presented by Reis (2006). Silva et al. (2007) applied artificial neural networks, such as the Radial Basis Network and the Generalized Regression Neural Network, as proxies to the reservoir simulator, focused on history matching, using conventional training methods. Good agreement between the results obtained with the proxies and those obtained with the reservoir simulator was observed.

To train a network correctly to represent the desired behavior, some aspects have to be carefully studied, such as the number of neurons, number of layers, training function, activation function, etc. These aspects are related to the configuration of the network. Another aspect, which is the focus of this work, is the training data set. It is important because the network learns through examples, so the training data have to cover the regions of the search space efficiently; otherwise the network will not represent well the regions of the solution space that it was not trained to represent.

To study the influence of the training data set on the ANN training process, two types of sampling techniques were used: Box-Behnken (BB) and Latin Hypercube (LH). Box-Behnken is an experimental design for response surface methodology. It is an independent quadratic design in which the set of points lies at the midpoints of the edges of a multidimensional cube and at the center (replicated center point). These designs are rotatable (or nearly rotatable) and require 3 levels of each factor. Further explanation can be found in Box and Behnken (1960).
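As an illustration of the edge-midpoint idea, the following is a minimal sketch that builds a three-factor design in coded levels (-1, 0, +1) by varying two factors at a time while holding the third at its center level. The function name `box_behnken_pairwise` is hypothetical, and this pairwise construction is only an assumption chosen for a small example: published designs for larger numbers of factors may use different block arrangements, which this sketch does not reproduce.

```python
from itertools import combinations, product

import numpy as np


def box_behnken_pairwise(n_factors, n_center=1):
    """Coded (-1, 0, +1) design built from all pairs of factors (illustrative only)."""
    rows = []
    for i, j in combinations(range(n_factors), 2):
        # Vary factors i and j over the four (+/-1, +/-1) combinations;
        # every other factor stays at its center level 0.
        for a, b in product((-1, 1), repeat=2):
            row = [0] * n_factors
            row[i], row[j] = a, b
            rows.append(row)
    # Append the (possibly replicated) center point.
    rows.extend([[0] * n_factors] * n_center)
    return np.array(rows)


design = box_behnken_pairwise(3)
print(design.shape)
```

For three factors, the 12 non-center rows are the midpoints of the 12 edges of the cube, and the last row is the center point.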

Latin Hypercube is a sampling technique in which the probability distribution of each variable is divided into intervals and samples are drawn from each interval. The number of values drawn is proportional to the probability of each interval. This technique guarantees that the values are well distributed in the parameter space, although, for a small number of samples, it is not guaranteed to include the maximum and minimum values. This technique was used in the process of combining history matching and uncertainty analysis by Maschio et al. (2009) and Maschio et al. (2010).
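The stratification described above can be sketched as follows, assuming uniform distributions on [0, 1) for every variable (an assumption made for this example; the text does not state the distributions used). The function name `latin_hypercube` and the sample size are illustrative.

```python
import numpy as np


def latin_hypercube(n_samples, n_vars, rng):
    """Return an (n_samples, n_vars) array in [0, 1) with one point per interval."""
    # Draw one random value inside each of the n_samples equal-probability intervals.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_vars))) / n_samples
    # Shuffle each column independently so intervals are paired at random across variables.
    for k in range(n_vars):
        u[:, k] = rng.permutation(u[:, k])
    return u


rng = np.random.default_rng(0)
sample = latin_hypercube(10, 3, rng)
print(sample.min(), sample.max())
```

Each column contains exactly one value in each of the ten intervals, but, as noted above, the extremes 0 and 1 themselves are not necessarily sampled.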

In the present work, proxies generated from ANNs were used in an optimization with a Genetic Algorithm to find regions of minimum, and the solutions found were then simulated to compare and validate the results.

2. Methodology

The main steps of the methodology are described in the following:

1) Definition of the training points: this step consists of sampling the training data set using sampling techniques. In the present work, Latin Hypercube and Box-Behnken were used.

2) Generation of the target data: each point generated in Step 1 corresponds to a reservoir model and, therefore, this step consists of the simulation of each model. After the simulation, the difference (D) between the observed and simulated data is calculated according to the following equation (Eq. 1).


$$D = \sum_{i=1}^{N} \left( d_i^{obs} - d_i^{sim} \right)^2 \qquad (1)$$

where N is the number of observed data of each data series (well water rate, for example), and $d_i^{obs}$ and $d_i^{sim}$ are, respectively, the observed and simulated data.

3) ANN training: the training data set generated in Steps 1 and 2 is used to train the ANNs. The ANN target data are the differences between the observed and simulated data and, therefore, the output of the trained ANNs also represents this difference. In this work, backpropagation multilayer networks were used. The structure generated with the trained ANN is called here a proxy.

4) Analysis of the ANN performance: after the training process, cross plots between simulator output data and proxy output data are built to verify the consistency of the ANNs. Two cross plots are considered. The first one correlates the points used in the training process and the second one correlates another data set (not pertaining to the training set). The objective of the second analysis is to verify the generalization capability of each trained ANN.

5) Optimization: the fundamental characteristic of a trained ANN is the generalization capability that determines its applicability. Simply obtaining a trained ANN is not an end in itself. Therefore, the objective of this step is to test the ability of the ANNs in the optimization process. In this work, a genetic algorithm was chosen as the optimization method to minimize the objective function, in this case the output proxy data.

6) Analysis of the optimization results: after the optimization process, the best solutions (combinations of the parameters that provide the lowest objective function) are tested in the reservoir simulator and the results are compared to the observed data.
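Two of the quantities used in the steps above can be sketched in a few lines: the mismatch D of Eq. 1 (Step 2) and the correlation coefficient of a cross plot (Step 4). This is a minimal sketch; the function names and the numerical values are hypothetical, and the use of the Pearson coefficient is an assumption, since the text refers only to a "correlation coefficient".

```python
import numpy as np


def mismatch(d_obs, d_sim):
    """Eq. 1: sum of squared differences between observed and simulated data."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_sim = np.asarray(d_sim, dtype=float)
    if d_obs.shape != d_sim.shape:
        raise ValueError("observed and simulated series must have the same length")
    return float(np.sum((d_obs - d_sim) ** 2))


def cross_plot_correlation(simulator_out, proxy_out):
    """Pearson correlation between simulator and proxy outputs (assumed metric)."""
    return float(np.corrcoef(simulator_out, proxy_out)[0, 1])


# Hypothetical water-rate series for one well (illustrative values only).
d_example = mismatch([10.0, 20.0, 30.0], [12.0, 19.0, 33.0])

# Hypothetical simulator and proxy outputs for four reservoir models.
sim_out = np.array([1.0, 2.0, 3.0, 4.0])
proxy_out = np.array([1.2, 1.7, 3.4, 3.9])
r_test = cross_plot_correlation(sim_out, proxy_out)
print(d_example, round(r_test, 4))
```

A coefficient close to 1 on points that were not used for training indicates that the proxy reproduces the ranking and scale of the simulator mismatch in that region.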

3. Application

3.1. Reservoir model

The methodology was applied to a synthetic reservoir with realistic characteristics, shown in Figure 1. The model, discretized in a corner-point mesh with 90×110×5 blocks, was generated through geostatistical methods and is composed of three facies characterized according to three ranges (low, intermediate, high) of permeability. The uncertain parameters were defined as: logarithmic multiplier of horizontal permeability, multiplier of porosity, ratio between vertical and horizontal permeability, and exponent of the Corey model for water relative permeability. The reservoir model also presents four faults, as shown in Figure 1. The transmissibility of each fault was included as an uncertain parameter, totaling sixteen uncertain parameters (Table 1) that were modified in the process. A reference model (chosen among the possible combinations of the sixteen variables) was simulated to generate a synthetic history of ten years. The reservoir is drained by fifteen vertical wells (eight producers and seven water injectors).

Figure 1 - Reservoir model used in the study (horizontal permeability – md)


Table 1 - Description for uncertain parameters

Attribute number   Description                    Type                                    Min.   Max.
1 to 3             Porosity (por)                 Multiplier                              0.85   1.15
4 to 6             Horizontal permeability (kx)   Log multiplier                          0.75   1.1
7 to 9             Vertical permeability (kz)     Percent of kx (%)                       4      25
10 to 13           Fault transmissibility (F)     Multiplier                              0      1
14 to 16           Relative permeability (kr)     Exponent of water phase (Corey model)   1      5

3.2. Training process

Two characteristics of the training data set were analyzed: (1) the sampling technique and (2) the number of points. The sampling techniques were the Latin Hypercube and Box-Behnken designs. A comparison between six points sampled by each design is shown in Table 2. Box-Behnken has a predefined matrix of points that depends on the number of design variables. In this case (16 variables), the matrix is composed of 396 points taking the minimum, maximum and mean parameter values. Therefore, Box-Behnken and Latin Hypercube are compared with 396 points. Further, Latin Hypercube with 396 points and with 250 points are also compared. For each kind of analysis, 8 proxies (one for each producer well) are generated. The output of each proxy represents the difference between the observed and simulated data (Eq. 1) for the water rate of the corresponding producer well.
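To make the three levels concrete, the sketch below maps coded levels (-1, 0, +1) to physical values using the minimum and maximum of Table 1 for the porosity, horizontal permeability, vertical permeability and fault transmissibility attributes. The linear mapping and the names `RANGES` and `decode` are assumptions made for illustration; the text does not describe how the coded design was converted.

```python
import numpy as np

# (min, max) taken from Table 1 for one attribute of each type.
RANGES = {
    "por": (0.85, 1.15),
    "kx": (0.75, 1.10),
    "kz": (4.0, 25.0),
    "F": (0.0, 1.0),
}


def decode(coded, lo, hi):
    """Linearly map coded levels in [-1, 1] to the physical range [lo, hi]."""
    coded = np.asarray(coded, dtype=float)
    return lo + (coded + 1.0) * (hi - lo) / 2.0


# Minimum, middle and maximum level of each attribute.
levels = {name: decode([-1, 0, 1], lo, hi) for name, (lo, hi) in RANGES.items()}
for name, values in levels.items():
    print(name, values)
```

Under this assumption, the middle levels come out as 1.000, 0.925, 14.5 and 0.5, which are the values that appear for these attributes in the Box-Behnken rows of Table 2.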

Table 2 - Six samples of Box-Behnken and Latin Hypercube designs

Design            por1    por2    por3    kx1     kx2     kx3     kz1     kz2     kz3     F1      F2      F3      F4      kr1     kr2     kr3
Box-Behnken       0.850   0.850   1.000   0.925   0.925   0.750   14.500  14.500  4.000   0.500   0.500   0.500   0.500   2.500   2.500   2.500
Box-Behnken       0.850   0.850   1.000   0.925   0.925   0.750   14.500  14.500  25.000  0.500   0.500   0.500   0.500   2.500   2.500   2.500
Box-Behnken       0.850   0.850   1.000   0.925   0.925   1.100   14.500  14.500  4.000   0.500   0.500   0.500   0.500   2.500   2.500   2.500
Box-Behnken       0.850   0.850   1.000   0.925   0.925   1.100   14.500  14.500  25.000  0.500   0.500   0.500   0.500   2.500   2.500   2.500
Box-Behnken       0.850   1.150   1.000   0.925   0.925   0.750   14.500  14.500  4.000   0.500   0.500   0.500   0.500   2.500   2.500   2.500
Box-Behnken       0.850   1.150   1.000   0.925   0.925   0.750   14.500  14.500  25.000  0.500   0.500   0.500   0.500   2.500   2.500   2.500
Latin Hypercube   0.852   1.021   0.964   0.888   0.777   0.912   13.530  9.904   21.120  0.378   0.948   0.839   0.241   2.815   2.044   3.892
Latin Hypercube   1.057   1.126   0.936   0.951   1.075   0.808   17.831  21.711  16.398  0.209   0.157   0.932   0.012   3.827   3.153   2.269
Latin Hypercube   1.028   0.909   1.101   1.028   1.058   0.996   24.831  14.795  14.964  0.466   0.586   0.337   0.173   2.494   3.088   2.847
Latin Hypercube   1.072   0.880   0.950   1.052   1.021   1.030   15.892  9.566   8.133   0.402   0.402   0.096   0.960   2.976   2.173   2.157
Latin Hypercube   0.970   1.033   0.860   0.959   0.850   0.966   9.060   24.072  11.506  0.305   0.185   0.209   0.687   3.811   3.313   2.414
Latin Hypercube   0.998   1.146   1.132   1.099   1.078   0.834   4.590   24.410  8.470   0.245   0.859   0.647   0.133   3.956   1.964   1.193

3.3. Optimization with genetic algorithm

The objective function minimized in the optimization process is the mean value of the outputs returned by the proxies. The main GA control parameters were: a maximum of 500 generations, 20 individuals per generation, and crossover fraction and mutation rate both equal to 0.5.
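The optimization loop can be illustrated with a minimal genetic algorithm that minimizes the mean of eight proxy outputs over 16 normalized parameters, using the control parameters quoted above. Everything else in this sketch is an assumption: the quadratic functions stand in for the trained ANN proxies, and the operators (binary tournament selection, uniform crossover, Gaussian mutation, one elite individual) and the reading of the crossover fraction and mutation rate as per-child probabilities are illustrative choices, not the configuration used in the study.

```python
import numpy as np

N_VARS, N_PROXIES = 16, 8
rng = np.random.default_rng(1)
# Hypothetical stand-ins for the eight trained proxies: quadratic bowls.
centers = rng.random((N_PROXIES, N_VARS))


def objective(x):
    """Mean of the (hypothetical) proxy outputs for a parameter vector in [0, 1]^16."""
    return float(np.mean(np.sum((x - centers) ** 2, axis=1)))


def run_ga(pop_size=20, generations=500, crossover_fraction=0.5, mutation_rate=0.5):
    pop = rng.random((pop_size, N_VARS))
    best_per_generation = []
    for _ in range(generations):
        fitness = np.array([objective(ind) for ind in pop])
        pop = pop[np.argsort(fitness)]  # best individual first
        best_per_generation.append(float(fitness.min()))
        children = [pop[0].copy()]  # elitism: carry the best individual over
        while len(children) < pop_size:
            # Binary tournaments: the lower index is the fitter individual.
            parent_a = pop[rng.integers(0, pop_size, 2).min()]
            parent_b = pop[rng.integers(0, pop_size, 2).min()]
            if rng.random() < crossover_fraction:
                mask = rng.random(N_VARS) < 0.5  # uniform crossover
                child = np.where(mask, parent_a, parent_b)
            else:
                child = parent_a.copy()
            if rng.random() < mutation_rate:
                # Gaussian perturbation, clipped to the normalized parameter range.
                child = np.clip(child + rng.normal(0.0, 0.05, N_VARS), 0.0, 1.0)
            children.append(child)
        pop = np.array(children)
    return pop[0], best_per_generation


best, history = run_ga()
print(history[0], history[-1])
```

In the workflow of Step 6, the best individuals returned by such a loop would then be run through the reservoir simulator, since the proxy value alone does not confirm the quality of the match.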

4. Results and Discussion

The analysis of the results is divided into two main steps. The first is the cross plot analysis, carried out to verify the consistency of the trained ANNs, and the second is the application of the proxies in the optimization process using the genetic algorithm.

The cross plot analysis is shown in Table 3 and Figures 2, 3, 4 and 5. Table 3 and Figure 2 show that the correlation coefficients for the test data are greater than 0.70 for all wells, except for PROD7 with the ANN trained on LH 250 (0.6772), which represents a good generalization capability. In the cross plots, blue points represent the correlation between simulator output data and proxy output data for the points used in the training process, and red points represent the same correlation for the test data. For convenience, LH points were used to test the ANN trained with BB points and vice versa. However, any data set respecting the range of each parameter could be used. Regarding the influence of the sampling technique, the Latin Hypercube results are slightly better. However, from these results alone it is not possible to clearly identify the better technique.

The comparison of the simulation models with the history data for four wells is shown in Figure 6 (PROD2 and PROD3) and Figure 7 (PROD5 and PROD8), which present the water rate for the Base case as well as for the models resulting from the optimization processes (see Step 6 of the methodology). The simulation results obtained with LH 396 (Latin Hypercube with 396 points) were slightly better than those obtained with BB 396 (Box-Behnken with 396 points) for wells PROD2, PROD5 and PROD8. Regarding the influence of the number of points in the training data set, it can be observed that for three wells the results obtained with LH 396 were better than those obtained with LH 250 (Latin Hypercube with 250 points).

The objective of this paper is not to show a perfect match between the simulated and observed data. To improve the results of the history matching, a more detailed study of the optimization process can be done in future works. Another possible approach is to use the best solutions obtained from the optimization with proxies as the initial guess for an optimization process using the simulator, increasing the probability of finding the optimum solution through a gradient-based method (which normally has a high convergence rate) with fewer simulations.

Table 3 - Correlation coefficient calculated for each cross plot analysis

ANN         Data type            PROD1    PROD2    PROD3    PROD4    PROD5    PROD6    PROD7    PROD8
ANN BB396   Test data (LH396)    0.8285   0.9651   0.7729   0.7319   0.9044   0.8013   0.7376   0.8425
ANN BB396   Proxy data (BB396)   0.9930   0.9949   0.9969   0.9986   0.9824   0.9865   0.9961   0.9974
ANN LH396   Test data (BB396)    0.9010   0.9586   0.8335   0.8583   0.8714   0.7820   0.7139   0.8068
ANN LH396   Proxy data (LH396)   0.9873   0.9955   0.9896   0.9821   0.9838   0.9877   0.9651   0.9877
ANN LH250   Test data (BB396)    0.8966   0.9711   0.8273   0.8631   0.8521   0.8174   0.6772   0.8302
ANN LH250   Proxy data (LH250)   0.9792   0.9979   0.9888   0.9891   0.9681   0.9804   0.9929   0.9942

[Figure 2 plot. Axis: Correlation Coefficient. Series: Test data set, Train data set.]

Figure 2 - Comparison of the correlation coefficients


[Figure 3 plots. Axes: simulator output data, proxy output data. Panel 1: Trained ANN with 396 points of BB data set - PROD2 (test data: LH 396 points; train data: BB 396 points). Panel 2: Trained ANN with 396 points of LH data set - PROD2 (test data: BB 396 points; train data: LH 396 points).]

Figure 3 - Comparison of the simulator output data and proxy output data for well PROD2. Training data points are shown in blue and test data points in red

[Figure 4 plots. Axis: proxy output data. Panel 1: Trained ANN with 396 points of BB data set - PROD 3 (test data: LH 396 points; train data: BB 396 points). The title of the second panel was not recoverable.]

Figure 4 - Comparison of the simulator output data and proxy output data for well PROD 3

[Figure 5 plots. Axis: proxy output data. Panel titles were not recoverable.]

Figure 5 - Comparison of the simulator output data and proxy output data for wells PROD 2 and PROD 3


[Figure 6 plots. Panels: PROD2 and PROD3. Axes: Time (days), Water rate SC (m3/day). Series: History, Base case, BB 396, LH 396, LH 250.]

Figure 6 - Comparison of the history and simulation results (water rate) for PROD2 and PROD3 (Step 6 of the methodology)


[Figure 7 plots. Panels: PROD5 and PROD8. Axis: Time (days). Series: History, Base case, BB 396, LH 396, LH 250.]

Figure 7 - Comparison of the history and simulation results (water rate) for PROD5 and PROD8 (Step 6 of the methodology)

5. Conclusions

The influence of the training data set on the quality of the proxies generated by artificial neural networks was studied in this work. Different sampling techniques to generate the training data set and different numbers of points were tested. Proxies generated from the Latin Hypercube data set with 396 points provided better results than the other configurations (Latin Hypercube data set with 250 points and Box-Behnken with 396 points) for the case studied.

The results showed good estimations; however, proxies cannot substitute the reservoir simulator in the overall process. Proxies generated from well-trained ANNs have good generalization capability and can be used as a support tool to the reservoir simulator in the assisted history matching process.

6. Acknowledgments

The authors would like to thank Petrobras for the financial support and UNISIM, DEP and CEPETRO for supporting this research.


7. References

AL-THUWAINI, J. S., ZANGL, G., PHELPS, R., “Innovative Approach to Assist History Matching Using Artificial Intelligence”. In: SPE Intelligent Energy Conference and Exhibition, Amsterdam, Netherlands: April, 2006.

BOX, G. E. P., BEHNKEN, D. W., “Some New Three Level Designs for the Study of Quantitative Variables”. Technometrics, Vol. 2, No. 4, pp. 455-475, Nov., 1960.

CULLICK, A. S., JOHNSON, D., SHI, G., “Improved and more-rapid history matching with a nonlinear proxy and global optimization”. Annual Technical Conference and Exhibition, San Antonio, Texas, 24-27 September, 2006.

MASCHIO, C., CARVALHO, C. P. V., SCHIOZER, D. J., “Aplicação da Técnica do Hipercubo Latino na Integração do Ajuste Histórico com Análise de Incerteza”. In: 5º Congresso Brasileiro de Pesquisa e Desenvolvimento em Petróleo e Gás, Fortaleza-CE, Brazil: Oct., 2009.

MASCHIO, C., CARVALHO, C. P. V., SCHIOZER, D. J., “A New Methodology to Reduce Uncertainties in Reservoir Simulation Models Using Observed Data and Sampling Techniques”. Journal of Petroleum Science and Engineering, Vol. 72, Issues 1-2, pp. 110-119, May 2010.

MASCHIO, C., NAKAJIMA, L., SCHIOZER, D. J., “Uso de Redes Neurais no Processo de Ajuste de Histórico de Produção”. In: Rio Oil & Gas Expo and Conference, Rio de Janeiro-RJ, Brazil: Sept., 2008.

MASCHIO, C., SCHIOZER, D. J., “Ajuste de Histórico Assistido Usando Métodos de Otimização de Busca Direta”. In: Rio Oil & Gas Expo and Conference, Rio de Janeiro-RJ, Brazil: Oct., 2004.

RAMGULAM, A., ERTEKIN, T., FLEMINGS, P. B., “Utilization of Artificial Neural Networks in the Optimization of History Matching”. In: Latin American and Caribbean Petroleum Engineering Conference, Buenos Aires, Argentina: April, 2007.

REIS, L. C., “Risk analysis with history matching using experimental design or artificial neural networks”. SPE Europec/EAGE Annual Conference and Exhibition, 12-15 June, Vienna, Austria, 2006.

SAEMI, M., AHMADI, M., VARJANI, A. Y., “Design of neural networks using genetic algorithm for the permeability estimation of the reservoir”. Journal of Petroleum Science and Engineering, Vol. 59, pp. 97-105, 2007.

SAMPAIO, T. P., FERREIRA FILHO, V. J. M., DE SA NETO, A., “An Application of Feed Forward Neural Network as Nonlinear Proxies for the Use During the History Matching Phase”. In: Latin American and Caribbean Petroleum Engineering Conference, Cartagena, Colombia: 31 May-3 June, 2009.

SILVA, P. C., MASCHIO, C., SCHIOZER, D. J., “Use of neuro-simulation techniques as proxies to reservoir simulator: Application in production history matching”. Journal of Petroleum Science and Engineering, Vol. 57, pp. 273-280, 2007.

SOUSA, S. H. G., MASCHIO, C., SCHIOZER, D. J., “Applying the Scatter Search Meta-heuristic to the History Matching Problem”. In: Rio Oil & Gas Expo and Conference, Rio de Janeiro-RJ, Brazil: Sept., 2006.

ZANGL, G., GRAF, T., “Proxy Modeling in Production Optimization”. SPE Europec/EAGE Annual Conference and Exhibition, 12-15 June, Vienna, Austria, 2006.
