University of Calgary

PRISM: University of Calgary's Digital Repository

Graduate Studies The Vault: Electronic Theses and Dissertations

2016

Well Production Prediction and Visualization Using

Data Mining and Web GIS

Wei, Bingjie

Wei, B. (2016). Well Production Prediction and Visualization Using Data Mining and Web GIS

(Unpublished master's thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/28686

http://hdl.handle.net/11023/3088

master thesis

University of Calgary graduate students retain copyright ownership and moral rights for their

thesis. You may use this material in any way that is permitted by the Copyright Act or through

licensing that has been assigned to the document. For uses that are not allowable under

copyright legislation or licensing, you are required to seek permission.

Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Well Production Prediction and Visualization Using Data Mining and Web GIS

by

Bingjie Wei

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAM IN GEOMATICS ENGINEERING

CALGARY, ALBERTA

JUNE, 2016

© Bingjie Wei 2016

Abstract

Massive data sets have been accumulated in the oil and gas industry. As strategic assets,

voluminous data of different data types should be leveraged and turned into information for agile

and accurate decision-making. Three oil and gas data-related studies are covered in this thesis.

Firstly, a data-driven model is proposed for predicting well production using time-series

production data from analogous and adjacent wells. Secondly, interactive visualization tools are

designed and implemented for oil and gas spatial and temporal datasets, following an “Overview

first, zoom and filter, then details-on-demand” guideline (Shneiderman, 1996) in order to

maximize information delivery in single displays. Thirdly, a web-based Geographic Information

System (GIS) application is designed and implemented for a Steam Assisted Gravity Drainage

(SAGD) dataset to provide users with convenient access to public and proprietary SAGD data, as well

as some data analysis and visualization functions.

Acknowledgements

I wish to thank my supervisor, Dr. Xin Wang, for the encouragement, guidance, inspiration and

patience she has provided me throughout my master’s program. I am grateful for the research

opportunities Dr. Wang provided me to work with our industrial partners, Divestco and

Schlumberger. Also, thanks to the Natural Sciences and Engineering Research Council of Canada

(NSERC) for funding the research.

I would also like to thank several of my colleagues, Rodrigo Silva, Xiaodong Sun, Ge Cui

and Yuanchen Li, for their support and insights. I greatly appreciate the guidance and assistance

from Helen Pinto. Thanks to Helen for the input and sharing of her knowledge in the oil and gas

industry. I am grateful for the precious companionship and encouragement from all my colleagues

and friends inside and outside of school.

I am so thankful for my parents, Zichang Wei and Xiulin Tong, for their unconditional

love, endless patience and incredible support. They have always encouraged and supported my

pursuit of higher education, and taught me to keep a positive perspective on life. I am blessed to

be their daughter.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Symbols, Abbreviations and Nomenclature

CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Problem statement
1.2.1 Oil and gas production prediction
1.2.2 Interactive data visualization for temporal and spatial oil and gas data
1.2.3 Web GIS system design for oil and gas datasets
1.3 Research objectives
1.4 Research contributions
1.5 Thesis outline

CHAPTER TWO: RELATED WORK
2.1 Time-series data approximation and symbolization
2.2 Data mining methods
2.2.1 Cluster analysis
2.2.2 Association Rule Mining
2.2.3 Decision tree induction
2.3 Oil and gas data visualization
2.4 GIS applications in oil and gas

CHAPTER THREE: A SYMBOLIC TREE MODEL FOR OIL AND GAS PRODUCTION PREDICTION USING TIME-SERIES PRODUCTION DATA
3.1 Introduction
3.2 Symbolic tree construction for well production prediction
3.2.1 Time-series well production data preprocessing
3.2.2 Symbolic tree construction and evaluation
3.2.2.1 Spatial information gain calculation for symbolic tree nodes
3.2.2.2 Coverage index for evaluating symbolic trees
3.2.3 New well production prediction
3.2.4 Symbolic Tree Visualization
3.3 Case study: Canadian shale gas production prediction

CHAPTER FOUR: DATA VISUALIZATION TOOL DESIGNS FOR SPATIAL WELLS AND TIME-SERIES OIL AND GAS DATA
4.1 Introduction
4.2 Oil and gas time-series data characteristics and visualization tasks
4.2.1 Oil and gas time-series data characteristics
4.2.2 Oil and gas data visualization tasks
4.3 Web GIS interface and visualization and interaction controls
4.3.1 Web GIS interface
4.3.2 Injection and production data visualization template
4.3.3 Bubble map for visualizing production data of multiple wells
4.3.4 Completion data visualization template
4.3.5 Well status data visualization template

CHAPTER FIVE: A WEB-BASED STEAM ASSISTED GRAVITY DRAINAGE DATA VISUALIZATION AND ANALYTICAL SYSTEM
5.1 Introduction
5.2 Design of the SAGD data visualization and analysis system
5.2.1 System design
5.2.2 SAGD database
5.2.2.1 Database structure
5.2.2.2 Data collection and preprocessing
5.3 The Web GIS user interface for SAGD
5.4 The data visualization and data mining user interface
5.4.1 Data visualization
5.4.1.1 Data visualization templates
5.4.1.2 Case study on SAGD injection and production history
5.4.2 Data mining

CHAPTER SIX: CONCLUSIONS AND FUTURE WORK
6.1 Conclusions
6.2 Future work

APPENDIX: PUBLICATION DURING THE PROGRAM

REFERENCES

List of Tables

Table 3-1 Symbols and ranges for time series symbolization of 3 symbols

Table 3-2 First 12 month monthly well production and symbols for two example wells

Table 3-3 Sensitivity, specificity and coverage calculation for the four symbolic tree predictive models using different symbol sizes for time-series data symbolization

Table 4-1 Data characteristics of four types of oil and gas time-series data

List of Figures

Figure 3-1 Production data of a new well relative to existing adjacent wells

Figure 3-2 Flowchart of proposed Symbolic Tree Model

Figure 3-3 Production data aggregation and symbolization (a) Aggregated and symbolized time-series production data of one example well (b) Data distribution of all aggregated production data

Figure 3-4 Basic algorithm for constructing a symbolic tree from well production data

Figure 3-5 Monthly well production and symbol sequences for two example wells

Figure 3-6 Symbolic tree built from symbolic gas production time series using symbol size of 3

Figure 3-7 Production data preprocessing, symbolic tree construction and well production prediction

Figure 4-1 Mapping interface with extended attribute table and information window

Figure 4-2 Interactive chart for viewing production data

Figure 4-3 Bubble map for shale gas wells

Figure 4-4 Bubble map for shale gas wells with detailed information expanded on one specific well

Figure 4-5 Bar chart for viewing shale gas completion data

Figure 4-6 Timeline chart for viewing well status data

Figure 5-1 System design of the proposed web-based GIS for a SAGD dataset

Figure 5-2 SAGD database structure

Figure 5-3 The Web GIS user interface of the system

Figure 5-4 Data visualization templates (a) Timeline of a producer well (UWI 02/08-11-095-06W4/0) in Suncor Firebag (b) Time-series visualization for the producer well in Suncor Firebag

Figure 5-4 Data visualization templates (c) Bar chart of well total depth for Suncor Firebag and Husky Tucker wells (d) Pie chart of status for Suncor Firebag wells

Figure 5-5 Time-series data visualization on a specific well pair (Injector UWI 05/08-11-095-06W4/0; producer UWI 02/08-11-095-06W4/0) (a) Injection steam history in a bar chart (b) Produced water, oil and gas in a line chart

Figure 5-5 Time-series data visualization on a specific well pair (Injector UWI 05/08-11-095-06W4/0; producer UWI 02/08-11-095-06W4/0) (c) Injected steam and produced oil in a line chart (d) CSOR and SOR in a line chart

Figure 5-6 Examples of categorical, numerical classification and k-means clustering of wells in Suncor Firebag project (a) Categorical classification of well pads (b) Numerical classification using quantile of well average oil production

Figure 5-6 Examples of categorical, numerical classification and k-means clustering of wells in Suncor Firebag project (c) K-means clustering of SOR and oil production

Figure 5-7 ARM example for Suncor Firebag project (a) ARM result map legend (b) Map interface of association rule 1

Figure 5-7 ARM example for Suncor Firebag project (c) Map interface of wells fully satisfying association rule 3

List of Symbols, Abbreviations and Nomenclature

Symbol Definition

Coverage Symbolic tree coverage index

$d_i^{int}$ Internal distance

$d_i^{ext}$ External distance

$H$ Spatial entropy

$m$ Aggregation level

$n$ Original production data length

$w$ Production symbol sequence length

Abbreviation Definition

ARM Association Rule Mining

DFT Discrete Fourier Transform

DWT Discrete Wavelet Transform

GIS Geographic Information System

PAA Piecewise Aggregate Approximation

SAGD Steam Assisted Gravity Drainage

SAX Symbolic Aggregate Approximation

SVD Singular Value Decomposition

CHAPTER ONE: INTRODUCTION

1.1 Background

The oil and gas industry has long been dealing with massive data generated during hydrocarbon exploration, development and production. Acquisition and interpretation of seismic data, followed by drilling and confirmation of existing oil or gas reserves, can cost millions of dollars. Even then, it is possible to drill dry holes. In order to make a profit, the cumulative expense of exploration, well development and production must be minimized. Improving production with advanced technologies and optimized management and human capital can reduce operating costs to a certain extent. The most efficient way to reduce costs is to facilitate agile and accurate decision-making, which prevents projects from lagging while millions of dollars are being wasted. To facilitate fast and accurate decision-making, it is critical to derive information and draw insights from data.

Data analytics, data visualization and interpretation, and data integration and management

make up the most essential processes to obtain useful information from various oil and gas related

data sources.

The industry is slowly adopting data-driven models for data analytics in some specific petroleum applications, such as clustering in seismic attribute analysis (Strecker & Uden, 2002; Marroquin et al., 2009), Artificial Neural Networks in reservoir characterization (Mohaghegh, 2000), and Neural Networks and Support Vector Machines in drilling and completion optimization (Serapiao et al., 2006; Fruhwirth et al., 2006). Data mining aims to reveal hidden patterns and relationships embedded in big datasets. Besides the datasets used in the aforementioned studies (i.e. seismic data, reservoir geological and geophysical data, and drilling data), oil and gas well production data are inherently voluminous. Production data are time-series data, including the regularly updated volumes of multiple substances. Production engineers refer to production data for decision-making concerning production operations.

Data visualization tools communicate information in oil and gas data to engineers and other petroleum professionals through visual displays. Three-dimensional seismic images provide information about the subsurface so that geologists can make observations for hydrocarbon exploration. Updates on different drilling parameters are displayed in overlaid line charts for drilling engineers to monitor underground drilling conditions and progress. Basic line charts are used to display time-series production and injection data, providing information on oil and gas injection and production operations.

Besides temporal data, spatial data are another important part of oil and gas datasets. Oil and gas wells, pipelines, and environmental sites and facilities are geospatial objects. Geographic Information System (GIS) technology has been employed in some commercial data management software, such as AccuMap by IHS, GeoCarta by Divestco, and geoSCOUT by geoLOGIC, to store, map and search/filter these spatial objects. In addition, these products provide access to integrated public and proprietary data in the geophysical, geological and engineering disciplines for user-selected geospatial wells, pipelines or lands. Applications of GIS in the oil and gas industry have facilitated efficient data management and timely information retrieval.

1.2 Problem statement

Existing research and applications have laid the groundwork for oil and gas data mining/data analysis, data visualization, and data management. This thesis identifies one problem in each of three oil and gas data-related topics: oil and gas production prediction, interactive visualization tools for spatial and temporal oil and gas data, and Web GIS platform design.

1.2.1 Oil and gas production prediction

Hydrocarbon exploration and production is a worldwide industry that is technically challenging

and high-risk in terms of profitability. Well performance prediction starts in the early stages of

production to estimate future recovery, because it is critical for profitability. Wells can be

remediated or even shut in to prevent further loss if their predicted production does not reach a

predetermined level.

Comparing early stage production data of a new well with historical production data of

adjacent wells is common practice for future production prediction. Type curve matching is an

industry-recognized approach for predicting cumulative production. Analogous wells are first

grouped by the similarity of their cumulative production curves, and the average production

profiles are calculated as the type curves for each group of wells (Mohaghegh & Gaskari, 2009;

Sproule, 2015). The early cumulative production of a new well is then matched to the different

type curves, and the closest curve will be picked by engineers to estimate the future well

performance for the new well. Type curve matching is efficient but subjective. It is difficult to

repeat the analysis because different engineers might provide different estimations for a particular

well based on their experience in visual interpretations of the type curves. A second issue is that

cumulative production curves tend to look quite similar to each other over time, making it

challenging to distinguish performance variations in individual wells.
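
As a rough illustration of the type curve matching workflow described above, the following Python sketch averages the cumulative production of pre-grouped analogous wells into type curves and then picks the curve closest to a new well's early cumulative production. It is only a sketch under stated assumptions: the group names, the production figures and the use of Euclidean distance are hypothetical, and the distance measure stands in for the visual matching that engineers perform in practice.

```python
from statistics import mean


def cumulative(monthly):
    """Convert monthly production volumes into a cumulative production curve."""
    total, curve = 0.0, []
    for volume in monthly:
        total += volume
        curve.append(total)
    return curve


def build_type_curves(groups):
    """Average the cumulative curves of each group of analogous wells."""
    curves = {}
    for name, wells in groups.items():
        cumulative_curves = [cumulative(well) for well in wells]
        # zip(*...) walks the curves month by month
        curves[name] = [mean(month) for month in zip(*cumulative_curves)]
    return curves


def match_type_curve(early_monthly, curves):
    """Return the name of the type curve closest to the early production.

    Assumption: closeness is Euclidean distance over the months available
    for the new well, replacing the engineer's visual judgement.
    """
    early = cumulative(early_monthly)

    def distance(curve):
        return sum((a - b) ** 2 for a, b in zip(early, curve)) ** 0.5

    return min(curves, key=lambda name: distance(curves[name]))


# Hypothetical monthly production for two groups of analogous wells
groups = {
    "high": [[100, 90, 80, 70], [110, 95, 85, 75]],
    "low": [[40, 35, 30, 25], [50, 40, 35, 30]],
}
curves = build_type_curves(groups)
best = match_type_curve([95, 88], curves)  # two months of data for a new well
print(best, curves[best])
```

Replacing the visual match with an explicit distance makes the result repeatable, which addresses the subjectivity noted above, but it does not address the second issue: cumulative curves still tend to look alike over time.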

Research literature shows that data mining techniques are gaining popularity in the oil industry for production data analysis because the analysis techniques are objective, resistant to poor data quality, and more accurate than statistical approaches (Mohaghegh et al., 2008; Ma et al., 2015; Zhong et al., 2015). Different data mining algorithms such as Support Vector Machines, Random Forests and Boosted Regression Trees have proved to be efficient and effective tools for understanding oil and gas operation and production (LaFollette et al., 2012; Esmaili et al., 2013; Zhong et al., 2015). However, the time-series production data in these studies were either averaged or represented by their maximum or cumulative values. These statistical measures oversimplify the production curves and do not capture the entire production trend of each well. Therefore, an effective and intuitive data-driven prediction approach that uses time-series production data without oversimplification is to be studied.
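
A small worked example may make the oversimplification concrete. In the Python sketch below, two hypothetical wells (the figures are invented for illustration) have opposite production trends, yet their average, maximum and cumulative values are identical, so a model that sees only those summary statistics cannot tell them apart.

```python
def summarize(series):
    """Collapse a production time series into common summary statistics."""
    return {
        "mean": sum(series) / len(series),
        "max": max(series),
        "cumulative": sum(series),
    }


# Hypothetical monthly production: one well declines, the other ramps up
declining = [120, 100, 80, 60, 40, 20]
rising = [20, 40, 60, 80, 100, 120]

print(summarize(declining))
print(summarize(rising))
print(summarize(declining) == summarize(rising))
```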

1.2.2 Interactive data visualization for temporal and spatial oil and gas data

Along with enormous data and various data analytical tasks in the oil and gas industry, data

visualization has been a useful and popular tool. Users can perceive changes in movement, shape,

size, color and texture in an image and further interpret the pieces of information based on their

visual perception (Shneiderman, 1996). However, even though humans have remarkable cognitive

abilities, it would be hard for users to digest all the information from a big and complex dataset.

Especially when the data are multidimensional or multivariate, all aspects of a complex dataset

encoded in a single visual display would make an overwhelming and confusing representation.

Oil and gas time-series datasets are certainly complex, considering their data types and sizes. Common oil and gas data include well locations, well affiliation, well status, production data and completion data. Well locations reflect the geographical coordinates of the surface and bottom holes of the well, and the field, pool and formation the well belongs to. Well affiliation data indicate the operator and licensee of the well. Oil and gas wells undergo different stages (e.g. drilled, active, suspended, abandoned, and whipstocked), and their statuses are updated accordingly. Production and injection data are updated regularly for multiple substances including oil, gas, water and other fluids. Well completion makes a well ready for production. It involves preparing the bottom hole and production tubing for different reservoir conditions and specifications, and perforating and stimulating if required for wells to achieve maximum reservoir contact.

In order to visually represent the different types of oil and gas data, especially the spatial

wells with their attributive data and the time-series data, dividing visualization problems of the

complex dataset into separate facets and building different views that focus on particular

visualization problems would lead to a reasonable and interpretable visualization. Moreover, in

terms of combining the different visual representations for users to mentally interpret the dataset

as a whole picture, interaction is the key. Interactive visualization tools can accommodate both

production trend lines and the distribution of gas, oil and water production over time and in

different areas. Therefore, in order to deliver rich information intuitively to users, interactive

visualization tools are needed for oil and gas datasets.

1.2.3 Web GIS system design for oil and gas datasets

GIS technology is commonly used in the industry for mapping and data management purposes.

The aforementioned commercial software products are all built on GIS platforms. However, they

not only have high hardware configuration requirements for installation, but also require successive

packages to be installed whenever software modules or databases are updated. Therefore, it is inconvenient for

users to access the latest software and data, making this a relatively inefficient data distribution

approach.

Web GIS systems, that is, GIS systems built with web technologies, give users access to the system and its mapping and analytic functionalities as long as they have access to the Internet, and are accessible to a broad audience simultaneously through web browsers. Therefore, the feasibility of employing a Web GIS platform to map oil and gas related spatial objects and deliver requested data and information is to be studied. Moreover, the integration of the interactive data visualization tools and some data mining analytics within the Web GIS system, to strengthen its data analytical functions, is to be designed and developed.

1.3 Research objectives

The thesis mainly focuses on data mining, data visualization and data management using GIS on

oil and gas datasets. With respect to the three oil and gas data-centered topics, three research

objectives are identified:

1) A data driven approach is to be designed and implemented to predict well production. The

approach relies on production data of the analogous wells to estimate performance of target

wells. Time-series well production data are used for prediction instead of the statistical

measurements calculated from the time-series sequences;

2) The visual and interactive user interfaces and controls are to be designed and implemented for

oil and gas wells, and oil and gas time-series data, including injection, production, completion

and status data;

3) A Web GIS system prototype is to be designed and implemented for mapping oil and gas wells,

providing access to their proprietary data, visualizing user-selected data and providing

analytical functions.

1.4 Research contributions

The contributions of this thesis are listed as follows:

1) The proposed approach for production prediction consists of time-series aggregation and symbolization steps to reduce the dimensionality of the time-series production data and further transform numerical values into categorical ones. A symbolic tree model with pre-pruning mechanisms is used to build a predictive model from multiple well production histories, and a novel tree index is proposed to evaluate the symbolic tree in terms of tree size. An experiment was conducted on a production dataset of shale gas wells in the Montney-A pool in British Columbia and Alberta, Canada, to demonstrate the feasibility of the proposed method;

2) Data characteristics on the different time-series data and the respective user tasks at hand are

analyzed for designing the visual and interactive user interfaces and controls. Following the

classic Visual Information-Seeking Mantra proposed by Shneiderman (1996), templates are

designed for viewing oil and gas time-series data – injection, production, completion and status

data. Additionally, a mapping interface has been designed for displaying spatial wells and

integrating the time-series data viewers. The proposed interactive data visualization templates

are implemented and tested on the Montney shale gas wells;

3) A Web GIS system has been designed and implemented with a Steam Assisted Gravity

Drainage (SAGD) dataset retrieved for some Canadian SAGD projects. Besides mapping

SAGD wells, some data visualization and data analysis functions including clustering and

Association Rule Mining (ARM) in the web system provide further information for users,

demonstrating that the Web GIS system is an efficient and practical application.
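
The aggregation and symbolization steps named in the first contribution can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis implementation: the function names, the example production values and the breakpoints are hypothetical. It reuses the nomenclature of the symbol list, where n is the original production data length, m the aggregation level and w the resulting symbol sequence length.

```python
def aggregate(series, m):
    """Average every m consecutive values, reducing n points to about n / m."""
    chunks = [series[i:i + m] for i in range(0, len(series), m)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def symbolize(values, breakpoints, alphabet="abc"):
    """Map each numerical value to a categorical symbol.

    Assumption: breakpoints are sorted in ascending order and there is one
    more symbol in the alphabet than there are breakpoints.
    """
    symbols = []
    for value in values:
        # Count how many breakpoints the value reaches or exceeds
        index = sum(value >= b for b in breakpoints)
        symbols.append(alphabet[index])
    return "".join(symbols)


# Hypothetical 12 months of production (n = 12), aggregation level m = 3
monthly = [90, 80, 70, 50, 40, 30, 20, 15, 10, 8, 6, 4]
agg = aggregate(monthly, 3)                 # w = 4 aggregated values
sym = symbolize(agg, breakpoints=[10, 50])  # 3 symbols: a < 10 <= b < 50 <= c
print(agg, sym)
```

The resulting string keeps the order of high, medium and low production periods, which is the information that averages and cumulative totals discard.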

1.5 Thesis outline

Chapter Two introduces the related work on time-series data approximation and symbolization,

data visualization applications of oil and gas data, Web GIS applications, as well as data mining

models including decision tree induction, K-means clustering and ARM. Chapter Three introduces

the proposed well production prediction approach and evaluates the approach with the experiment

on a shale gas production dataset. Chapter Four analyzes the data characteristics of oil and gas

time-series data and the visualization tasks, and demonstrates the designed interactive visualization

tools. Chapter Five describes the Web GIS system with data visualization and analytical functions

implemented with a SAGD dataset. In the last chapter, Chapter Six, conclusions are drawn and

future work for this thesis is outlined.

CHAPTER TWO: RELATED WORK

The following sections respectively introduce time-series data approximation and symbolization

methods, data mining models including K-means clustering, ARM and decision tree induction,

data visualization applications and Web GIS applications of oil and gas data.

2.1 Time-series data approximation and symbolization

Oil and gas production data, which are the data resources for production prediction, are time-series

data. Most time-series datasets are too large to fit in main memory or to be processed by ordinary

data mining procedures, so most time-series mining workflows start with an approximation step

that derives fewer parameters or data points to approximately represent the time series in

subsequent computation.

Different techniques have been proposed, including major methods like Discrete Fourier

Transform (DFT) (Agrawal et al., 1994), Discrete Wavelet Transform (DWT) (Chan & Fu, 1999),

and Singular Value Decomposition (SVD) (Korn et al., 1997). Since time-series data is high

dimensional, dimensionality reduction techniques are used to reduce the number of descriptive

parameters or data points. Discrete Fourier Transform (DFT) converts a time series into a finite

combination of sine (and/or cosine) waves, isolating the fundamental frequencies present (Agrawal

et al., 1994). Discrete Wavelet Transform (DWT) uses the sum and difference of a mathematical

function localized in discrete periods of time (Chan & Fu, 1999). Singular Value Decomposition

(SVD) globally transforms multiple time-series in the same dataset to a number of eigenvalues

instead of focusing on transforming each time-series (Korn et al., 1997).

Piecewise Aggregate Approximation (PAA) represents time series by the mean values

of equi-length segments of the original sequences (Keogh et al., 2001). It is a more intuitive

dimensionality reduction technique that works with a variety of distance measures, allowing fast

indexing and querying. The superior performance of PAA has been theoretically proven and

empirically demonstrated by Keogh et al. (2001).

Having introduced the four time-series data approximation techniques for reducing

dimensionality, we now compare them. To reduce data of 𝑛 dimensions, the time complexity of

DFT is O(𝑛²); the time complexity of a DWT computation is O(𝑛); SVD requires O(𝑚𝑛²) time

for a dataset of 𝑚 time series; and the PAA method requires O(𝑛𝑘) time, where 𝑘 is the number

of equal-length segments. Therefore, PAA and DWT have relatively low time complexity. In terms of measuring

distances between different time-series sequences, DFT, DWT and SVD are capable of

approximating only Euclidean distance, while PAA can handle different distance metrics. One

drawback of SVD is that it requires multiple time-series for calculating eigenvalues.

A time series can be converted into symbolic representation by dividing the data into

groups, then summarizing or averaging each group, and assigning a symbol to it. This transforms

the time series into a sequence of symbols. Symbolic representations can effectively preserve

essential data features; are robust in handling data noise; can be used in various data structures;

and generally improve numerical computation speed (Daw et al., 2003). Lin et al. (2007)

developed a technique called Symbolic Aggregate approXimation (SAX) that builds on PAA by

symbolizing the mean values of the time-series segments. A unique feature of SAX is that by

applying PAA and symbolization with equiprobability, it guarantees that the distance measure

between two time-series sequences calculated by SAX lower bounds the true distance (Lin et al.,

2007). SAX has been widely employed in different studies to process real-world sensor data. For

instance, SAX was used to convert sensor data from a Wireless Sensor Network to strings, enabling

the detection of interesting or unusual events in the monitored process (Zoumboulaskis & Roussos,

2011). Siirtola et al. (2011) extracted similarity features from SAX-symbolized time-series data,

and integrated them with traditional statistical features to improve classification accuracy of

streaming data.
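
As a rough illustration of the SAX idea described above, the following Python sketch z-normalizes a series, applies PAA, and maps each segment mean to a symbol using breakpoints that cut the standard normal distribution into equiprobable regions. The function name `sax_word`, the alphabet and the sample series are assumptions made for illustration; this is not the reference implementation of Lin et al. (2007), and it assumes the series length is divisible by the number of segments and that the series is not constant.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Illustrative SAX-style conversion of a numeric series to a symbol string."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize so that Gaussian breakpoints are meaningful
    z = [(v - mu) / sigma for v in series]
    seg_len = len(z) // n_segments
    # PAA: mean of each equal-length segment
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints that split N(0, 1) into len(alphabet) equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax_word([1, 2, 3, 4, 10, 11, 12, 13], n_segments=4)
print(word)  # low segments map to early letters, high segments to late letters
```
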

2.2 Data mining methods

Data mining is the process of discovering patterns and knowledge from large datasets (Han et al.,

2001). Mining frequent patterns, associations and correlations, classification, and clustering are the

most common data mining tasks. Cluster analysis, ARM and decision tree induction are introduced

below, each together with one classic algorithm.

2.2.1 Cluster analysis

Cluster analysis, also called clustering, aims to group objects with similar properties and to

separate dissimilar objects (Han et al., 2001). The consistency of the clustering result of

geological properties and oil and gas resources can assist in oil and gas resource exploration and

evaluation (Liu & Xue, 2008).

K-means is one of the most popular clustering methods. K is a user-defined parameter that

specifies the number of clusters or groups. The algorithm initializes k random objects as

the cluster centroids and then iterates two steps, assigning every object to its closest centroid

and recalculating the centroids, until no cluster changes. The K-means

clustering algorithm can efficiently process large datasets due to its relatively low computational

complexity.
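
The assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs (points as tuples of floats, squared Euclidean distance, a fixed random seed); the function name `kmeans` and the sample coordinates are hypothetical and not taken from the thesis.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random objects as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # no change in any cluster: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


wells_xy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(wells_xy, k=2)
print(labels)
```
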

2.2.2 Association Rule Mining

ARM is used to find frequent associations and correlations among different attributes from large

datasets. In oil and gas research, ARM has been used in reservoir analysis and oil production

(Aulia et al., 2010; Cai et al., 2014). An association rule is comprised of an antecedent part (IF)

and a consequent (THEN) part. Two measures, support and confidence, are used to define rule

interestingness. An example rule in the paper of Cai et al. (2014) can be described: IF three

reservoir properties match certain levels, THEN the well oil production is high (support = 5.1%,

confidence = 85.7%). Support denotes the proportion of the items in the whole dataset that satisfy

the rule; confidence denotes the proportion of the objects that satisfy the consequent among the

objects satisfying the antecedent condition. Frequent if/then patterns satisfying defined minimum

support and minimum confidence are identified as strong association rules.
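
To make the two measures concrete, the sketch below computes support and confidence for a single rule over a toy set of wells. The attribute strings and the five records are invented for illustration only; they are not the data of Cai et al. (2014).

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that satisfy the IF part
    n_ante = sum(1 for t in transactions if antecedent <= set(t))
    # Records that satisfy both the IF part and the THEN part
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical well records described by categorical items
wells = [
    {"porosity=high", "perm=high", "oil=high"},
    {"porosity=high", "perm=high", "oil=high"},
    {"porosity=high", "perm=high", "oil=low"},
    {"porosity=low", "perm=high", "oil=low"},
    {"porosity=low", "perm=low", "oil=low"},
]
print(support_confidence(wells, {"porosity=high", "perm=high"}, {"oil=high"}))
```

Here two of the five records satisfy the whole rule and three satisfy the antecedent, so the rule has a support of 2/5 and a confidence of 2/3.
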

Apriori is a classic ARM algorithm that uses a particular search approach and data structure

to efficiently scan large datasets (Agrawal & Srikant, 1994). An itemset implying a rule includes

a set of items. Firstly, a set of candidate itemsets that include only one item in each itemset is

generated, and then the infrequent itemsets that fail to reach the minimum support count are

excluded. By joining the set of candidate itemsets with itself, new itemsets are generated and then

infrequent ones are pruned. The join and prune processes are iterated until candidate itemsets cannot

be extended any further and all frequent itemsets are found. A hash table data structure is used to

improve efficiency. Mined frequent itemsets describe the hidden relationships among multiple

attributes in the datasets and can help with prediction and decision-making.
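
The join-and-prune iteration described above can be sketched as follows. This is a simplified illustration, not the original Apriori implementation: it counts support with a plain scan rather than the hash-based structure mentioned in the text, and the function name `apriori` and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frequent itemset: support count} using level-wise join and prune."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: candidate 1-itemsets, filtered by the minimum support count
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support_count}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, n in sorted(apriori(baskets, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```
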

2.2.3 Decision tree induction

Decision tree induction is a classic data mining method for classification, consisting of a training step and

then a classification step (Han et al., 2001). During the training process, class-labeled objects with

known attribute values are processed to build a decision tree. Each internal node

in the tree represents a test on an attribute value and splits into two or more branches

representing the test outcomes; to classify an object, the tree is traversed until a leaf node representing one class label

is reached (Han et al., 2001). After the decision tree is constructed, the class labels of new objects

can be predicted using the tree model given their attribute values.

Iterative Dichotomiser 3 (ID3) is a classic decision tree method. It uses

information gain as the attribute selection measure for selecting the attributes for internal tree

nodes. To build a decision tree, it is essential to select the splitting criterion at each hierarchical tree

level. If one attribute can split the instance space with all the objects into multiple sub-spaces and

each subspace is associated with one class, this attribute makes a pure partition. The conceptually

ideal splitting criterion would generate mostly pure partitions. To measure the goodness of attributes

in terms of their partition purity, one of the most popular attribute selection measures is information

gain. The more information gain an attribute brings, the purer the resulting partition is.
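
A small sketch may help show how information gain ranks attributes by partition purity. The attribute names (`depth`, `pool`) and the four records below are hypothetical; the functions follow the standard Shannon entropy definition rather than any code from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after splitting on attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


rows = [
    {"depth": "shallow", "pool": "A"},
    {"depth": "shallow", "pool": "B"},
    {"depth": "deep", "pool": "A"},
    {"depth": "deep", "pool": "B"},
]
labels = ["high", "high", "low", "low"]
print(information_gain(rows, labels, "depth"))  # splits into two pure partitions
print(information_gain(rows, labels, "pool"))   # partitions stay fully mixed
```

In this toy case `depth` yields pure partitions and the maximum gain of one bit, while `pool` yields no gain, so ID3 would select `depth` for the node.
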

Compared to neural network algorithms and statistical models, decision trees are more

interpretable because the acquired knowledge for predicting is in a readable form (Lim et al.,

2000). Expert knowledge is easy to integrate into decision tree building, because the internal

nodes can be viewed and altered. Besides interpretability, once the decision tree is trained, it

facilitates fast and accurate predictions because attributes that contribute little information or cause

overfitting are inherently excluded during the tree construction and pruning processes.

2.3 Oil and gas data visualization

The current commercial oil and gas data management systems focus on data management, but have

limited data analysis functionality. In terms of analyzing the large quantities of data in the oil and

gas industry, visualization tools and other digital techniques have helped with exploring data,

making decisions and improving production (Evans et al., 2002). Visualization methods such as

diagrams, charts, and plots are the most common and straightforward ways to summarize datasets.

Data visualization methods like different plots and charts have been used to study shale gas

production for different shale basins over time (Anderson et al. 2010; Baihly et al. 2010; Nobakht

et al. 2012). Some studies analyzed production trends by integrating transient analysis and plot

analysis – log-log plot and square root time plot (Anderson et al. 2010; Nobakht et al. 2012). In

the study by Baihly et al. (2010), shale wells in the same basin were grouped by the years of their

first production, and line charts were used to plot the average daily gas production of different well

groups; the average production plots were compared across basins. The clear distinction in the

production across different shale basins was attributed to the differences in reservoir properties

and completion processes. Therefore, production analysis using data visualization tools is practical

for characterizing wells and evaluating well performances for a single well or a group of wells.

2.4 GIS applications in oil and gas

One of the most popular commercial products for oil and gas data management in Alberta is

GeoCarta (Divestco, 2016). It is primarily an oil and gas data warehouse integrating locations and

distributions of spatial objects, exploration and production histories and all the other relational data

sources. A mapping interface and a connected data management system work interactively in order

to simplify the workflows of querying and retrieving data. For this specific software, ArcMap is

utilized as the GIS platform, and spatial objects can be either located in the intuitive mapping

interface or searched by the industry standard location descriptions in the data management

system, which is attached to ArcMap as an extension tool. Therefore, ArcGIS Desktop must

be installed alongside GeoCarta, and users need to manually update their oil and gas database each time

GeoCarta issues one of its regular updates. Other oil and gas data management software has a similar system

design, except that it uses a different GIS platform instead of ArcMap.

There are also oil and gas information systems built with web GIS technology focusing on

spatial data query (Government of Saskatchewan, 2002), oil and gas industry news notification

(PetroFeed Inc., 2015) and so on, which deliver valuable petroleum related information to users.

Web GIS platforms provide convenient access to rich and real-time datasets through the Internet

using different mobile devices, enabling a broad range of users to share and use the online data resources.

Users can easily search and retrieve data and information through user-friendly mapping

interfaces, without having to be highly trained to fully understand every function. Moreover, other

web services provide extra geospatial data analytics and visualization functionalities to web GIS

platforms and make web GIS more integrated and powerful.

CHAPTER THREE: A SYMBOLIC TREE MODEL FOR OIL AND GAS PRODUCTION

PREDICTION USING TIME-SERIES PRODUCTION DATA

3.1 Introduction

Comparing early stage production data of a new well with historical production data of adjacent

wells is common practice for future production prediction. Oil and gas production volumes are

essentially time-series data updated at a fixed rate, for instance, hourly, daily or monthly. In Figure

3-1, the dashed black line represents the monthly production of a newly developed gas well that

has only been producing for three months. To predict its future production, the monthly production

curves of 30 adjacent gas wells from the same geological area are shown in the red solid lines. The

production durations of these wells range from 32 to 155 months. Clearly, it is very difficult to

compare the production curves by visual inspection alone.

Figure 3-1 Production data of a new well relative to existing adjacent wells

In this chapter, we propose a symbolic tree model to predict future production performance

of an early stage well using the production histories of the surrounding wells. A novel workflow

is proposed to summarize this historic production data into a hierarchical tree structure. Pre-

pruning mechanisms are integrated, and a new coverage index is designed to achieve a compact

and informative tree. To demonstrate the feasibility of the proposed method we conduct an

experiment on a production dataset from shale gas wells in Montney-A pool in Canada. The

resulting symbolic tree is visually intuitive, and provides accurate predictions of future

performance on holdout wells.

3.2 Symbolic tree construction for well production prediction

To predict future performance of a newly developed well at a specific point in time, the following

two datasets are required:

1) The time-series production history of the new well;

2) Production histories from multiple analogous and adjacent wells.

Figure 3-2 illustrates the general procedure for building the symbolic tree model from time-

series production datasets. The first step is to apply a time-series aggregation method to reduce the

dimensionality of production data, followed by a symbolization method to transform the

numerical, aggregated values to categorical data. User-specified settings for aggregation level and

symbol size are required to perform this conversion. Aggregation level determines to what extent

the original time-series data are compressed, while the symbol size determines the number of

different symbols that will be used to represent the aggregated production data. The result is a data

set containing symbolized sequences of aggregated production data. The second step in this

workflow assists the user in selecting appropriate initial settings by building a series of symbolic

tree candidates with different aggregation levels and symbol numbers. A coverage index is

calculated to select a proper-sized symbolic tree. Finally, the target well performance can be

predicted in the third step, using the selected symbolic tree model.

The following sections provide a detailed description of this workflow.

Figure 3-2 Flowchart of proposed Symbolic Tree Model

3.2.1 Time-series well production data preprocessing

As mentioned, time-series production data can become voluminous over the years. As an intuitive and

efficient method to reduce time-series data dimensionality, the PAA method is extended to transform

the well production data into time-series sequences with fewer data points. First, the aggregation

level needs to be specified by the user. It indicates how many original production data points

should be combined, which decides the length of each aggregated production history. The length

of aggregated production data can be calculated using Equation (1):

$$
w = \begin{cases} n/m, & n \bmod m = 0 \\ \lfloor n/m \rfloor + 1, & n \bmod m \neq 0 \end{cases} \tag{1}
$$

where 𝑛 is the number of the production data points in the original production sequence, and 𝑚 is

the specified aggregation level. If the original data length (𝑛) is divisible by the aggregation level

(𝑚), the aggregated data length is the quotient of 𝑛/𝑚. If not, the aggregated data length is the

floor of 𝑛/𝑚 plus one, where the last aggregated value represents the remaining production

data.

To downsize each time-series production sequence with 𝑛 production data points to 𝑤 data

points, every data segment of 𝑚 data points is represented by the average value. Equation (2)

shows the calculation for the aggregated production time-series when 𝑛 is divisible by the

aggregation level 𝑚.

$$
\bar{p}_i = \frac{1}{m} \sum_{j = m(i-1)+1}^{m i} p_j \tag{2}
$$

where $\bar{p}_i$ represents the 𝑖-th value in the transformed 𝑤-dimensional sequence ($\bar{P} = \bar{p}_1, \ldots, \bar{p}_w$),

and $p_j$ represents the 𝑗-th value in the original production data.

When 𝑛 is not divisible by 𝑚, the first 𝑤−1 data points in the aggregated production

sequence are calculated using Equation (2). The last data point is the average of the left-over

production data.
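
Equations (1) and (2), together with the left-over rule, can be sketched in Python as follows. The function names `aggregated_length` and `aggregate` and the sample values are illustrative assumptions, not code from the thesis.

```python
def aggregated_length(n, m):
    """Equation (1): number of aggregated points for n readings at aggregation level m."""
    return n // m if n % m == 0 else n // m + 1


def aggregate(production, m):
    """Equation (2) plus the left-over rule: mean of each consecutive block of m readings.

    The final block may hold fewer than m readings; it is averaged over
    the readings that remain.
    """
    blocks = [production[i:i + m] for i in range(0, len(production), m)]
    return [sum(block) / len(block) for block in blocks]


monthly = [10, 20, 30, 40, 50, 60, 70]  # hypothetical production values
print(aggregated_length(len(monthly), 3))
print(aggregate(monthly, 3))
```

With seven readings and an aggregation level of 3, the first two aggregated values average three readings each and the third value is the single left-over reading.
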

After reducing the dimensionality, the aggregated time-series production data are still

continuous-valued, so a discretization technique needs to be applied to transform the data into

categorical values. The next step is therefore to symbolize each data point in the aggregated time-series

sequence based on the symbol set. The symbol set S can be defined by users, denoted as 𝑆 =

{𝑠1, 𝑠2, … , 𝑠𝑘}, where each element in the symbol set represents a symbol and 𝑘 is the symbol size.

First, the aggregated production data need to be sorted in an ascending order. Then, based on

symbol size, multiple quantiles are calculated along the production data range to divide the

numeric production data values into equal sized groups. Statistically, quantiles are the data points

that divide the range of a probability distribution into intervals with equal probability. If the symbol

size is 3, two quantile values will be calculated to divide all the production values into three groups.

After deciding the equiprobable groups, the corresponding symbols are assigned to the values in

each group.

Figure 3-3 gives an example of how the original monthly production for a well is transformed

with an aggregation level of 3 and a symbol size of 4 (the symbol set S = {A, B, C, D}).

Figure 3-3(a) shows the monthly production data (solid red line) of a well from January

2010 to September 2012, a time series with 33 readings. Production data are aggregated

every three months and shown as the black line. For example, the average production of January

2010 to March 2010 is calculated using Equation (2), giving 1821.6. Similarly,

the rest of the aggregated production data values are shown in the graph.

Figure 3-3(b) displays the distribution of the aggregated well production data of all the

wells used for prediction. Three quantiles, 1500, 2250, and 2950, divide the production data range

of 0 to 4500 into four equiprobable groups, which are respectively assigned four symbols of A,

B, C and D. Based on the symbols with their corresponding data ranges, the production sequence

is transformed into a sequence of chronologically organized symbols, as marked on top of the

black line in Figure 3-3(a).
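
The quantile-based symbolization in this example can be sketched with the Python standard library. The cut points 1500, 2250 and 2950 and the value 1821.6 are those quoted in the text; the other values, the function names, the choice of the `inclusive` quantile method and the rule that a value equal to a cut point falls in the upper group are assumptions, since the text does not specify them.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cut_points(values, symbol_size):
    """Return symbol_size - 1 cut points that split values into equiprobable groups."""
    return quantiles(values, n=symbol_size, method="inclusive")


def symbolize(sequence, cut_points, symbols):
    """Map each aggregated value to the symbol of the interval it falls into."""
    return "".join(symbols[bisect_right(cut_points, v)] for v in sequence)


# Cut points quoted for Figure 3-3(b); sequence values after the first are hypothetical
cuts = [1500, 2250, 2950]
print(symbolize([1821.6, 1200, 2500, 3100], cuts, "ABCD"))

# Deriving cut points from pooled data (hypothetical values) for a symbol size of 3
print(quantile_cut_points(list(range(1, 10)), 3))
```

The first aggregated value, 1821.6, lies between 1500 and 2250 and therefore receives the symbol B.
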

Figure 3-3 Production data aggregation and symbolization (a) Aggregated and symbolized

time-series production data of one example well (b) Data distribution of all aggregated

production data

3.2.2 Symbolic tree construction and evaluation

To summarize the production data in a predictive model, the symbolic tree is proposed and built

on the symbolized production data of all the wells and their classes. The symbolic tree is a

flowchart-like tree structure, where the hierarchical levels in the tree correspond to chronological

time periods and the tree nodes represent production symbols at particular times.

Each symbolic tree node contains a number of wells with their production symbol

sequences stored and their class labels known. The class labels correspond to a chosen well

performance criterion. For example, a binary criterion could be whether a well can reach 10,000 m³

of gas production by the end of the first year. If a criterion refers to four different gas production

ranges (i.e., 0 to 5,000 m³, 5,000 to 10,000 m³, 10,000 to 15,000 m³, and 15,000 to 20,000 m³), four class

labels are created based on the four production ranges. The leaf nodes provide well classes based

on majority voting, while internal symbolic tree nodes provide the probabilities of wells falling in

different classes.

The symbolic tree is constructed in a top-down manner. Since the well production symbols

are chronologically ordered, the hierarchical levels of the symbolic tree follow the same

order during construction. In the degenerate case where all the wells belong to the same class,

the symbolic tree will only have a root node that includes all the well data. If the wells belong to

different classes, the nodes in the first level of the tree represent the distinct symbols of the first

chronological time period in all the well production symbol sequences. Each well is then assigned

to its corresponding tree node. Next, the well production sequences in each first-level tree node

are distributed into child nodes in the second tree level, based on the distinct symbols of their

second chronological time period. The nodes in the following levels grow in the same manner.

If wells have been producing for a long time, the symbolic tree constructed could grow

very deep. In addition, production volumes tend to stabilize after a certain period of time, so later

periods add little variation to the production and provide little additional information

for prediction. A minimum node size threshold is introduced to prevent unnecessary branching

by ensuring that each node contains no fewer than the specified number of wells. Spatial

correlation may also exist in the petrophysical attributes of oil and gas wells from the same

formation (porosity, permeability and oil saturation). This means that adjacent wells, especially

wells producing from the same pool, should have similar production trends. Therefore, the spatial

distribution of wells should contribute to the production prediction. To account for spatial

correlation, a second threshold, called minimum spatial information gain, is introduced. Spatial

information gain is a measure of node purity that combines information gain with spatial

correlation. Minimum spatial information gain ensures that a node does not split unless the gain

in purity exceeds the specified threshold.

The basic algorithm of symbolic tree construction is described in Figure 3-4. The algorithm

takes the well production data, the production symbol sequence length, the symbol set, and two

other user-specified parameters, minimum node size and minimum spatial information gain, as

inputs. The symbolic tree starts with a root node containing the production symbols and well

classes (line 1). If all the wells in the root node belong to the same class, the node is marked as a

leaf node with that class (line 2). Otherwise, the root node with a tree level of 0 is passed to the

function called Generate_tree_node (line 4). The defined function takes a current node and a tree

level as arguments, and it is a recursive function. From the current node, a child node 𝑁𝑖 is

generated at the next tree level for each symbol 𝑠𝑖 in the symbol set; it retrieves and

stores the wells that have the symbol 𝑠𝑖 at time period tree_level+1 (lines 8-10). If the tree

level does not reach the length of the production sequence, node size and spatial information gain

are calculated for the node 𝑁𝑖 (line 12). The node is truncated if its size is zero, marked as a

leaf node if it does not reach either the minimum node size or the minimum spatial information

gain, or otherwise passed to the Generate_tree_node function with an incremented tree level (lines 13-15). If

the tree level equals the length of the production sequence, the node is marked as a leaf node,

and its class is determined by majority voting. After all levels of the symbolic

tree are constructed, the resulting symbolic tree is returned.
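
The recursive procedure described above can be approximated by the following Python sketch. It is not the algorithm of Figure 3-4 itself: ordinary Shannon information gain is used as a stand-in for spatial information gain, nodes are plain dictionaries, all symbol sequences are assumed to have the same length, and every function name and sample well is hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(parent_labels, child_label_lists):
    """Stand-in for spatial information gain: plain Shannon information gain."""
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(ch) / n * entropy(ch) for ch in child_label_lists)


def split(wells, level, symbols):
    """Group wells by their symbol at position level; empty groups are dropped."""
    groups = {s: [w for w in wells if w[0][level] == s] for s in symbols}
    return {s: g for s, g in groups.items() if g}


def make_node(wells):
    return {"counts": dict(Counter(label for _, label in wells)), "label": None, "children": {}}


def mark_leaf(node):
    node["label"] = max(node["counts"], key=node["counts"].get)  # majority vote


def generate(node, wells, level, seq_len, symbols, min_size, min_gain):
    for s, subset in split(wells, level, symbols).items():
        child = make_node(subset)
        node["children"][s] = child
        nxt = level + 1
        if nxt >= seq_len:  # end of the symbol sequence reached
            mark_leaf(child)
            continue
        labels = [lab for _, lab in subset]
        parts = [[lab for _, lab in g] for g in split(subset, nxt, symbols).values()]
        if len(subset) < min_size or gain(labels, parts) < min_gain:
            mark_leaf(child)  # pre-pruning: too few wells or too little gain
        else:
            generate(child, subset, nxt, seq_len, symbols, min_size, min_gain)


def build_symbolic_tree(wells, seq_len, symbols, min_size=2, min_gain=0.01):
    """wells: list of (symbol_sequence, class_label) pairs."""
    root = make_node(wells)
    if len(root["counts"]) == 1:
        mark_leaf(root)  # all wells share one class
    else:
        generate(root, wells, 0, seq_len, symbols, min_size, min_gain)
    return root


wells = [("AAB", "low"), ("AAC", "low"), ("ABB", "low"), ("CCD", "high"),
         ("CDD", "high"), ("CBA", "low"), ("CDC", "high")]
tree = build_symbolic_tree(wells, seq_len=3, symbols="ABCD")
print(tree["children"]["A"]["label"], sorted(tree["children"]["C"]["children"]))
```

In this toy run the branch for symbol A stops immediately because all of its wells are in the same class, while the branch for symbol C grows one more level before its children are pruned to leaves.
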

Figure 3-4 Basic algorithm for constructing a symbolic tree from well production data

The node size corresponds to the number of wells included in one node. The calculation is

straightforward. The spatial information gain calculation will be given in the next section.

3.2.2.1 Spatial information gain calculation for symbolic tree nodes

Spatial correlations are commonly observed in geological properties and production performance

among wells, especially wells producing from the same reservoir. In other words, the closer two wells

are to each other, the higher the probability that their production performance is similar.

Spatial entropy is an information measure that integrates the influence of spatial distribution on the

non-spatial attributes of spatial objects. The spatial entropy calculation presented by Claramunt

(2005) is employed in this study because it is intuitive and easy to calculate.

First of all, given a well production dataset 𝐷 = {𝐷1, 𝐷2, … , 𝐷𝑝}, where wells are classified

into p different classes, two average distance measures – intra-distance and extra-distance – are

defined. The intra-distance, denoted by $d_i^{int}$, is the average distance between all wells in the class

𝐷𝑖. The extra-distance, denoted by $d_i^{ext}$, is the average distance of wells in 𝐷𝑖 to all other wells

in other classes of 𝐷. In Equation (3), when 𝐷𝑖 is empty or only contains one well, the intra-distance

is assigned a constant value α to avoid the interference of null values during the computation. In

Equation (4), when 𝐷𝑖 includes all the wells in 𝐷, that is 𝐷𝑖 is the only class, the extra-distance is

assigned a large constant 𝛽. 𝑑𝑖𝑠𝑡(𝑎, 𝑏) is the distance between wells 𝑎 and 𝑏.

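$$
d_i^{int} = \begin{cases} \dfrac{1}{|D_i| \times (|D_i| - 1)} \displaystyle\sum_{a \in D_i} \; \sum_{b \in D_i,\, b \neq a} dist(a, b), & |D_i| > 1 \\ \alpha, & |D_i| \leq 1 \end{cases} \tag{3}
$$

$$
d_i^{ext} = \begin{cases} \dfrac{1}{|D_i| \times |D - D_i|} \displaystyle\sum_{a \in D_i} \; \sum_{b \in D,\, b \notin D_i} dist(a, b), & D \neq D_i \\ \beta, & D = D_i \end{cases} \tag{4}
$$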

Definition 1. The spatial entropy of dataset 𝐷 based on its partition {𝐷1, 𝐷2, … , 𝐷𝑝} is defined as

(from Claramunt (2005)):

$$
H(D) = - \sum_{i=1}^{p} \frac{d_i^{int}}{d_i^{ext}} \, p(D_i) \log_2 p(D_i) \tag{5}
$$

In this definition, 𝐻(𝐷) is the spatial entropy, which is calculated by including a weight

factor $d_i^{int}/d_i^{ext}$ in the Shannon entropy formula. The weight factor decreases when either the

intra-distance decreases or the extra-distance increases, so that the spatial entropy measure takes

into account the spatial distribution of the objects, which in this study are oil and gas wells.

Spatial entropy is a special form of Shannon entropy. It is known that the Shannon entropy of

an even distribution of objects reaches the maximum value and tends to decrease as the

concentration of the distribution increases. Therefore, spatial entropy decreases monotonically

as local similarity of the non-spatial attribute and spatial correlation increase.
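
The intra-distance, extra-distance and spatial entropy defined above can be sketched in Python as follows. The sketch assumes planar well coordinates with Euclidean distance, averages the intra-distance over all ordered pairs of distinct wells, and uses arbitrary default values for the constants α and β; the function names and the sample coordinates are hypothetical.

```python
from itertools import permutations
from math import dist, log2


def intra_distance(cls, alpha=1.0):
    """Average distance between distinct wells of one class; alpha for 0 or 1 wells."""
    if len(cls) <= 1:
        return alpha
    pairs = list(permutations(cls, 2))  # ordered pairs (a, b) with a != b
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def extra_distance(cls, others, beta=1e6):
    """Average distance from wells of one class to wells of all other classes."""
    if not others:  # the class contains every well in the dataset
        return beta
    return sum(dist(a, b) for a in cls for b in others) / (len(cls) * len(others))


def spatial_entropy(classes, alpha=1.0, beta=1e6):
    """classes: dict mapping a class label to a list of (x, y) well locations."""
    total = sum(len(c) for c in classes.values())
    h = 0.0
    for label, cls in classes.items():
        if not cls:
            continue  # an empty class contributes nothing
        others = [p for lab, c in classes.items() if lab != label for p in c]
        weight = intra_distance(cls, alpha) / extra_distance(cls, others, beta)
        p = len(cls) / total
        h -= weight * p * log2(p)
    return h


clustered = {"high": [(0, 0), (0, 1)], "low": [(10, 0), (10, 1)]}
mixed = {"high": [(0, 0), (10, 1)], "low": [(10, 0), (0, 1)]}
print(spatial_entropy(clustered), spatial_entropy(mixed))
```

Both layouts have the same class proportions, but the layout in which each class is spatially clustered gives the lower spatial entropy, which is the behaviour the weight factor is meant to capture.
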

Spatial information gain is the difference between the spatial entropy of the

parent node and that of its child nodes. It is used as one of the criteria to decide whether a

symbolic tree node should grow child tree nodes at the next tree level. This measure stands for the

reduction of spatial entropy achieved by further growing tree nodes using production symbols at the

subsequent time period. To determine the spatial information gain from splitting the current node

𝑁, the spatial entropy values are calculated respectively before and after potentially generating

child nodes from the current node.

Before generating child nodes, spatial entropy of 𝑁 is the average amount of information

needed to identify the class label of a well in 𝑁. Equation (5) is used to calculate the spatial entropy

before generating child nodes from node 𝑁. The well dataset within node 𝑁 is denoted as 𝐷, and

there are 𝑝 different classes. The log function in the formula is taken as logarithm to the base 2,

because information is encoded in bits. After generating child nodes from node 𝑁, the well

production symbols at the subsequent time interval are provided. Equation (6) calculates the amount

of spatial entropy given the well symbol information and well class data, where 𝑚 is the symbol

size. At node 𝑁, the wells that have one particular production symbol are denoted by 𝐷𝑗,

and their spatial entropy is denoted by 𝐻(𝐷𝑗). The ratio |𝐷𝑗|/|𝐷| represents the proportion of wells in 𝐷𝑗 out of all

wells in 𝐷. The total spatial entropy is measured by adding up |𝐷𝑗|/|𝐷| × 𝐻(𝐷𝑗) over all production

symbols.

$$
H_N(D) = \sum_{j=1}^{m} \frac{|D_j|}{|D|} \times H(D_j) \tag{6}
$$

The spatial information gain is defined by subtracting 𝐻𝑁(𝐷) from 𝐻(𝐷). It represents

how much spatial entropy is reduced by growing child nodes to the next tree level from this node.

Therefore, if the spatial information gain is less than the minimum spatial information gain

threshold, the node is restrained from generating child nodes.
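
As a worked illustration of this calculation, consider purely hypothetical values (none of these numbers come from the thesis dataset): a node with spatial entropy 𝐻(𝐷) = 0.80 holds 10 wells, and splitting on the next symbol yields two child nodes with 6 and 4 wells whose spatial entropies are 0.30 and 0.10.

```latex
H_N(D) = \frac{6}{10} \times 0.30 + \frac{4}{10} \times 0.10 = 0.18 + 0.04 = 0.22
\qquad
\text{Gain} = H(D) - H_N(D) = 0.80 - 0.22 = 0.58
```

With an assumed minimum spatial information gain threshold of 0.10, this node would be allowed to grow child nodes; had the gain been below 0.10, the node would be kept as a leaf.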

3.2.2.2 Coverage index for evaluating symbolic trees

For time-series well production data preprocessed using different user-specified aggregation level and symbol

size settings, the resulting symbolic trees have different structures. When judging the

quality of a symbolic tree, tree size is one of the most important criteria. Generally, a compact

tree would be more favorable than a larger, more complex tree. Theoretically, complexity reduces

the predictive ability of a tree on independent datasets, even though the tree explains the training

data well (Mingers, 1989). A large tree tends to overfit. In practice, it would be difficult

for engineers to visually interpret a complex symbolic tree and derive information about well

production from the tree structure. Therefore, the size of the tree should be minimized to provide

a compact symbolic tree. Furthermore, a compact tree is easier for users to interpret and analyze

the production histories of all the wells.

In this study, a size-based index 𝐶𝑜𝑣𝑒𝑟𝑎𝑔𝑒 is proposed to evaluate the symbolic tree

derived from symbolic time-series production data. As Equation (7) shows, 𝑥𝑖 stands for the

number of branches at level 𝑖 of the tree, 𝑠 stands for the symbol size used for representing

production data, and 𝑛 is the tree depth.

$$\mathit{Coverage} = \frac{\sum_{i=1}^{n} x_i}{\sum_{j=1}^{n} s^{j}} \qquad (7)$$

The numerator of the index represents the total number of occupied branches in the

symbolic tree, while the denominator corresponds to the total number of possible branches in a

fully-grown tree with a symbol size of 𝑠. The smaller the coverage index value, the more compact

the symbolic tree is. Users can calculate coverage index values for multiple symbolic trees built

from production data sequences created using different combinations of aggregation level and

symbol size; and then select the tree with the lowest coverage index.
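
As a small worked illustration of Equation (7), the coverage index can be computed directly from the number of occupied branches at each tree level. The function name and the branch counts below are hypothetical and are not taken from the case study.

```python
def coverage_index(branches_per_level, symbol_size):
    """Occupied branches divided by the branches of a fully grown tree.

    branches_per_level[i - 1] is x_i, the number of occupied branches at
    tree level i; the tree depth n is the length of the list.
    """
    depth = len(branches_per_level)
    possible = sum(symbol_size ** j for j in range(1, depth + 1))
    return sum(branches_per_level) / possible


# Hypothetical tree: symbol size 3, depth 3, with 3, 5 and 4 occupied branches.
value = coverage_index([3, 5, 4], 3)  # 12 / (3 + 9 + 27)
```

A fully grown tree has every possible branch occupied and therefore a coverage of 1; the more branches are pruned away, the closer the value gets to 0.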

3.2.3 New well production prediction

The symbolic tree model is constructed for the purpose of predicting new well performance. For a

new well with limited production history, the aggregation and symbolization processes first

transform its time-series production data points into a production symbol sequence. Then, the

production symbol sequence of the new well is matched onto the symbolic tree. If this sequence

ends at a leaf node, the new well is labeled as the majority class in the leaf node. If it ends at a

non-leaf node, it can imply the possibility of the well belonging to different class labels based on

the well class information embedded in the node.
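
The matching step can be sketched as a walk down a prefix tree. The nested-dictionary layout, the function names and the training sequences below are assumptions made for illustration; pruning is omitted, and stopping at the deepest matched node when a symbol has no branch is a choice of this sketch rather than something stated in the text.

```python
def build_tree(sequences, labels):
    """Build an unpruned symbolic tree as nested dictionaries."""
    root = {"counts": {}, "children": {}}
    for sequence, label in zip(sequences, labels):
        node = root
        node["counts"][label] = node["counts"].get(label, 0) + 1
        for symbol in sequence:
            node = node["children"].setdefault(
                symbol, {"counts": {}, "children": {}}
            )
            node["counts"][label] = node["counts"].get(label, 0) + 1
    return root


def predict(tree, sequence):
    """Follow the symbols of a new well down the tree.

    Returns the majority class of the node reached, the class shares in
    that node, and whether the node is a leaf.
    """
    node = tree
    for symbol in sequence:
        child = node["children"].get(symbol)
        if child is None:  # no branch for this symbol: stop here
            break
        node = child
    total = sum(node["counts"].values())
    shares = {label: n / total for label, n in node["counts"].items()}
    majority = max(shares, key=shares.get)
    return majority, shares, not node["children"]


# Hypothetical training wells (three-month symbol sequences and classes).
tree = build_tree(["BCC", "BCC", "BCC", "BCA", "AAB"], ["Y", "Y", "Y", "N", "N"])
leaf_result = predict(tree, "BCC")
inner_result = predict(tree, "BC")
```

With these toy wells, the sequence "BCC" ends at a leaf and is labeled with its majority class, while the shorter sequence "BC" ends at a non-leaf node and only yields the class shares stored there.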

3.2.4 Symbolic Tree Visualization

A visualization template is designed for users to investigate the symbolic tree model in an

interactive and intuitive manner. Different colors are assigned for branches and nodes for better

visualization. In order to present the pruned trees with comparison to fully expanded trees, the

branches of the pruned trees are colored in red, while the uncovered branches in full trees are

colored in grey. The thickness of the branches indicates the number of wells included in the end

nodes of the branches. A thicker branch means more wells, while a thinner branch represents a

smaller number of wells. Leaf nodes are colored according to the different classes they

represent. An example will be shown in the case study section.

The following section applies the proposed symbolic tree method to a real dataset of shale

gas wells to predict shale gas production after 12 months.

3.3 Case study: Canadian shale gas production prediction

The majority of the shale gas resources in Canada reside within the Western Canada Sedimentary

Basin (WCSB), which extends across British Columbia, Alberta, Saskatchewan and Manitoba.

Shale gas drilling and production are mainly occurring in British Columbia and Alberta at present.

The proposed symbolic tree model can assist engineers with decision-making regarding the

following three questions.

1) Can we predict if a well will reach certain production at a benchmark time?

2) How early can we accurately predict new well production?

3) Where are the analogous wells with a similar production trend located?

The Montney play, located in British Columbia and Alberta, is considered the most

active drilling area in Canada. A total of 972 wells in the Montney-A pool that have been producing for more

than one year were selected for this case study. The proposed symbolic tree model can help

engineers predict, at an early production stage, whether a shale well will reach 10,000 m3 of cumulative gas

production within the first 12 months. The wells were classified into one of two groups: wells

that produced more than 10,000 m3 within one year (Class: Y) and wells that did not reach that much

production (Class: N). Also, the wells sharing similar production trends can be identified on a map.

In the experiment, the whole well dataset is first divided into a training dataset and a testing

dataset. The well data were collected from three data sources: (1) the British Columbia (BC) Oil

and Gas Commission for BC shale gas well identifiers and locations; (2) the Alberta Energy Regulator

for Alberta shale gas wells; (3) Divestco GeoCarta (a commercial software package) for production data.

The training dataset contains 80% of the wells, randomly chosen from three individual fields (Dahl Field,

Heritage Field and Northern Montney Field) in the Montney-A pool. The testing dataset, on the other

hand, contains the remaining 20% of the wells. Both the training and testing datasets include wells from

all three fields.

First, the production data of all wells are symbolized. The production data are aggregated

monthly, and then four different symbol sizes (i.e. 3, 4, 5, and 6) with equiprobable data discretization

are applied to symbolize the production data. For example, for symbol size of 3, two 3-quantile

data points (i.e. 336.55 and 1127.4) were calculated from all production values in the training

production dataset so that the whole dataset could be cut into three equal-sized groups. The

symbols corresponding to the three data ranges are listed in Table 3- 1. Based on the calculated

data ranges for the three different symbols, the 12-month production data sequences are

represented by sequences of 12 symbols. Table 3- 2 and Figure 3- 5 show the first 12-month

production of two wells and the corresponding symbols.
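
The symbolization step can be illustrated with a short sketch. The cut points 336.55 and 1127.4 and the two production sequences come from Table 3- 1 and Table 3- 2; the function names, the use of `statistics.quantiles` to derive equiprobable cut points, and the rule that a value exactly on a boundary falls into the lower range are assumptions of this sketch.

```python
from bisect import bisect_left
from statistics import quantiles


def cut_points(training_values, symbol_size):
    """Equiprobable cut points: symbol_size - 1 quantiles of the training data."""
    return quantiles(training_values, n=symbol_size)


def symbolize(values, cuts, alphabet="ABCDEF"):
    """Map each aggregated production value to the symbol of its range."""
    return "".join(alphabet[bisect_left(cuts, value)] for value in values)


# Cut points reported for a symbol size of 3 (Table 3- 1).
cuts = [336.55, 1127.4]

# First 12 months of production of the two example wells (Table 3- 2).
well_y = [2056.8, 2084.9, 1926.8, 1540.3, 1266.3, 1113.4,
          919, 881.6, 790.4, 708.2, 599.3, 606]
well_n = [854.9, 683.6, 38.9, 906.9, 1003.7, 818.2,
          744, 609.8, 552.3, 558.6, 497, 526]

symbols_y = symbolize(well_y, cuts)
symbols_n = symbolize(well_n, cuts)
```

With the reported cut points, the two sequences map to the same symbols as listed in Table 3- 2.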

Figure 3- 5 Monthly well production and symbol sequences for two example wells

Table 3- 1 Symbols and ranges for time series symbolization of 3 symbols

Symbol   Range (m3/month)
A        0 – 336.55
B        336.55 – 1127.4
C        >1127.4

Table 3- 2 First 12 month monthly well production and symbols for two example wells

UWI: 00/B-014-J/094-B-16/0    Accumulative gas: 14493 m3    Class: Y

Month        1       2       3       4       5       6       7       8       9       10      11      12
Production   2056.8  2084.9  1926.8  1540.3  1266.3  1113.4  919     881.6   790.4   708.2   599.3   606
Symbol       C       C       C       C       C       B       B       B       B       B       B       B

UWI: 00/05-05-078-17W6/0    Accumulative gas: 7793.9 m3    Class: N

Month        1       2       3       4       5       6       7       8       9       10      11      12
Production   854.9   683.6   38.9    906.9   1003.7  818.2   744     609.8   552.3   558.6   497     526
Symbol       B       B       A       B       B       B       B       B       B       B       B       B

Figure 3- 6 shows the symbolic tree generated by using a symbol size of 3 and pre-pruning

thresholds of minimum 10 wells per node and minimum spatial information gain of 0.1. In the

visualized tree model, as shown in Figure 3- 6, the leaf nodes are colored either in blue or orange.

A blue node indicates that the wells in this leaf node are predicted to reach the expected production

by the end of the first year, while an orange node indicates that the wells are predicted to fail to reach the

expected production amount.

As shown in the figure, the depth of the symbolic tree is 5. It implies that whether the well

can achieve more than 10,000 m3 gas production cumulatively within 12 months can be estimated

by referring to the well production within the first five months. In addition, at each

hierarchical level of the symbolic tree, the branches from left to right correspond to gas production

from low to high. The rightmost branches at the first tree level in the symbolic trees lead to leaf

nodes representing wells that reached the expected performance. Therefore, it implies that the gas

wells that had relatively high well production (>1127.4 m3/month) in the first month are highly

likely to have good production performances and thus reach the expected production amount by

the end of the first year.

Figure 3- 6 Symbolic tree built from symbolic gas production time series using symbol size

of 3

Figure 3- 7 shows an example of the general process of production data preprocessing and

symbolic tree construction. The production histories of the 777 wells distributed in Montney-A

pool were retrieved and plotted in the line chart at the upper right. It is difficult to visually interpret

whether there are any trends in the production from the line chart, as there are too many lines and

the lines are overlapping. Then, the production data were aggregated monthly and symbolized with

a symbol size of 3, as shown in the table. A symbolic tree was constructed on the production

symbol sequences. Taking the tree nodes representing a production sequence of “B C C” for

example, this trace ends at the third tree level and stops at a blue leaf node, which means a new

well with “B C C” as the production symbols for the first three months is predicted to be able to

achieve more than 10,000 m3 cumulative gas production within 12 months. In the bottom-right

line chart, the red lines represent the production histories of the 116 wells falling in the black tree

branches, and the blue markers in the map show the spatial distribution of these wells. Only

2 of the 116 wells did not reach 10,000 m3 cumulative gas production within 12 months,

so this leaf node represents a class of ‘Y’. The dashed black line represents the well which was

predicted to be able to reach the production threshold because its first 3-month production symbols

match “B C C”.

Figure 3- 7 Production data preprocessing, symbolic tree construction and well production

prediction

The coverage index can help users determine the parameter setting for the symbol size. In

this case study, coverage indexes were calculated for the symbolic trees constructed from

production data symbolized with different symbol sizes. Symbolic trees were also constructed

with symbol sizes of 4, 5, and 6, and accuracy assessments were conducted for each. The coverage

values for the symbol sizes of 3, 4, 5 and 6 are respectively 17.9%, 20.9%, 13.5%, and 8.2%, as

listed in Table 3- 3. This means that, by trimming the original symbolic trees using the minimum node

size and minimum spatial information gain thresholds, the tree sizes are significantly reduced.

The trees could expand to 12 hierarchical levels because the dataset includes well

production of 12 months. However, by pre-pruning the symbolic trees with minimum node size of

10 and minimum spatial information gain of 0.1, the depth of tree is significantly reduced to 4 or

5.

The testing dataset was used to assess the symbolic trees built for predicting whether the

wells could reach an expected production at the benchmark time. Three measurements were

calculated for evaluating the symbolic trees: sensitivity, specificity and accuracy. Wells that

reached 10,000 m3 gas production cumulatively within 12 months are referred to as having good

performances, and wells that failed this expectation as having poor performances. Sensitivity

measures the true positive rate, which is the proportion of the wells with good performances that

are correctly identified as having reached the production expectation. Specificity measures the true

negative rate, which is the proportion of the wells with poor performances that are correctly

identified as having not reached the expected production. Accuracy measures the proportion of

the correctly classified wells, with either good or poor performances, out of all wells.
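
The three measurements can be computed from the actual and predicted class labels of the testing wells, as in the sketch below. The function name is an assumption, and the counts in the example are hypothetical: the thesis reports only the resulting rates, and these counts were picked so that the rounded rates agree with the symbol-size-3 row of Table 3- 3.

```python
def classification_metrics(actual, predicted, positive="Y"):
    """Sensitivity, specificity and accuracy for a two-class prediction."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical testing set: 96 class-Y wells (93 predicted Y) and
# 67 class-N wells (59 predicted N).
actual = ["Y"] * 96 + ["N"] * 67
predicted = ["Y"] * 93 + ["N"] * 3 + ["N"] * 59 + ["Y"] * 8
metrics = classification_metrics(actual, predicted)
```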

As shown in Table 3- 3, the four symbolic tree models have reached accuracy above 0.9,

which means over 90% of the wells have been correctly classified in terms of their ability to reach

10,000 m3 gas production by the end of the first year. The sensitivity values are all higher than the

specificity values. Therefore, the four models are more accurate in predicting well-performing wells than

poorly performing wells. Overall, the symbolic tree built from the production data symbolized into

3 levels has the best accuracy.

Table 3- 3 Sensitivity, specificity and coverage calculation for the four symbolic tree predictive models using different symbol sizes for time-series data symbolization

Symbol size   Coverage   Sensitivity   Specificity   Accuracy
3             0.1791     0.9688        0.8806        0.9325
4             0.2088     0.9583        0.8657        0.9202
5             0.1346     0.9479        0.8806        0.9202
6             0.0824     0.9375        0.8656        0.9080

CHAPTER FOUR: DATA VISUALIZATION TOOL DESIGNS FOR SPATIAL WELLS

AND TIME-SERIES OIL AND GAS DATA

4.1 Introduction

Oil and gas data is a strategic asset in the industry. Visualization tools assist petroleum

professionals with data comprehension, information seeking and decision-making. However,

considering the variety and volume of oil and gas data, the information that simple graphics can

deliver is limited.

Time-series data and spatial data are two essential data types in the oil and gas dataset. A

system with interactive visualization tools integrated in a Web GIS platform is designed to flexibly

deliver maximum information in single visual representations. The sections below introduce the

data characteristics of different time-series data, the visualization tasks, the Web GIS interface

design and the visual tool designs.

4.2 Oil and gas time-series data characteristics and visualization tasks

4.2.1 Oil and gas time-series data characteristics

To start a visualization process, according to Aigner et al. (2011), the data and the task at hand are

the two aspects that should be taken into account above all else. In terms of the oil and gas data that

have been retrieved from the public domain, associated with each well, there are textual and

numerical data on well identification, name and affiliations, spatial data indicating the surface and

bottom well locations, and time-series data about well status, well completion, injection and

production. Time-series data make up an important part of oil and gas data, and the updates on

well injection and production are what engineers refer to for decision-making during the production

process.

The data types of the non-time-series data are straightforward, while the time-series data

have varied characteristics for different themes. To characterize the time-series data, four criteria

are used, where three criteria (i.e. scale of variables, dimensionality, and frame of reference) are

extracted from fundamental design alternatives for time-oriented data provided by Aigner et al.

(2011). The definitions of the different data characteristics under the oil and gas time-series data

context are given below.

Scale of variables (quantitative vs. qualitative): quantitative variables have numeric data

values, while qualitative variables have nominal or ordinal data. Oil and gas time-series data

that are in number form are quantitative, while those with categorical data are qualitative.

Number of variables (univariate vs. multivariate): a univariate time-series records the changes

of one oil and gas related variable over time, while a multivariate time-series consists of

multiple synchronously recorded streams.

Frequency of update (uniformly-sampled vs. irregularly-sampled): a uniformly-sampled time-

series is updated regularly, for instance, every second, daily, monthly, or annually, while an

irregularly-sampled time-series is updated only occasionally. Irregularly-sampled data can be seen as events or changes of

states.

Frame of reference (abstract vs. spatial): by spatial data we mean the events or states taking

place in different spatial locations along time while abstract data do not include the spatial

aspect. In this case, oil and gas time-series data are associated with individual wells. Data that

indicate events or states associated with different locations along one wellbore are spatial,

while data that generally belong to one well are defined as abstract time-series.

Table 4- 1 shows the characteristics of four types of oil and gas time-series data. Well status

data include the statuses (e.g. well authorization granted, cased, active production, and suspended)

a well has gone through, and each status is related to a time period of a different length.

Well status data indicate changes of well state. Well completion data include the completions (e.g.

open hole, fracture, and perforate) that take place at different depth intervals along the wellbore.

The time aspect of completion data includes the date when one completion event happens. Well

status and well completion are both qualitative, univariate and irregularly-sampled. However, the

well completion data include the spatial perspective because well completions happen in different

locations along the well trajectory. Well injection and well production data are both quantitative,

multivariate, uniformly-sampled and abstract, because for each well multiple substances (e.g. water,

steam, and gas) are injected and produced and the volumes of the injection and production are

updated regularly.

Table 4- 1 Data characteristics of four types of oil and gas time-series data

                  Scale of variables   Number of variables   Frequency of update   Frame of reference
Well status       qualitative          univariate            irregularly-sampled   abstract
Well completion   qualitative          univariate            irregularly-sampled   spatial
Well injection    quantitative         multivariate          uniformly-sampled     abstract
Well production   quantitative         multivariate          uniformly-sampled     abstract
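
One possible way to make this classification usable in software, for example when choosing a visualization template for a data type, is to encode Table 4- 1 as a small lookup structure. The class, field and function names below are hypothetical; only the table values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesProfile:
    """The four characteristics used to describe a time-series data type."""
    scale: str      # quantitative / qualitative
    variables: str  # univariate / multivariate
    sampling: str   # uniformly-sampled / irregularly-sampled
    frame: str      # abstract / spatial


# Values transcribed from Table 4- 1.
PROFILES = {
    "well status": SeriesProfile(
        "qualitative", "univariate", "irregularly-sampled", "abstract"),
    "well completion": SeriesProfile(
        "qualitative", "univariate", "irregularly-sampled", "spatial"),
    "well injection": SeriesProfile(
        "quantitative", "multivariate", "uniformly-sampled", "abstract"),
    "well production": SeriesProfile(
        "quantitative", "multivariate", "uniformly-sampled", "abstract"),
}


def series_with(**criteria):
    """Names of the data types whose profile matches all given criteria."""
    return [
        name for name, profile in PROFILES.items()
        if all(getattr(profile, key) == value for key, value in criteria.items())
    ]
```

For example, `series_with(frame="spatial")` returns only the completion data, which is the one data type that needs a depth axis in its visualization.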

4.2.2 Oil and gas data visualization tasks

With the data characteristics clarified for different oil and gas time-series data, the next step is to

list the visualization tasks. There are three basic purposes that users usually have through data

visualization: exploration, information/knowledge confirmation, and analyzed result presentation

(Ward et al., 2010).

Based on the discussion with some petroleum engineers, some points were made on what

aspects in the time-series data the engineers are hoping to explore. As for the well status, engineers

want to look into the general lifecycle and the durations of all the statuses each well has been

undergoing. In terms of the completion data, engineers want to examine the completion

processes of each well by looking at the completion types and the corresponding temporal and

spatial information. The injection or production history which involves multiple variables (i.e.

water, oil, gas, other fluids, and hours) is to be explored for injection or production trends and

abrupt changes. Also, the distribution of oil, gas and water production at different time stamps is of

interest to engineers.

In terms of confirmative analysis, hypotheses are to be tested. The hypothesis considered here is

that completion operations during hydraulic fracturing that take place adjacent to a well affect the

production of that well. Hydraulic fracturing is an oil/gas recovery technique that has enabled

the large-scale commercial production of shale gas and oil, by hydraulically pressurizing fluid to

fracture rocks and release natural gas and oil to the wellbores.

4.3 Web GIS interface and visualization and interaction controls

After studying the data types, data characteristics and the visualization tasks for the available oil

and gas data, user interfaces built with visualization and interaction controls are to be designed to

maximize the information delivered. The Visual Information-Seeking Mantra proposed by

Shneiderman (1996) is a widely acknowledged starting point and general guideline for user

interface design. It points out that the user experience with an interface follows a process:

“Overview first, zoom and filter, then details-on-demand”. This process is iterated as users

explore into more detailed information. Therefore, the user interface designs aim to provide an

overview of the data objects on the first page load, interaction controls to accentuate and filter

data, and further visual presentations to demonstrate details.

4.3.1 Web GIS interface

Oil and gas wells are real-world objects and static spatial objects. GIS technology is employed to

provide users with an overview of the oil and gas well locations/distributions and basic well

information. The initial main interface displays wells in red Google Maps markers on a map, which

occupies the whole page, to give users an overview of the well distribution, as shown in Figure 4-

1. Additionally, a button at the bottom of the map opens an attribute table listing the non-time-

series properties of the wells.

The basic information of each well is shown in an information window when the well is single-clicked.

Double-clicking a well selects it and highlights it in blue. Besides the general

well information shown in the information window, two buttons open visualization panels

for the production and completion data of the selected well, respectively. Moreover, holding the shift

key and pressing the left mouse button draws a polygon box to select and highlight multiple

wells on the map. Clicking the button at the bottom center shows the attribute table of the

highlighted wells. All the highlighted wells are correspondingly selected in the table. Table

manipulations like selecting/unselecting, sorting, filtering are enabled.

Figure 4- 1 Mapping interface with extended attribute table and information window

4.3.2 Injection and production data visualization template

At an overview level, a multi-series line chart is used to demonstrate the trends in the volumes of

the different produced/injected substances and in the production/injection hours. The horizontal

axis represents the time from the beginning of production/injection to the time of the last data

record linearly. Additionally, for users to explore and target specific data to investigate, one

interaction tool is added to reduce the dimensions of displayed production/injection data. A

toggleable legend is employed so that users can switch items in the legend on or off, and the

corresponding line will disappear or reappear in the multi-series line chart. In Figure 4- 2, only

gas, hours and water legends are toggled on, so the line chart only displays the lines of the three

items. Moreover, users can zoom in to smaller time ranges for details by indicating a time range

in the time range slider below the main line chart canvas. As shown in Figure 4- 2, the dark grey

area in the time range slider shows the chosen time range from 2008 to 2012.

As identified in the user tasks, the distribution of oil, gas and water at a certain time would

be of interest to engineers. To model a chosen month along the continuous time, two interaction

controls, time slider and indicator, are employed. A time slider on the line chart canvas functions

as the link between a certain month and the whole timeline. Since the time slider moves

continuously, the time indicator is used to indicate the exact month where the time slider falls in.

According to the indicated month, a pie chart is updated to display the gas, oil and water

distributions at that time. Figure 4- 2 shows the layout of the two charts. As the mouse moves over

the line chart, the time slider slides, the pie chart updates, the time indicator at the top right corner

of the line chart canvas updates (June 2008 in this case), and the specific numbers beside the legend

items update. The detailed data are dynamically updated with movements of the time slider.
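
The link between the continuously moving time slider, the discrete month shown by the time indicator and the pie chart can be sketched as follows. This is not the interface code of the system; the function names, the representation of the slider position as a fraction between 0 and 1, and the production volumes are assumptions made for illustration.

```python
def month_at(slider_fraction, months):
    """Snap a slider position in [0, 1] to the month it falls in."""
    index = min(int(slider_fraction * len(months)), len(months) - 1)
    return months[index]


def shares(record, substances=("gas", "oil", "water")):
    """Share of each substance in one monthly record, as drawn in the pie chart."""
    total = sum(record[s] for s in substances)
    if total == 0:
        return {s: 0.0 for s in substances}
    return {s: record[s] / total for s in substances}


# Hypothetical monthly production records of one well.
production = {
    "2008-05": {"gas": 700.0, "oil": 120.0, "water": 180.0},
    "2008-06": {"gas": 600.0, "oil": 100.0, "water": 300.0},
    "2008-07": {"gas": 550.0, "oil": 90.0, "water": 360.0},
}
months = sorted(production)

selected = month_at(0.5, months)        # slider in the middle of the timeline
pie = shares(production[selected])
```

Each mouse movement would repeat these two steps: the slider position is snapped to a month for the time indicator, and the record of that month is turned into the shares displayed by the pie chart.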

Figure 4- 2 Interactive chart for viewing production data

4.3.3 Bubble map for visualizing production data of multiple wells

If users are looking into a group of wells instead of a single one, another

visualization tool, the bubble map, can be used to display their production. A bubble map is a

design of thematic map that focuses on displaying the distributions of gas, oil and water production

of different wells within a specific geographic area at a specific time. The distributions of gas, oil

and water production can be compared over different wells at the same time. Also, for one or

multiple wells, the changes of distributions can be viewed by comparing different bubble maps

created for different times.

As shown in Figure 4- 3, the two-tier visualization panel is comprised of a map at the top

and a multi-line chart at bottom. At the overview level, pie charts are mapped to show the spatial

distribution of the wells, and the line chart shows the production histories on gas, oil and water.

Each pie chart is centered at the coordinate of the shale well, showing the distribution of gas, oil

and water productions of the well. The multi-series lines display the total productions of gas, oil

and water by all the wells shown in the map above. Users can easily observe the wells with high

percentages of water production due to the distinction in colors representing different substances.

To zoom in to a certain time stamp and update the map, the time slider can be moved

to update the pie charts so that users can detect the changes in the gas, oil and water distribution

of wells of interest. To show more detailed information on demand, hovering over a single

well of interest makes the remaining wells transparent, and clicking on the well displays

specific information about it, as shown in Figure 4- 4.

Figure 4- 3 Bubble map for shale gas wells

Figure 4- 4 Bubble map for shale gas wells with detailed information expanded on one

specific well

4.3.4 Completion data visualization template

Completion data are irregularly sampled and indicate completion events happening

from the start of exploiting and operating a well. A continuous-scaled linear time model is used to

represent the timeline of a well, and multiple dates along the timeline correspond to the completion

dates. The completion data are categorical and associated with one-dimensional

spatial information – depth. Therefore, a horizontal axis is used to represent the time, a vertical

axis represents depth and different colored bars display the completion types. A bar whose length

is mapped to the vertical axis represents the completion that takes place within a certain depth

range. The values in the y-axis increase downwards as the sub-surface depth increases.

As shown in Figure 4- 5, the bar chart shows the depths and dates of completions, such as

perforation, fracturing and open holes, that occur within one kilometer of a target shale gas well.

To filter out certain completion types, the corresponding legend

items are to be toggled so that users can look into only completions of interest. Moreover, by

clicking on the button in the legend for the production line, the production history will be added

to the bar chart. With the production history and completion activities shown in the same plot, the

changes of the production can be compared with completion data. As in the production data

visualization panel, the time range slider zooms the main bar chart into the time period so that

users can obtain detailed information.

Additionally, the detailed information associated with a single completion includes the date,

completion type, length and the formation(s) where the completion took place. To view

these details, users can place the mouse over a specific bar in the bar chart.

Figure 4- 5 Bar chart for viewing shale gas completion data

4.3.5 Well status data visualization template

As for well status, which is a univariate time-series attribute, a simple timeline chart is used to

represent the status history of one well, with differently colored bars along the timeline

representing the past and current statuses. There is a legend on the left side of the status data

visualization panel showing all possible statuses. When the mouse is moved onto certain status

along the timeline, the corresponding status legend will be highlighted with other legends faded,

and the duration will show at the left side of the panel. By working with the interactive timeline,

the user can not only know the chronological processes the well has been undergoing but also

detailed periods of time according to each operation. As shown in Figure 4- 6, the light blue legend

for Drilled and cased is highlighted, and it shows this well was under this status from May 27 2007

to May 16 2011.
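
The duration shown for each status can be derived from the dates at which the status changes. The sketch below assumes that the status history is stored as a chronologically sorted list of (start date, status) pairs; the function name, the second status and the closing date of the timeline are hypothetical, while the "Drilled and cased" period is the one quoted above.

```python
from datetime import date


def status_durations(events, end_date):
    """Turn sorted (start_date, status) pairs into timeline bars.

    Each bar is (status, start, end, length in days); a status lasts until
    the next status starts, and the last one lasts until end_date.
    """
    ends = [start for start, _ in events[1:]] + [end_date]
    return [
        (status, start, end, (end - start).days)
        for (start, status), end in zip(events, ends)
    ]


events = [
    (date(2007, 5, 27), "Drilled and cased"),
    (date(2011, 5, 16), "Active production"),  # hypothetical next status
]
bars = status_durations(events, end_date=date(2012, 1, 1))  # hypothetical end
```

The first bar covers May 27 2007 to May 16 2011, and its length in days is what a tooltip or side panel could display as the duration of the status.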

Figure 4- 6 Timeline chart for viewing well status data

CHAPTER FIVE: A WEB-BASED STEAM ASSISTED GRAVITY DRAINAGE DATA

VISUALIZATION AND ANALYTICAL SYSTEM

5.1 Introduction

In this chapter, a data visualization and analysis system built with Web GIS technology is proposed

and implemented with data from SAGD projects.

In the late 1970s, a steam-based in-situ process, SAGD, was developed and introduced as an

oil recovery technology for abundant Canadian heavy oil and bitumen (oil sands) (Butler, 2013).

SAGD employs a horizontal well pair configuration with an upper injection well and a lower

production well drilled in parallel. High-temperature steam is injected through an injector to heat

up the reservoir and form a chamber, and then the heated oil bitumen at the chamber edge will

drain down and flow through the producer (Alberta Energy Regulator, n.d.). Now SAGD is being

widely used as a thermal production technology to extract oil bitumen from Alberta’s subsurface

oil sands deposits. Projects using SAGD technology are becoming more common: the number of

commercial SAGD projects in Alberta had reached 16 by 2013, compared to fewer than 5 before

the year 2000 (Alberta Energy, n.d.).

With the expansion of SAGD projects, huge and ever-growing quantities of SAGD-related

data have been accumulated, involving the various domains that the oil and gas industry interacts with,

such as geophysics, geology, petroleum, business and administration. Applications

that assist in storing and managing the voluminous and complex SAGD datasets are in demand.

A data application targeted at SAGD data should be able to accommodate the spatial

characteristics of SAGD wells, provide users with access to integrative SAGD-related data and

offer spatial exploration and analysis functionalities.

The contributions of this application are as follows. Firstly, by integrating GIS, the mapping
interface and the database management system work together interactively as users explore
the spatial and attribute data of SAGD wells. Different spatial layers and flexible spatial queries
help users efficiently locate the SAGD wells of interest and then apply the visualization and
analysis functions to them. Secondly, the Web GIS platform is accessible to a broad audience:
users can reach the system and use its mapping and analytic functionalities through web browsers.
Thirdly, public and proprietary SAGD data are collected and archived in a specially designed
database. Intuitive and interactive data visualization methods, such as attribute tables, histograms
and a time-series data viewer, as well as data mining techniques, such as clustering and ARM, are
implemented in the system to help users further comprehend SAGD data and make decisions.

The remainder of the chapter is organized as follows. The second section introduces the

web-based system structure and the database design. The third section presents the Web GIS user

interface, while the fourth section focuses on the data visualization and data mining functionalities.

5.2 Design of the SAGD data visualization and analysis system

A system with an integrated modular base needs to preserve a GIS environment focused on SAGD
wells and adopt new designs and implementations to perform the following functions: (a) provide
a Web GIS platform that makes the system accessible to users through web browsers; (b) make
archived SAGD data searchable by location or attribute, and make the search results exportable;
(c) visualize attribute and time-series data in the form of tables, interactive charts and graphs;
(d) apply clustering and ARM techniques and visualize the mined spatial patterns in the interface.
This section introduces the system design and presents the employed technologies and the SAGD
database.

5.2.1 System design

The system design is illustrated in Figure 5-1. The web-based application consists of four main
components: the SAGD database, the data processing server, the web server and the user interface.
The four components communicate with one another to deliver and present information to users
according to their requests.

Figure 5-1 System design of the proposed web-based GIS for a SAGD dataset

This system is built upon HTML5 and CSS3, which respectively structure and style
webpages and allow webpage elements to be modified and adjusted flexibly. JavaScript, an
object-oriented programming language, is used to develop the system because it can be executed
on the client side, which avoids excessive communication with the web server and reduces
processing time. Moreover, a rich collection of third-party JavaScript libraries, plugins and
modules is available to accelerate development. The other important technologies and open source
libraries used in this system include PostgreSQL, the Google Maps API and Node.js.

The web server was developed using Node.js. Node.js is a JavaScript runtime that runs
on different operating systems and optimizes the scalability of input/output processes (Node.js
Foundation, 2016). PostgreSQL is an open-source object-relational database that runs on all
major operating systems (PostgreSQL Global Development Group, 2016). It is a well-maintained,
powerful and reliable database. The Google Maps API provides the web mapping service, which
includes mapping the wells, spatial selection and map controls such as zoom controls and view
controls (Google, 2016).

5.2.2 SAGD database

5.2.2.1 Database structure

Besides the huge quantity of data in an ongoing SAGD project, the data types vary:
there are static and dynamic data, numerical and categorical data, and first-hand and derived
data. The dynamic injection and production processes in SAGD operations generate time-series
data on the injected and produced substances and their amounts. As for the geospatial
characteristics of SAGD wells, multiple SAGD well pairs (injector and producer pairs) are drilled
in units of well pads; the surface locations of the horizontal wells (heels) are clustered at the pad
centers, while the horizontal wells spread out underground and reach their end locations (toes).
A relational database is designed and deployed that integrates basic well information, geographic
coordinates, well status, and injection and production records. Figure 5-2 shows an overview of
the database structure.

Figure 5-2 SAGD database structure

The Unique Well Identifier (UWI) is a standard well identifier consisting of 16 characters
arranged in four sequential components (Cenovus, n.d.). The primary purpose of the UWI is to
distinguish every single well. Therefore, in the database, a primary key corresponding to the UWI
is defined in the root table and is also associated with the primary keys in the other tables.

The well status, injection and production histories of the SAGD wells are stored in three
separate tables. The well status indicates the general phases that an individual well has undergone,
e.g. observation, drilled and cased, abandoned, together with the corresponding periods of time.
The injection and production tables store the monthly records of each well. A producer may show
some injection during start-up for warm-up purposes, and an injector may likewise show small
amounts of produced substances.

A key SAGD performance measure, the Steam Oil Ratio (SOR), is the amount of steam used
to produce one barrel of oil (Cenovus, n.d.). Low SOR values, around 2 for example, indicate
efficient SAGD operations. The Cumulative Steam Oil Ratio (CSOR) is the cumulative amount of
injected steam divided by the cumulative amount of produced oil, which measures efficiency since
the well pair began operating. In the database management system, SOR and CSOR are calculated
for each well pair. Statistical measures (minimum, maximum, average and standard deviation) are
calculated for SOR and CSOR as well as for injected steam, operation hours and produced oil
amount, and stored in the statistics table.
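The SOR and CSOR definitions above can be illustrated with a short sketch. The record layout (`month`, `steam`, `oil`), the function names, the sample values and the use of a population standard deviation are assumptions made for illustration; they are not the schema or code of the system described here. Steam and oil are assumed to be reported in the same volume unit.

```javascript
// Sketch only: field names (month, steam, oil) are hypothetical, and steam
// and oil are assumed to be reported in the same volume unit.
function computeSorSeries(records) {
  let cumSteam = 0;
  let cumOil = 0;
  return records.map((r) => {
    cumSteam += r.steam;
    cumOil += r.oil;
    return {
      month: r.month,
      // SOR: steam injected in the month per unit of oil produced in the month.
      sor: r.oil > 0 ? r.steam / r.oil : null,
      // CSOR: cumulative steam divided by cumulative oil since start-up.
      csor: cumOil > 0 ? cumSteam / cumOil : null,
    };
  });
}

// Minimum, maximum, average and (population) standard deviation of the
// defined values; months without a defined ratio are skipped.
function summarize(values) {
  const xs = values.filter((v) => Number.isFinite(v));
  if (xs.length === 0) return null;
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;
  return { min: Math.min(...xs), max: Math.max(...xs), mean, std: Math.sqrt(variance) };
}

// Made-up monthly records for one well pair.
const series = computeSorSeries([
  { month: "2011-05", steam: 6000, oil: 0 },
  { month: "2011-06", steam: 9000, oil: 3000 },
  { month: "2011-07", steam: 9000, oil: 3000 },
]);
const sorStats = summarize(series.map((p) => p.sor));
```

In this sketch a month with no oil production yields an undefined SOR rather than an infinite one, so that warm-up months do not distort the statistics; whether the actual system treats such months this way is not stated.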

5.2.2.2 Data collection and preprocessing

Considering the quantity of data that SAGD projects generate, publicly available
data are limited and scattered. Some SAGD in-situ and surface facilities collect real-time data,
which are compiled and distributed only within the organizations. Annual reports on the in-situ
performance of each SAGD project are accessible through the Alberta Energy Regulator. In these
reports, summary information on geology and geophysics, drilling and well instrumentation,
seismic, and operation performance is displayed in the form of maps and graphs. Data can also be
obtained through commercial software platforms, which purchase oil and gas data from oil and gas
companies or specialized data companies. In this study, data on wells in Alberta SAGD projects
are collected from the Alberta Energy Regulator and Divestco GeoCarta. With the acquired UWIs
and a template of data attributes, SAGD data are collected from the GeoCarta data explorer and
trimmed to fit the database structure in order to populate the database.

5.3 The Web GIS user interface for SAGD

The Web GIS user interface consists of four main components: (a) an interactive map representing

the current objects; (b) a status bar indicating the selected map layers and the numbers of selected

and highlighted objects on the map; (c) a table displaying the basic information of current objects;

(d) manipulation tools leading to advanced functions – search, search by location, export, data

visualization and data mining.

Figure 5-3 shows the complete Web GIS user interface with its panels expanded: map
manipulation, a search for a particular SAGD project with the wells of interest highlighted, an
attribute table of the searched wells (bottom left), map navigation (top left), and the manipulation
tool bar (top right). An auxiliary window pointing at one bottom well location displays basic well
information together with a button that opens the time-series visualization of the well's history
data.

Figure 5-3 The Web GIS user interface of the system

One of the assets of a web-based cartographic user interface is the flexibility and
interactivity it provides to users. The initial view of the system is a map filling the whole browser
window, with no auxiliary windows or tabs open and only clickable icons placed over the map
for expanding the manipulation tool bar and the attribute table. Mouse controls handle spatial
navigation such as zooming and panning the web-based map. Users can compose the map contents
according to their intentions. In addition, the map can be zoomed automatically to the geographic
area where the searched wells are located.

To organize the three spatial components of SAGD wells, separate spatial layers store
the heels, the toes and the derived lines. A derived line connects the heel and toe locations of a
well and stands for the well trajectory. Users can view and manipulate the spatial layers separately
in the system. In terms of map symbols, drop-shaped markers represent toes, while relatively small
circle markers represent heels, because heels are more tightly clustered than toes. All the markers
are clickable; by clicking a marker of interest, users can explore the well history through a
series of interactive graphs and charts.
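As an illustration of the three-layer organization described above, the following sketch derives a heel layer, a toe layer and a line layer from one list of wells. It uses the GeoJSON feature format as a convenient container; the input fields (`uwi`, `heel`, `toe`), the function name and the sample coordinates are hypothetical, and the storage format actually used by the system is not stated.

```javascript
// Sketch only: the input fields (uwi, heel, toe) are hypothetical; coordinates
// follow the GeoJSON convention of [longitude, latitude].
function buildWellLayers(wells) {
  const feature = (uwi, type, coordinates) => ({
    type: "Feature",
    properties: { uwi },
    geometry: { type, coordinates },
  });
  const collection = (features) => ({ type: "FeatureCollection", features });
  return {
    heels: collection(wells.map((w) => feature(w.uwi, "Point", w.heel))),
    toes: collection(wells.map((w) => feature(w.uwi, "Point", w.toe))),
    // Derived layer: a straight heel-to-toe segment standing in for the trajectory.
    lines: collection(wells.map((w) => feature(w.uwi, "LineString", [w.heel, w.toe]))),
  };
}

// Made-up coordinates for a single well.
const layers = buildWellLayers([
  { uwi: "WELL-A", heel: [-110.9, 57.2], toe: [-110.89, 57.21] },
]);
```

Keeping the three layers as separate collections mirrors the ability to show, hide and style heels, toes and trajectories independently.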

Creating interactive spatial queries is one of the most basic and essential applications of a
geographic information system. Two kinds of search, property search and location search, are
implemented in the Web GIS system. Property search, an expandable tab under the tool bar,
filters the wells down to the set that satisfies a query on attributes such as UWI, operating
company and well status. Search queries can also be composed in the attribute table. Search by
location is another tab under the tool bar. Neighboring wells can be enclosed by a polygon, and
the enclosed wells are highlighted with blue markers. The data of the searched wells are updated
accordingly in the attribute table, where detailed information can be examined.
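The search by location described above amounts to a point-in-polygon test on the well locations. A minimal sketch using the standard ray-casting test is given below; it treats coordinates as planar, and the well record layout (`uwi`, `location`) and function names are assumptions for illustration. The system itself may rely on the mapping library for this step.

```javascript
// Ray casting: a point is inside if a horizontal ray from it crosses the
// polygon boundary an odd number of times. Coordinates are treated as planar.
function pointInPolygon([x, y], polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Hypothetical well records; returns the UWIs of wells inside the polygon.
function selectByPolygon(wells, polygon) {
  return wells.filter((w) => pointInPolygon(w.location, polygon)).map((w) => w.uwi);
}

// Made-up square search area and two wells.
const searchArea = [[0, 0], [4, 0], [4, 4], [0, 4]];
const selected = selectByPolygon(
  [
    { uwi: "WELL-A", location: [2, 2] },
    { uwi: "WELL-B", location: [5, 2] },
  ],
  searchArea
);
```

The returned identifiers can then drive both the marker highlighting and the attribute table update.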

5.4 The data visualization and data mining user interface

This section presents the features of the data visualization and data mining functions additional to

the Web GIS user interface. How data visualization can help users explore SAGD operation history

data is illustrated with examples and case studies.

5.4.1 Data visualization

Although the collected SAGD data have been archived in the database, information would be
missed if users simply retrieved and examined the data in the database tables. Information graphics
can help users interpret the patterns and trends embedded in the datasets. The web-based data
visualization and analytics system implements different visualization methods so that users can
conduct comprehensive analyses of the SAGD projects and wells they are investigating. Generally,
users are able to interpret: (a) the history of well status, (b) the injection and production history of
a well pair, and (c) the overall operation of selected wells. The value this system adds to the
information graphics is the interactivity of the graphic components. Templates of the data
visualization methods are given below, followed by a case study on visualizing the time-series
data of a specific well pair.

5.4.1.1 Data visualization templates

When a specific well is selected on the interactive map, an auxiliary window visualizing its
time-series data (i.e. status, injection, production) is displayed. The well statuses are plotted
using the timeline template introduced in Chapter 4.

The injection and production parameters and the corresponding SOR/CSOR provide
engineers with the main evidence for decisions related to operation and oil production.
Interactive graphs for visualizing these time-series data are available in the system. The graph to
be viewed (injection, production or SOR) is chosen on the left side of the auxiliary window. Each
graph has four main controls that users can adjust to update it. The layout of an example graph is
shown in Figure 5-4(b). First, users can select one or more attributes to display by clicking them
in the attribute list in the legend. Second, users can choose among four chart types: area chart,
bar chart, line chart and scatterplot. Third, when data are missing for some dates, users can choose
either to ignore the missing data or to interpolate them in the line chart. Finally, when users are
interested in an unusual trend in the graph, they can zoom in to the corresponding period using
the time bar below the graph, and the graph is updated accordingly.
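The two ways of handling missing data and the time-bar zoom can be sketched as simple operations on a monthly series. The series layout (`month`, `value`, with `null` for a missing value), the function names and the choice of linear interpolation are assumptions; the interpolation method actually used by the charts is not specified.

```javascript
// Sketch only: a series is an array of { month, value } in chronological order,
// evenly spaced by month, with null marking a missing value (hypothetical layout).
function ignoreGaps(series) {
  return series.filter((p) => p.value !== null);
}

// Linear interpolation between the known values on either side of a gap.
function interpolateGaps(series) {
  const out = series.map((p) => ({ ...p }));
  let prev = -1; // index of the last known value seen so far
  for (let i = 0; i < out.length; i++) {
    if (out[i].value === null) continue;
    if (prev >= 0) {
      const step = (out[i].value - out[prev].value) / (i - prev);
      for (let k = prev + 1; k < i; k++) {
        out[k].value = out[prev].value + step * (k - prev);
      }
    }
    prev = i;
  }
  return out; // leading and trailing gaps stay null
}

// Zooming with the time bar amounts to keeping the points inside a date range;
// "YYYY-MM" strings compare correctly as plain strings.
function zoom(series, from, to) {
  return series.filter((p) => p.month >= from && p.month <= to);
}

// Made-up monthly oil volumes with gaps.
const oil = [
  { month: "2012-01", value: 10 },
  { month: "2012-02", value: null },
  { month: "2012-03", value: null },
  { month: "2012-04", value: 40 },
  { month: "2012-05", value: null },
];
const filled = interpolateGaps(oil);
```

Leaving leading and trailing gaps empty avoids extrapolating beyond the recorded data, which is a design choice of this sketch.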

If a group of wells is selected by searching by attribute or location, an overview of the
wells can be presented using bar charts for numerical attributes or pie charts for categorical
ones. Clicking a single bar displays additional information about the corresponding well on the
right side of the visualization window. Likewise, clicking a slice of the pie chart highlights the
corresponding legend entry, and vice versa. Figure 5-5(c) and Figure 5-5(d) show examples of the
bar chart and pie chart visualizations. Users can readily perceive the data distribution from the
bar chart and the pie chart.

Figure 5-4 Data visualization templates (a) Timeline of a producer well (UWI 02/08-11-095-06W4/0)
in Suncor Firebag (b) Time-series visualization for the producer well in Suncor Firebag

Figure 5-5 Data visualization templates (c) Bar chart of well total depth for Suncor
Firebag and Husky Tucker wells (d) Pie chart of status for Suncor Firebag wells

5.4.1.2 Case study on SAGD injection and production history

The following examples visualize the data of a specific well pair (Figures 5-6 and 5-7). Based
on the trends shown in the graphs, it can be concluded that the performance of this well pair has
been moderate and improving.

a) The paired injector well injected steam from May 2011 to August 2013 with a
generally increasing trend;

b) The production history of the selected producer well is plotted with the production-related
attributes gas, oil and water. From October 2011 to August 2013, the oil production
(represented by the pink line) ranged from about 2000 to 6000 m³ per month and peaked in
April and June 2013. The water flowing out of the producer was about 10000 to 20000 m³
per month, with almost no gas produced;

c) The injected steam, produced oil, SOR and CSOR of the well pair are plotted. Both
injection and production show increasing trends;

d) The SOR and CSOR of the well pair are zoomed in to the period from July 2011 to
July 2012. During this period the SOR reached a peak value of 19.94, then declined and
settled at a relatively steady level of around 3.

Figure 5-6 Time-series data visualization on a specific well pair (Injector UWI 05/08-11-095-06W4/0;
producer UWI 02/08-11-095-06W4/0) (a) Injection steam history in a bar chart (b) Produced water,
oil and gas in a line chart

Figure 5-7 Time-series data visualization on a specific well pair (Injector UWI 05/08-11-095-06W4/0;
producer UWI 02/08-11-095-06W4/0) (c) Injected steam and produced oil in a line chart (d) CSOR
and SOR in a line chart

5.4.2 Data mining

By using data mining techniques, users can discover hidden patterns in the SAGD wells.
Classification of numerical and categorical attributes, k-means clustering and ARM are
implemented in the web-based GIS system. Moreover, the mapping interface displays the spatial
patterns, which not only communicates the mined results but also gives users an exploratory
capability. The mined patterns associated with wells are shown on the map with an interactive map
legend. The legend explains the cartographic symbols, and clicking a symbol displays the
corresponding wells. Case studies on using the data mining tools for different data analysis goals
are given below.

Users can map classified SAGD well attributes. For categorical attributes, such as
current well status, well type and pad, wells belonging to different categories are represented by
symbols in different colors. With categorical classification of well pads, wells in different
pads are displayed in different colors, as shown in Figure 5-8(a). In numerical classification, wells
can be classified by either equal interval or quantile to examine the distribution of attribute
values. Figure 5-8(b) presents an example of quantile classification of average oil production.
Users can observe the distribution of wells in each class, such as the clustering of wells with high
or low production. Clustering is an unsupervised learning technique that groups similar items and
separates dissimilar ones. Figure 5-9(c) displays the result of applying k-means clustering to SOR
and oil production. Wells with similar production amounts and efficiency are grouped, and the
different groups of wells are shown on the map with different symbols.
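The difference between the two numerical classification schemes mentioned above can be made concrete with a small sketch. The function names, the linear-interpolation quantile definition and the sample values are assumptions for illustration rather than the system's implementation.

```javascript
// Equal interval: split the value range [min, max] into k classes of equal width.
function equalIntervalBreaks(values, k) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / k;
  return Array.from({ length: k - 1 }, (_, i) => min + width * (i + 1));
}

// Quantile: choose breaks so that each class holds roughly the same number of
// wells (quantiles computed with linear interpolation between sorted values).
function quantileBreaks(values, k) {
  const sorted = [...values].sort((a, b) => a - b);
  return Array.from({ length: k - 1 }, (_, i) => {
    const pos = ((i + 1) * (sorted.length - 1)) / k;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  });
}

// Class index of a value: the number of breaks it exceeds.
function classify(value, breaks) {
  let cls = 0;
  while (cls < breaks.length && value > breaks[cls]) cls++;
  return cls;
}

// Made-up average oil production values with one outlier.
const production = [1, 2, 3, 4, 100];
const byInterval = production.map((v) => classify(v, equalIntervalBreaks(production, 2)));
const byQuantile = production.map((v) => classify(v, quantileBreaks(production, 2)));
```

With a skewed attribute such as the sample above, equal intervals place almost every well in the lowest class, whereas quantiles spread the wells evenly, which is why the two schemes reveal different aspects of the distribution.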

Figure 5-8 Examples of categorical, numerical classification and k-means clustering of
wells in Suncor Firebag project (a) Categorical classification of well pads (b) Numerical
classification using quantile of well average oil production

Figure 5-9 Examples of categorical, numerical classification and k-means clustering of
wells in Suncor Firebag project (c) K-means clustering of SOR and oil production
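For readers unfamiliar with k-means, the clustering of wells on SOR and oil production can be sketched as follows. This is a plain Lloyd's-algorithm sketch, not the clustering library used by the system; the z-score standardization, the fixed choice of initial centroids (made so the run is repeatable) and the sample values are assumptions.

```javascript
// Z-score each column so that SOR and oil volume contribute on a comparable scale.
function standardize(points) {
  const dims = points[0].length;
  const stats = Array.from({ length: dims }, (_, j) => {
    const col = points.map((p) => p[j]);
    const mean = col.reduce((a, b) => a + b, 0) / col.length;
    const std = Math.sqrt(col.reduce((a, b) => a + (b - mean) ** 2, 0) / col.length);
    return { mean, std: std || 1 }; // guard against constant columns
  });
  return points.map((p) => p.map((v, j) => (v - stats[j].mean) / stats[j].std));
}

function squaredDistance(a, b) {
  return a.reduce((sum, v, j) => sum + (v - b[j]) ** 2, 0);
}

function kMeans(points, k, maxIter = 100) {
  // Initial centroids are taken at fixed positions in the input order.
  let centroids = Array.from({ length: k }, (_, i) => [
    ...points[Math.floor((i * points.length) / k)],
  ]);
  const labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: attach every point to its nearest centroid.
    let changed = false;
    points.forEach((p, idx) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) best = c;
      }
      if (labels[idx] !== best) {
        labels[idx] = best;
        changed = true;
      }
    });
    if (!changed) break;
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, idx) => labels[idx] === c);
      if (members.length === 0) return old;
      return old.map((_, j) => members.reduce((s, m) => s + m[j], 0) / members.length);
    });
  }
  return { labels, centroids };
}

// Made-up [SOR, average monthly oil production] pairs for six wells.
const wellFeatures = [[2.5, 5000], [2.7, 5200], [2.6, 5100], [6.0, 1500], [6.5, 1200], [5.8, 1700]];
const { labels } = kMeans(standardize(wellFeatures), 2);
```

The resulting label of each well can be mapped to a symbol so that the groups appear on the map in different symbols.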

Figures 5-10 and 5-11 show an ARM example for the Suncor Firebag project that mines the
relationship between injection (hours and steam amount) and production (oil production and SOR).
Five significant rules are displayed in the system. For example, as shown in Figure 5-10(a), the
first rule states that if a well has a high average injection hour with a low standard deviation and
a low average injection steam with a low standard deviation, the well may have a good average
SOR with a low SOR standard deviation. Of 20 wells, 18 match the rule. By clicking the rule in the
legend, the map is updated using three colors (dark blue, light blue and black) to represent wells
matching different parts of the rule, as shown in Figure 5-10(b). Dark blue wells satisfy both the
IF and THEN statements, while light blue wells satisfy only the IF part but not the THEN part.
When the corresponding symbol in the legend is clicked, the wells in that category are shown while
the other wells are displayed transparently, as shown in Figure 5-11(c). Through this interaction
with the map, users can explore the mined rules in detail. The mined rules can be consulted when
new wells are to be developed near the existing wells: engineers can plan the injection operations
and anticipate the production efficiency of the new wells according to the rules that the adjacent
wells have matched.
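The way a mined rule partitions the wells for display can be sketched as below. The discretized attribute names and labels, the rule layout (`if`, `then`) and the three group names are hypothetical; in particular, the `other` group simply collects wells that do not satisfy the IF part, which is an assumption about what the third map color represents.

```javascript
// Sketch only: wells carry discretized attribute labels (hypothetical names),
// and a rule is a pair of attribute/label sets for its IF and THEN parts.
function evaluateRule(wells, rule) {
  const holds = (well, items) =>
    Object.entries(items).every(([attr, label]) => well[attr] === label);
  const groups = { both: [], ifOnly: [], other: [] };
  for (const w of wells) {
    if (!holds(w, rule.if)) groups.other.push(w.uwi);
    else if (holds(w, rule.then)) groups.both.push(w.uwi);
    else groups.ifOnly.push(w.uwi);
  }
  const ifCount = groups.both.length + groups.ifOnly.length;
  return {
    ...groups,
    // Support: share of all wells satisfying both parts of the rule.
    support: groups.both.length / wells.length,
    // Confidence: share of wells satisfying IF that also satisfy THEN.
    confidence: ifCount > 0 ? groups.both.length / ifCount : 0,
  };
}

// Made-up wells and rule.
const rule = { if: { injHourAvg: "high", steamAvg: "low" }, then: { sorAvg: "good" } };
const result = evaluateRule(
  [
    { uwi: "WELL-A", injHourAvg: "high", steamAvg: "low", sorAvg: "good" },
    { uwi: "WELL-B", injHourAvg: "high", steamAvg: "low", sorAvg: "good" },
    { uwi: "WELL-C", injHourAvg: "high", steamAvg: "low", sorAvg: "poor" },
    { uwi: "WELL-D", injHourAvg: "low", steamAvg: "high", sorAvg: "poor" },
  ],
  rule
);
```

Each group of identifiers can then be assigned its own marker color, and clicking a legend symbol reduces to showing one group while fading the others.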

Figure 5-10 ARM example for Suncor Firebag project (a) ARM result map legend (b) Map
interface of association rule 1

Figure 5-11 ARM example for Suncor Firebag project (c) Map interface of wells fully
satisfying association rule 3

CHAPTER SIX: CONCLUSIONS AND FUTURE WORK

The following two sections respectively draw the conclusions and list the future work.

6.1 Conclusions

This thesis focuses on three aspects of oil and gas data. The first is predicting
oil and gas production using time-series production data. Chapter Three presented a method that
transforms time-series well production data into symbolic sequences and then builds a symbolic
tree with pre-pruning mechanisms to obtain an informative and predictive model. Firstly, the
aggregation method reduces the dimensionality of the time-series production data, and time-series
symbolization speeds up numeric computation, handles data noise and preserves the features of the
production data. Secondly, the symbolic tree model fits chronologically ordered symbol sequences
and generates a pruned tree. Additionally, the coverage index can be used to evaluate how different
partitioning and discretization settings in the symbolization process affect the building of
proper-sized symbolic trees. Overall, an intuitive and practical approach is developed for
predicting well production based on the production histories of analogous wells.

The proposed well production prediction approach relies on production data, which are
more accessible than geological/log-derived datasets or well operation and completion datasets.
Users can select the wells for training the predictive model using different criteria. For example,
the wells used to build the symbolic tree may be limited to those in the immediate vicinity, those
with similar drilling methods and completion histories, or those operated by a particular company.
In addition, the symbolic tree model is intuitive for engineers to comprehend and interpret
because the production data used for prediction are laid out in a hierarchical tree structure. With
an estimate of well performance, engineers can adjust the oil or gas recovery methods to enhance
recovery and profitability, or remediate wells to prevent losses.

The second problem that this thesis has tackled is the design of interactive visualizations for
well production/injection, well completion and well status data. With the data characteristics
clarified and the visualization tasks identified, visualization tools with interactive controls are
designed following the Visual Information-Seeking Mantra: “Overview first, zoom and filter, then
details-on-demand” (Shneiderman, 1996). Generally, based on the data characteristics, suitable
simple graph types, such as a multiple-line chart, a bar chart or a timeline, are chosen to
provide overviews of the time-series data. To deliver extra information on demand, interactive
tools, such as toggleable legends, a time range slider and a pie chart panel, let users zoom into
more detailed data.

The designed time-series visualization tools accommodate different data characteristics
and user demands, and deliver rich information. The approach of first analyzing data
characteristics and user tasks and then designing the overview and additional interactive controls
has proved to be an efficient way of designing visualization tools.

Chapter Five has presented a Web GIS system prototype for mapping SAGD wells in

Alberta, providing access to their proprietary data, visualizing user-selected data and providing

analytical functions. Datasets of Alberta SAGD projects have been collected, archived and

successfully exploited in the web GIS system. Meanwhile, the workflows from the selection of

examined wells to the update of the interactive web-based map, from the selection of information

graphic type to the display of well temporal and attributive data in an auxiliary visualization

window, and from the selection of advanced data mining techniques to the update of mined patterns

in the web map, have been proved feasible.

The most important added value of this platform is the implementation of data mining
algorithms in the web system. To gain a view of wells falling into different categories,
classification methods targeted at different data types can be applied to selected well attributes.
Furthermore, displaying the results as maps, with symbols in different colors representing
different categories and with accompanying interactive legends, gives users a spatial overview of
the classification results. Two data-driven models, k-means clustering and ARM, reveal inherent
data patterns regarding similarity and discrepancy in a single attribute or a combination of
attributes, as well as frequent rules in the data. The patterns are also visualized on the map so
that users can investigate their spatial distribution.

Overall, in this thesis, data mining techniques and interactive visualization tools have

proved to be effective and efficient in delivering information to petroleum professionals. The web-

based system provides a mapping interface for the spatial objects, establishes access to SAGD

data, and includes visualization tools and analytical data mining tools. Therefore, the system

prototype accommodates convenient data access, strong oil and gas data analysis and efficient

information delivery in user-friendly interfaces.

6.2 Future work

The suggestions for future work are listed as follows.

1) Future work on the symbolic tree model firstly includes proposing an incremental updating

algorithm for updating a symbolic tree model when new wells are added to the tree.

2) During the symbolic tree building process, the current criteria stopping the tree from growing

further are minimum node size and minimum spatial information gain. Other pruning

mechanisms can be integrated to achieve more accurate and compact symbolic trees.

3) The coverage index indicates whether the constructed symbolic tree is a proper-sized tree

without branches contributing little information. In order to obtain not only a compact tree

but also a significant and trustworthy predictive model, an approach that combines the prediction

accuracy and the coverage index can be further studied.

4) Besides the interactive data visualization tools designed for exploration and confirmation

analysis purposes, other tools can be designed for representing analysis results, from simple

statistical analyses to complex data mining models.

5) Interactive visualization tools can be designed for other oil and gas data types, such as seismic

data, geological data and so on. For different data types, the proper visualization tools are to

be studied and further developed for specific datasets. For example, three-dimensional

visualization technologies can be employed for spatial datasets.

6) To further develop the Web GIS system, access to real-time SAGD data can be established. Real-time

data processing and data analysis techniques are to be employed. Moreover, more data mining

techniques like neural networks and outlier detection are to be added to the system.

7) Oil and gas data are essentially of high volume, variety and velocity. New approaches that

store and analyze big datasets are to be studied and integrated. Cloud computing platforms can

contribute to handling real-time data from multiple sources. With distributed computing

platforms like Hadoop/MapReduce, computational processes can be significantly accelerated.

APPENDIX: PUBLICATION DURING THE PROGRAM

Published:

Wei, B., Silva, R., & Wang, X. (2015). A web-based steam assisted gravity drainage (SAGD) data

visualization and analytical system. In Web and Wireless Geographical Information

Systems (pp. 89-103). Springer International Publishing.

Submitted:

Wei, B., Pinto, H., & Wang, X. (2016). A symbolic tree model for oil and gas production prediction

using time-series production data. Submitted to 3rd IEEE International Conference on Data

Science and Advanced Analytics.

REFERENCES

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases.

In Proceedings of the 20th international conference on very large data bases, 487-499.

Morgan Kaufmann Publishers Inc.

Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases,

69-84. Springer Berlin Heidelberg.

Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented

data. Springer Science & Business Media.

Alberta Energy Regulator (AER). (n.d.). Retrieved from http://www.aer.ca. Last Accessed on May

13, 2016.

Alberta Energy. (n.d.). Retrieved from http://www.energy.alberta.ca. Last Accessed on May 13,

2016.

Anderson, D. M., Nobakht, M., Moghadam, S., & Mattar, L. (2010). Analysis of production data

from fractured shale gas wells. In SPE unconventional gas conference. Society of

Petroleum Engineers.

Aulia, A., Keat, T. B., Maulut, M. S., El-Khatib, N., & Jasamai, M. (2010). Smart oilfield data

mining for reservoir analysis. International Journal of Engineering & Technology, 10(06),

78-88.

Baihly, J. D., Altman, R. M., Malpani, R., & Luo, F. (2010). Shale gas production decline trend

comparison over time and basins. In SPE annual technical conference and exhibition.

Society of Petroleum Engineers.

Butler, R. (1998). SAGD comes of age!. Journal of Canadian Petroleum Technology, 37(07).

Cai, Y., Wang, X., Hu, K., & Dong, M. (2014). A data mining approach to finding relationships

between reservoir properties and oil production for CHOPS. Computers & Geosciences,

73, 37-47.

Cenovus. (n.d.). Retrieved from http://www.cenovus.com. Last Accessed on May 13, 2016.

Chan, K. P., & Fu, A. W. C. (1999). Efficient time series matching by wavelets. In Proceedings of

15th International Conference on Data Engineering, 126-133. IEEE.

Claramunt, C. (2005). A spatial form of diversity. In Proceedings of International Conference on

Spatial Information Theory, 2005, 218-231. Springer Berlin Heidelberg.

Divestco. (2016). Retrieved from http://www.divestco.com. Last Accessed on May 13, 2016.

Esmaili, S., & Mohaghegh, S. D. (2013). Using Data-Driven Analytics to Assess the Impact of

Design Parameters on Production from Shale. In SPE Annual Technical Conference and

Exhibition. Society of Petroleum Engineers.

Evans, F., Volz, W., Dorn, G., Fröhlich, B., & Roberts, D. M. (2002). Future trends in oil and gas

visualization. In Proceedings of the conference on Visualization'02, 567-570. IEEE

Computer Society.

Fruhwirth, R. K., Thonhauser, G., & Mathis, W. (2006). Hybrid simulation using neural networks

to predict drilling hydraulics in real time. In SPE Annual Technical Conference and

Exhibition. Society of Petroleum Engineers.

Google. (2016). Retrieved from https://developers.google.com/maps/. Last Accessed on May 13,

2016.

Government of Saskatchewan. (2002). Retrieved from http://www.infomaps.gov.sk.ca. Last

Accessed on May 13, 2016.

Han, J., Kamber, M., & Pei, J. (2001). Data mining: concepts and techniques. Morgan Kaufmann,

San Francisco.

Keogh, E., Chakrabarti, K., Pazzani, M., & Mehrotra, S. (2001). Dimensionality reduction for fast

similarity search in large time series databases. Knowledge and Information Systems, 3(3),

263-286.

Korn, F., Jagadish, H. V., & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large

datasets of time sequences. ACM SIGMOD Record, 26(2), 289-300.

Lafollette, R., Holcomb, W. D., & Aragon, J. (2012). Practical data mining: analysis of Barnett

shale production results with emphasis on well completion and fracture stimulation. In SPE

Hydraulic Fracturing Technology Conference. Society of Petroleum Engineers.

Lim, T. S., Loh, W. Y., & Shih, Y. S. (2000). A comparison of prediction accuracy, complexity,

and training time of thirty-three old and new classification algorithms. Machine Learning,

40(3), 203-228.

Lin, J., Keogh, E., Wei, L., & Lonardi, S. (2007). Experiencing SAX: a novel symbolic

representation of time series. Data Mining and Knowledge Discovery, 15(2), 107-144.

Liu, S., & Xue, L. (2008). The Application of Fuzzy Clustering to Oil and Gas Evaluation.

In Proceedings of Fifth International Conference on Fuzzy Systems and Knowledge

Discovery, 644-647. IEEE.

Ma, Z., Leung, J. Y., Zanon, S., & Dzurman, P. (2015). Practical implementation of knowledge-

based approaches for steam-assisted gravity drainage production analysis. Expert Systems

with Applications, 42(21), 7326-7343.

Marroquín, I. D., Brault, J. J., & Hart, B. S. (2008). A visual data-mining methodology for seismic

facies analysis: Part 1—Testing and comparison with other unsupervised clustering

methods. Geophysics, 74(1), 1-11.

McCormick, B. H., DeFanti, T. A., & Brown, M. D. (1987). Visualization in Scientific

Computing. Computer Graphics, 21(6).

Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction.

Machine Learning, 4(2), 227-243.

Mohaghegh, S. (2000). Virtual-intelligence applications in petroleum engineering: Part 1—

Artificial neural networks. Journal of Petroleum Technology, 52(9), 64-73.

Mohaghegh, S. D., & Gaskari, R. (2009). An intelligent system’s approach for revitalization of

brown fields using only production rate data. International Journal of Engineering, 22(1),

89-106.

Mohaghegh, S. D., Hutchins, L. A., & Sisk, C. (2008). Building the foundation for Prudhoe Bay

oil production optimisation using neural networks. International Journal of Oil, Gas and

Coal Technology, 1(1-2), 65-80.

Node.js. (2016). Retrieved from https://nodejs.org/en/. Last Accessed on May 13, 2016.

Nobakht, M., Mattar, L., Moghadam, S., & Anderson, D. M. (2012). Simplified forecasting of

tight/shale-gas production in linear flow. Journal of Canadian Petroleum Technology,

51(06), 476-486.

PetroFeed Inc. (2015). Retrieved from https://www.petrofeed.com/maps. Last Accessed on May

13, 2016.

PostgreSQL Global Development Group. (2016). Retrieved from https://www.postgresql.org/.

Last Accessed on May 13, 2016.

Serapiao, A., Tavares, R. M., Mendes, J. R. P., & Guilherme, I. R. (2006). Classification of

petroleum well drilling operations using Support Vector Machine (SVM). In Proceedings

of 2006 International Conference on Computational Intelligence for Modelling, Control

and Automation and International Conference on Intelligent Agents, Web Technologies

and Internet Commerce, 145-145. IEEE.

Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information

visualizations. In Proceedings of 1996 IEEE Symposium on Visual Languages, 336-343.

IEEE.

Siirtola, P., Koskimäki, H., Huikari, V., Laurinen, P., & Röning, J. (2011). Improving the

classification accuracy of streaming data using sax similarity features. Pattern Recognition

Letters, 32(13), 1659-1668.

Spence, R. (2007). Information Visualization: Design for Interaction. Prentice-Hall, Inc., Upper

Saddle River, NJ, USA, 2nd edition.

Sproule. (2015). Type curve analysis for landmen. Retrieved from

http://landman.ca/wp/wp-content/uploads/2014/07/Sept-18-2015-Sproule.pdf. Last Accessed on May 13, 2016.

Strecker, U., & Uden, R. (2002). Data mining of 3D poststack seismic attribute volumes using

Kohonen self-organizing maps. The Leading Edge, 21(10), 1032-1037.

Ward, M. O., Grinstein, G., & Keim, D. (2010). Interactive data visualization: foundations,

techniques, and applications. CRC Press.

Zhong, M., Schuetter, J., Mishra, S., & Lafollette, R. F. (2015). Do Data Mining Methods Matter?:

A Wolfcamp Shale Case Study. In SPE Hydraulic Fracturing Technology Conference.

Society of Petroleum Engineers.

Zoumboulakis, M., & Roussos, G. (2011). Complex event detection in extremely resource-

constrained wireless sensor networks. Mobile Networks and Applications, 16(2), 194-213.