University of Calgary
PRISM: University of Calgary's Digital Repository
Graduate Studies The Vault: Electronic Theses and Dissertations
2016
Well Production Prediction and Visualization Using
Data Mining and Web GIS
Wei, Bingjie
Wei, B. (2016). Well Production Prediction and Visualization Using Data Mining and Web GIS
(Unpublished master's thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/28686
http://hdl.handle.net/11023/3088
master thesis
University of Calgary graduate students retain copyright ownership and moral rights for their
thesis. You may use this material in any way that is permitted by the Copyright Act or through
licensing that has been assigned to the document. For uses that are not allowable under
copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca
UNIVERSITY OF CALGARY
Well Production Prediction and Visualization Using Data Mining and Web GIS
by
Bingjie Wei
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
GRADUATE PROGRAM IN GEOMATICS ENGINEERING
CALGARY, ALBERTA
JUNE, 2016
© Bingjie Wei 2016
Abstract
Massive data sets have been accumulated in the oil and gas industry. As strategic assets,
voluminous data of different data types should be leveraged and turned into information for agile
and accurate decision-making. Three oil and gas data-related studies are covered in this thesis.
Firstly, a data-driven model is proposed for predicting well production using time-series
production data from analogous and adjacent wells. Secondly, interactive visualization tools are
designed and implemented for oil and gas spatial and temporal datasets, following an “Overview
first, zoom and filter, then details-on-demand” guideline (Shneiderman, 1996) in order to
maximize information delivery in single displays. Thirdly, a web-based Geographic Information
System (GIS) application is designed and implemented for a Steam Assisted Gravity Drainage
(SAGD) dataset to provide users convenient access to public and proprietary SAGD data, as well
as some data analysis and visualization functions.
Acknowledgements
I wish to thank my supervisor, Dr. Xin Wang, for the encouragement, guidance, inspiration and
patience she has provided me throughout my master’s program. I am grateful for the research
opportunities Dr. Wang provided me to work with our industrial partners, Divestco and
Schlumberger. Also, thanks to the Natural Sciences and Engineering Research Council of Canada
(NSERC) for funding the research.
I would also like to thank several of my colleagues, Rodrigo Silva, Xiaodong Sun, Ge Cui
and Yuanchen Li, for their support and insights. I greatly appreciate the guidance and assistance
from Helen Pinto. Thanks to Helen for the input and sharing of her knowledge in the oil and gas
industry. I am grateful for the precious companionship and encouragement from all my colleagues
and friends inside and outside of school.
I am so thankful for my parents, Zichang Wei and Xiulin Tong, for their unconditional
love, endless patience and incredible support. They have always encouraged and supported my
pursuit of higher education, and taught me to keep a positive perspective on life. I am blessed to
be their daughter.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Symbols, Abbreviations and Nomenclature
CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Problem statement
1.2.1 Oil and gas production prediction
1.2.2 Interactive data visualization for temporal and spatial oil and gas data
1.2.3 Web GIS system design for oil and gas datasets
1.3 Research objectives
1.4 Research contributions
1.5 Thesis outline
CHAPTER TWO: RELATED WORK
2.1 Time-series data approximation and symbolization
2.2 Data mining methods
2.2.1 Cluster analysis
2.2.2 Association Rule Mining
2.2.3 Decision tree induction
2.3 Oil and gas data visualization
2.4 GIS applications in oil and gas
CHAPTER THREE: A SYMBOLIC TREE MODEL FOR OIL AND GAS PRODUCTION PREDICTION USING TIME-SERIES PRODUCTION DATA
3.1 Introduction
3.2 Symbolic tree construction for well production prediction
3.2.1 Time-series well production data preprocessing
3.2.2 Symbolic tree construction and evaluation
3.2.2.1 Spatial information gain calculation for symbolic tree nodes
3.2.2.2 Coverage index for evaluating symbolic trees
3.2.3 New well production prediction
3.2.4 Symbolic Tree Visualization
3.3 Case study: Canadian shale gas production prediction
CHAPTER FOUR: DATA VISUALIZATION TOOL DESIGNS FOR SPATIAL WELLS AND TIME-SERIES OIL AND GAS DATA
4.1 Introduction
4.2 Oil and gas time-series data characteristics and visualization tasks
4.2.1 Oil and gas time-series data characteristics
4.2.2 Oil and gas data visualization tasks
4.3 Web GIS interface and visualization and interaction controls
4.3.1 Web GIS interface
4.3.2 Injection and production data visualization template
4.3.3 Bubble map for visualizing production data of multiple wells
4.3.4 Completion data visualization template
4.3.5 Well status data visualization template
CHAPTER FIVE: A WEB-BASED STEAM ASSISTED GRAVITY DRAINAGE DATA VISUALIZATION AND ANALYTICAL SYSTEM
5.1 Introduction
5.2 Design of the SAGD data visualization and analysis system
5.2.1 System design
5.2.2 SAGD database
5.2.2.1 Database structure
5.2.2.2 Data collection and preprocessing
5.3 The Web GIS user interface for SAGD
5.4 The data visualization and data mining user interface
5.4.1 Data visualization
5.4.1.1 Data visualization templates
5.4.1.2 Case study on SAGD injection and production history
5.4.2 Data mining
CHAPTER SIX: CONCLUSIONS AND FUTURE WORK
6.1 Conclusions
6.2 Future work
APPENDIX: PUBLICATION DURING THE PROGRAM
REFERENCES
List of Tables
Table 3-1 Symbols and ranges for time series symbolization of 3 symbols
Table 3-2 First 12 month monthly well production and symbols for two example wells
Table 3-3 Sensitivity, specificity and coverage calculation for the four symbolic tree
predictive models using different symbol sizes for time-series data symbolization
Table 4-1 Data characteristics of four types of oil and gas time-series data
List of Figures
Figure 3-1 Production data of a new well relative to existing adjacent wells
Figure 3-2 Flowchart of proposed Symbolic Tree Model
Figure 3-3 Production data aggregation and symbolization (a) Aggregated and symbolized
time-series production data of one example well (b) Data distribution of all aggregated
production data
Figure 3-4 Basic algorithm for constructing a symbolic tree from well production data
Figure 3-5 Monthly well production and symbol sequences for two example wells
Figure 3-6 Symbolic tree built from symbolic gas production time series using symbol size
of 3
Figure 3-7 Production data preprocessing, symbolic tree construction and well production
prediction
Figure 4-1 Mapping interface with extended attribute table and information window
Figure 4-2 Interactive chart for viewing production data
Figure 4-3 Bubble map for shale gas wells
Figure 4-4 Bubble map for shale gas wells with detailed information expanded on one
specific well
Figure 4-5 Bar chart for viewing shale gas completion data
Figure 4-6 Timeline chart for viewing well status data
Figure 5-1 System design of the proposed web-based GIS for a SAGD dataset
Figure 5-2 SAGD database structure
Figure 5-3 The Web GIS user interface of the system
Figure 5-4 Data visualization templates (a) Timeline of a producer well
(UWI 02/08-11-095-06W4/0) in Suncor Firebag (b) Time-series visualization for the producer
well in Suncor Firebag (c) Bar chart of well total depth for Suncor Firebag and Husky Tucker
wells (d) Pie chart of status for Suncor Firebag wells
Figure 5-5 Time-series data visualization on a specific well pair
(Injector UWI 05/08-11-095-06W4/0; producer UWI 02/08-11-095-06W4/0) (a) Injection steam
history in a bar chart (b) Produced water, oil and gas in a line chart (c) Injected steam and
produced oil in a line chart (d) CSOR and SOR in a line chart
Figure 5-6 Examples of categorical, numerical classification and k-means clustering of wells
in Suncor Firebag project (a) Categorical classification of well pads (b) Numerical
classification using quantile of well average oil production (c) K-means clustering of SOR
and oil production
Figure 5-7 ARM example for Suncor Firebag project (a) ARM result map legend (b) Map
interface of association rule 1 (c) Map interface of wells fully satisfying association rule 3
List of Symbols, Abbreviations and Nomenclature
Symbol Definition
Coverage Symbolic tree coverage index
$d_i^{int}$ Internal distance
$d_i^{ext}$ External distance
𝐻 Spatial entropy
𝑚 Aggregation level
𝑛 Original production data length
𝑤 Production symbol sequence length
Abbreviation Definition
ARM Association Rule Mining
DFT Discrete Fourier Transform
DWT Discrete Wavelet Transform
GIS Geographic Information System
PAA Piecewise Aggregation Approximation
SAGD Steam Assisted Gravity Drainage
SAX Symbolic Aggregation Approximation
SVD Singular Value Decomposition
CHAPTER ONE: INTRODUCTION
1.1 Background
The oil and gas industry has long dealt with massive volumes of data generated during hydrocarbon
exploration, development and production. Acquisition and interpretation of seismic data, followed
by drilling and confirmation of existing oil or gas reserves, can cost millions of dollars. Even then,
it is possible to drill dry holes. In order to make a profit, the cumulative expense of exploration,
well development and production must be kept to a minimum. Improving production with
advanced technologies and optimized management and human capital can reduce operating costs
to a certain extent. The most efficient way to reduce costs, however, is to facilitate agile and
accurate decision-making, so that projects do not lag while millions of dollars are wasted. Fast
and accurate decision-making in turn depends on deriving information and drawing insights
from data.
Data analytics, data visualization and interpretation, and data integration and management
make up the most essential processes to obtain useful information from various oil and gas related
data sources.
The industry is slowly adopting data-driven models for data analytics in some
specific petroleum applications, such as clustering in seismic attribute analysis (Strecker & Uden,
2002; Marroquin et al., 2009), Artificial Neural Networks in reservoir characterization
(Mohaghegh, 2000), and Neural Networks and Support Vector Machines in drilling and completion
optimization (Serapiao et al., 2006; Fruhwirth et al., 2006). Data mining aims to reveal
hidden patterns and relationships embedded in big datasets. Besides the datasets used in the
aforementioned studies (i.e. seismic data, reservoir geological and geophysical data, and drilling
data), oil and gas well production data are also voluminous. Production data are time-series
data, including the regularly updated volumes of multiple substances. Production engineers refer
to production data for decision-making concerning production operations.
Data visualization tools communicate the information in oil and gas data to engineers and other
petroleum professionals through visual displays. Three-dimensional seismic images provide
information about the subsurface so that geologists can make observations for hydrocarbon
exploration. Updates on different drilling parameters are displayed in overlaid line charts for
drilling engineers to monitor underground drilling conditions and progress. Basic line charts are
used to display time-series production and injection data, providing information on oil and gas
injection and production operations.
Besides temporal data, spatial data are another important part of oil and gas datasets. Oil
and gas wells, pipelines, and environmental sites and facilities are geospatial objects. Geographic
Information System (GIS) technology has been employed in some commercial data management
software, such as AccuMap by IHS, GeoCarta by Divestco, and geoSCOUT by geoLOGIC, to
store, map and search/filter these spatial objects. In addition, these products provide access to
integrated public and proprietary data in the geophysical, geological and engineering disciplines
for user-selected geospatial wells, pipelines or lands. Applications of GIS in the oil and gas
industry have facilitated efficient data management and timely information retrieval.
1.2 Problem statement
Existing research and applications have laid the groundwork for oil and gas data mining and data
analysis, data visualization, and data management. This thesis identifies one problem for each of
three oil and gas data-related topics: oil and gas production prediction,
interactive visualization tools for spatial and temporal oil and gas data, and Web GIS platform
design.
1.2.1 Oil and gas production prediction
Hydrocarbon exploration and production is a worldwide industry that is technically challenging
and high-risk in terms of profitability. Because estimating future recovery is critical to
profitability, well performance prediction starts in the early stages of production. Wells can be
remediated or even shut in to prevent further loss if their predicted production does not reach a
predetermined level.
Comparing early stage production data of a new well with historical production data of
adjacent wells is common practice for future production prediction. Type curve matching is an
industry-recognized approach for predicting cumulative production. Analogous wells are first
grouped by the similarity of their cumulative production curves, and the average production
profiles are calculated as the type curves for each group of wells (Mohaghegh & Gaskari, 2009;
Sproule, 2015). The early cumulative production of a new well is then matched to the different
type curves, and the closest curve will be picked by engineers to estimate the future well
performance for the new well. Type curve matching is efficient but subjective. It is difficult to
repeat the analysis because different engineers might provide different estimations for a particular
well based on their experience in visual interpretations of the type curves. A second issue is that
cumulative production curves tend to look quite similar to each other over time, making it
challenging to distinguish performance variations in individual wells.
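The matching step described above can be illustrated with a minimal sketch. It assumes that the analogous wells have already been grouped, that type curves are simple per-group averages of cumulative production, and that the closest curve is chosen by Euclidean distance rather than by visual inspection. The function names, group labels and production volumes are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def build_type_curves(cum_production, group_labels):
    """Average the cumulative production curves of the wells in each group."""
    cum_production = np.asarray(cum_production, dtype=float)
    group_labels = np.asarray(group_labels)
    return {
        str(label): cum_production[group_labels == label].mean(axis=0)
        for label in np.unique(group_labels)
    }

def match_type_curve(early_cum, type_curves):
    """Return the label of the type curve closest to the early production."""
    early_cum = np.asarray(early_cum, dtype=float)
    k = len(early_cum)  # only the months observed so far are compared
    distances = {
        label: float(np.linalg.norm(curve[:k] - early_cum))
        for label, curve in type_curves.items()
    }
    return min(distances, key=distances.get)

# Hypothetical monthly volumes for four existing wells (rows) over six months.
monthly = np.array([
    [10, 9, 8, 7, 6, 5],
    [11, 10, 8, 7, 6, 5],
    [4, 4, 3, 3, 2, 2],
    [5, 4, 3, 3, 2, 2],
])
cumulative = monthly.cumsum(axis=1)
groups = ["high", "high", "low", "low"]  # assumed output of a prior grouping step

type_curves = build_type_curves(cumulative, groups)
new_well_early = np.cumsum([9, 9, 8])    # first three months of a new well
best = match_type_curve(new_well_early, type_curves)
forecast = type_curves[best]             # matched curve used as the estimate
```

Replacing the visual comparison with a fixed distance measure makes the match repeatable, but it does not address the second issue noted above, namely that cumulative curves become hard to tell apart over time.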
The research literature shows that data mining techniques are gaining popularity in the oil
industry for production data analysis because the analysis techniques are objective, resistant to
poor data quality, and more accurate than statistical approaches (Mohaghegh et al., 2008; Ma et
al., 2015; Zhong et al., 2015). Different data mining algorithms such as Support Vector Machines,
Random Forests and Boosted Regression Trees have proved to be efficient and effective tools for
understanding oil and gas operation and production (LaFollette et al., 2012; Esmaili et al., 2013;
Zhong et al., 2015). However, the time-series production data in these studies were either averaged
or represented by their maximum or cumulative values. These statistical measures oversimplify
the production curves and do not capture the entire production trend of each well. Therefore, an
effective and intuitive data-driven prediction approach that uses time-series production data
without oversimplification is needed.
1.2.2 Interactive data visualization for temporal and spatial oil and gas data
Given the enormous volumes of data and the variety of data analytical tasks in the oil and gas
industry, data visualization has been a useful and popular tool. Users can perceive changes in
movement, shape, size, color and texture in an image and interpret the information based on their
visual perception (Shneiderman, 1996). However, even though humans have remarkable cognitive
abilities, it is hard for users to digest all the information in a big and complex dataset.
Especially when the data are multidimensional or multivariate, encoding all aspects of a complex
dataset in a single visual display produces an overwhelming and confusing representation.
Oil and gas time-series datasets are certainly complex, given their data types
and sizes. Common oil and gas data include well locations, well affiliation, well status, production
data and completion data. Well locations reflect the geographical coordinates of the surface and
bottom holes of the well, and the field, pool and formation the well belongs to. Well affiliation
data indicate the operator and licensee of the well. Oil and gas wells undergo different stages (e.g.
drilled, active, suspended, abandoned, and whipstocked), and their statuses are updated
accordingly. Production and injection data are updated regularly for multiple substances including
oil, gas, water and other fluids. Well completion makes a well ready for
production. It involves preparing the bottom hole and production tubing for different reservoir
conditions and specifications, and perforating and stimulating if required for wells to achieve
maximum reservoir contact.
To visually represent the different types of oil and gas data, especially the spatial
wells with their attribute data and the time-series data, it is reasonable to divide the
visualization problems of the complex dataset into separate facets and to build different views,
each focusing on a particular visualization problem. This leads to an interpretable visualization.
Moreover, interaction is the key to combining the different visual representations so that users
can mentally interpret the dataset as a whole picture. Interactive visualization tools can
accommodate both production trend lines and the distribution of gas, oil and water production
over time and across different areas. Therefore, in order to deliver rich information intuitively
to users, interactive visualization tools are needed for oil and gas datasets.
1.2.3 Web GIS system design for oil and gas datasets
GIS technology is commonly used in the industry for mapping and data management purposes.
The aforementioned commercial software products are all built on GIS platforms. However, they
not only have high hardware configuration requirements for installation, but also require a
sequence of packages to be installed whenever software modules or databases are updated. It is
therefore inconvenient for users to access the latest software and data, which makes this a
relatively inefficient data distribution approach.
Web GIS systems, which are GIS systems built with web technologies, give users access to the
system and its mapping and analytic functionalities as long as they have access to the Internet,
and can be reached by a broad audience simultaneously through web browsers. Therefore, the
feasibility of employing a Web GIS platform to map oil and gas related spatial objects and deliver
requested data and information is to be studied. Moreover, the integration of interactive data
visualization tools and some data mining analytics within the Web GIS system, to strengthen its
data analytical functions, is to be designed and developed.
1.3 Research objectives
The thesis mainly focuses on data mining, data visualization and data management using GIS on
oil and gas datasets. With respect to the three oil and gas data-centered topics, three research
objectives are identified:
1) A data driven approach is to be designed and implemented to predict well production. The
approach relies on production data of the analogous wells to estimate performance of target
wells. Time-series well production data are used for prediction instead of the statistical
measurements calculated from the time-series sequences;
2) The visual and interactive user interfaces and controls are to be designed and implemented for
oil and gas wells, and oil and gas time-series data, including injection, production, completion
and status data;
3) A Web GIS system prototype is to be designed and implemented for mapping oil and gas wells,
providing access to their proprietary data, visualizing user-selected data and providing
analytical functions.
1.4 Research contributions
The contributions of this thesis are listed as follows:
1) The proposed approach for production prediction consists of time-series aggregation and
symbolization steps to reduce the dimensionality of the time-series production data and further
transform numerical values to categorical ones. A symbolic tree model with pre-pruning
mechanisms is used to build a predictive model from multiple well production histories, and
a novel tree index is proposed to evaluate the symbolic tree in terms of the tree size.
We conducted an experiment on a well production dataset of shale gas wells in the Montney-A
pool in British Columbia and Alberta, Canada to demonstrate the feasibility of the proposed
method;
2) Data characteristics on the different time-series data and the respective user tasks at hand are
analyzed for designing the visual and interactive user interfaces and controls. Following the
classic Visual Information-Seeking Mantra proposed by Shneiderman (1996), templates are
designed for viewing oil and gas time-series data – injection, production, completion and status
data. Additionally, a mapping interface has been designed for displaying spatial wells and
integrating the time-series data viewers. The proposed interactive data visualization templates
are implemented and tested on the Montney shale gas wells;
3) A Web GIS system has been designed and implemented with a Steam Assisted Gravity
Drainage (SAGD) dataset retrieved for some Canadian SAGD projects. Besides mapping
SAGD wells, some data visualization and data analysis functions including clustering and
Association Rule Mining (ARM) in the web system provide further information for users,
proving the Web GIS system an efficient and practical application.
1.5 Thesis outline
Chapter Two introduces the related work on time-series data approximation and symbolization,
data visualization applications of oil and gas data, Web GIS applications, as well as data mining
models including decision tree induction, K-means clustering and ARM. Chapter Three introduces
the proposed well production prediction approach and evaluates the approach with an experiment
on a shale gas production dataset. Chapter Four analyzes the data characteristics of oil and gas
time-series data and the visualization tasks, and demonstrates the designed interactive visualization
tools. Chapter Five describes the Web GIS system with data visualization and analytical functions
implemented with a SAGD dataset. The last chapter, Chapter Six, draws conclusions and
outlines future work.
CHAPTER TWO: RELATED WORK
The following sections respectively introduce time-series data approximation and symbolization
methods, data mining models including K-means clustering, ARM and decision tree induction,
data visualization applications and Web GIS applications of oil and gas data.
2.1 Time-series data approximation and symbolization
Oil and gas production data, which are the data source for production prediction, are time-series
data. Most time-series datasets are too large to fit in main computer memory or to be handled by
ordinary data mining processes, so most time-series mining workflows start with an approximation
step that represents each time series with fewer parameters or data points for subsequent
computation.
Different techniques have been proposed, including major methods like the Discrete Fourier
Transform (DFT) (Agrawal et al., 1994), the Discrete Wavelet Transform (DWT) (Chan & Fu, 1999),
and Singular Value Decomposition (SVD) (Korn et al., 1997). Since time-series data are
high-dimensional, dimensionality reduction techniques are used to reduce the number of descriptive
parameters or data points. DFT converts a time series into a finite
combination of sine and/or cosine waves, isolating the fundamental frequencies present (Agrawal
et al., 1994). DWT uses the sum and difference of a mathematical
function localized in discrete periods of time (Chan & Fu, 1999). SVD
globally transforms multiple time series in the same dataset to a number of eigenvalues
instead of transforming each time series individually (Korn et al., 1997).
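As an illustration of the first of these methods, the sketch below approximates a series by keeping only its first few Fourier coefficients and reconstructing the series from them. It is a minimal example under stated assumptions: the function name and the synthetic 12-point series are hypothetical, and NumPy's `rfft`/`irfft` routines stand in for whatever transform implementation the cited studies used.

```python
import numpy as np

def dft_approximate(series, n_coeffs):
    """Keep the first n_coeffs Fourier coefficients and rebuild the series."""
    series = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(series)               # one-sided spectrum of a real series
    truncated = np.zeros_like(coeffs)
    truncated[:n_coeffs] = coeffs[:n_coeffs]   # discard the higher frequencies
    approx = np.fft.irfft(truncated, n=len(series))
    return coeffs[:n_coeffs], approx

# Synthetic series: a constant level plus a single slow sine wave.
t = np.arange(12)
series = 5.0 + 2.0 * np.sin(2 * np.pi * t / 12)

# Two complex coefficients (the mean and the first harmonic) describe it fully.
kept, approx = dft_approximate(series, n_coeffs=2)
```

Here 12 data points are represented by two coefficients without loss, because the series contains only one frequency. Real production curves would need more coefficients and would be reconstructed only approximately.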
Piecewise Aggregation Approximation (PAA) represents time series by the mean values
of equi-length segments of the original sequences (Keogh et al., 2001). It is a more intuitive
dimensionality reduction technique that works with a variety of distance measures, allowing fast
indexing and querying. The superior performance of PAA has been theoretically proven and
empirically demonstrated by Keogh et al. (2001).
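A minimal sketch of PAA follows, assuming for simplicity that the series length is an exact multiple of the number of segments. The function name and the 12 monthly values are hypothetical.

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregation Approximation: the mean of each equal-length segment."""
    series = np.asarray(series, dtype=float)
    if len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    # Each row of the reshaped array is one segment.
    return series.reshape(n_segments, -1).mean(axis=1)

# Twelve hypothetical monthly volumes reduced to four quarterly means.
monthly = [12, 10, 8, 9, 7, 5, 6, 4, 2, 3, 2, 1]
quarterly_means = paa(monthly, 4)   # array([10., 7., 4., 2.])
```

The reduced sequence keeps the overall declining trend while using a third of the original data points.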
Having introduced the four time-series data approximation techniques for reducing
dimensionality, we now compare them. To reduce data of 𝑛 dimensions, the time complexity of
DFT is O(𝑛²); the time complexity of a DWT computation is O(𝑛); SVD requires O(𝑚𝑛²) time;
and the PAA method requires O(𝑛𝑘) time, where 𝑘 is the number of equal-length segments.
Therefore, the PAA and DWT methods are relatively lower in time complexity. In terms of measuring
distances between different time-series sequences, DFT, DWT and SVD are capable of
approximating only Euclidean distance, while PAA can handle different distance metrics. One
drawback of SVD is that it requires multiple time series for calculating eigenvalues.
A time series can be converted into symbolic representation by dividing the data into
groups, then summarizing or averaging each group, and assigning a symbol to it. This transforms
the time series into a sequence of symbols. Symbolic representations can effectively preserve
essential data features; are robust in handling data noise; can be used in various data structures;
and generally improve numerical computation speed (Daw et al., 2003). Lin et al. (2007)
developed a technique called Symbolic Aggregate Approximation (SAX) that builds on PAA by
symbolizing the mean values of the time-series segments. A unique feature of SAX is that, by
applying PAA and symbolization with equiprobability, it guarantees that the distance measure
between two time-series sequences calculated by SAX lower bounds the true distance (Lin et al.,
2007). SAX has been widely employed in different studies to process real-world sensor data. For
instance, SAX was used to convert sensor data from a Wireless Sensor Network to strings, enabling
the detection of interesting or unusual events in the monitored process (Zoumboulaskis & Roussos,
2011). Siirtola et al. (2011) extracted similarity features from SAX-symbolized time-series data,
and integrated them with traditional statistical features to improve classification accuracy of
streaming data.
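The PAA and SAX steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Lin et al. (2007): the function names `paa` and `sax` are hypothetical, the series length is assumed to be divisible by the number of segments, and the series is assumed not to be constant (otherwise the z-normalization divides by zero). Breakpoints are taken from the standard normal distribution so that each symbol is equally probable.

```python
from statistics import NormalDist, mean, pstdev

def paa(series, segments):
    """Mean of each of `segments` equal-length chunks (length must divide evenly)."""
    size = len(series) // segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(segments)]

def sax(series, segments, alphabet="abcd"):
    """z-normalize, reduce with PAA, then map each segment mean to a letter."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]
    k = len(alphabet)
    # Breakpoints cut the standard normal into k equiprobable regions.
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # A segment mean that exceeds j breakpoints receives the (j+1)-th letter.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa(z, segments))

print(sax([1, 2, 3, 4, 5, 6, 7, 8], segments=4))  # prints "abcd"
```

A steadily increasing series maps to an increasing word, one letter per segment.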
2.2 Data mining methods
Data mining is the process of discovering patterns and knowledge from large datasets (Han et al.,
2001). Mining frequent patterns, associations and correlations, classification, and clustering are
the most common data mining tasks. Cluster analysis, ARM and decision tree induction are introduced
below, together with three classic algorithms.
2.2.1 Cluster analysis
Cluster analysis, also called clustering, aims at grouping objects with similar properties and
separating objects that are dissimilar (Han et al., 2001). The consistency of the clustering result of
geological properties and oil and gas resources can assist in oil and gas resource exploration and
evaluation (Liu & Xue, 2008).
K-means is one of the most popular clustering methods. K is a user-defined variable that
stands for the number of clusters or groups. The algorithm initializes k random objects as the
cluster centroids and then iterates two steps, assigning every object to its closest centroid and
recalculating the centroids, until no cluster changes. The K-means clustering algorithm can
efficiently process large datasets due to its relatively low computational complexity.
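The assign-and-update loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kmeans` is hypothetical, and the first k points are used as initial centroids so that the run is repeatable, whereas the text describes random initialization.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length coordinate tuples."""
    # Assumption: seed with the first k points instead of random objects.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no cluster changed: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each iteration costs time proportional to the number of points times k, which is why the method scales to large datasets.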
2.2.2 Association Rule Mining
ARM is used to find frequent associations and correlations among different attributes from large
datasets. In the oil and gas research field, ARM has been used in reservoir analysis and oil production
(Aulia et al., 2010; Cai et al., 2014). An association rule is comprised of an antecedent part (IF)
and a consequent (THEN) part. Two measures, support and confidence, are used to define rule
interestingness. An example rule from Cai et al. (2014) can be described as: IF three
reservoir properties match certain levels, THEN the well oil production is high (support = 5.1%,
confidence = 85.7%). Support denotes the proportion of the objects in the whole dataset that satisfy
the rule; confidence denotes the proportion of the objects that satisfy the consequent among the
objects satisfying the antecedent condition. Frequent if/then patterns satisfying defined minimum
support and minimum confidence are identified as strong association rules.
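The two measures can be computed directly from their definitions. The sketch below is illustrative only: the function name `support_confidence` and the five toy records are hypothetical and are not taken from Cai et al. (2014).

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule IF antecedent THEN consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Objects that satisfy the IF part.
    n_ante = sum(antecedent <= t for t in transactions)
    # Objects that satisfy both the IF part and the THEN part.
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical records: each well is a set of attribute=level items.
wells = [
    {"porosity=high", "perm=high", "prod=high"},
    {"porosity=high", "perm=high", "prod=high"},
    {"porosity=high", "perm=high", "prod=low"},
    {"porosity=low", "perm=high", "prod=low"},
    {"porosity=low", "perm=low", "prod=low"},
]
s, c = support_confidence(wells, {"porosity=high", "perm=high"}, {"prod=high"})
print(s, c)
```

Here two of the five wells satisfy the whole rule and three satisfy the antecedent, so the support is 2/5 and the confidence is 2/3.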
Apriori is a classic ARM algorithm that uses a particular search approach and data structure
to efficiently scan large datasets (Agrawal & Srikant, 1994). An itemset, from which a rule is
derived, is a set of items. Firstly, a set of candidate itemsets that include only one item in each
itemset is generated, and then the infrequent itemsets that fail to reach the minimum support count
are excluded. By joining the set of candidate itemsets with itself, new itemsets are generated and
then infrequent ones are pruned. The join and prune processes are iterated until the candidate
itemsets cannot be extended any further and all frequent itemsets are found. A hash table data
structure is used to improve efficiency. Mined frequent itemsets describe the hidden relationships
among multiple attributes in the datasets and can help with prediction and decision-making.
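The join and prune iteration can be sketched as below. This is a simplified illustration rather than the optimized algorithm of Agrawal and Srikant (1994): the function name `apriori` is hypothetical, support is counted by a full scan of the transactions for every candidate, and Python dictionaries and sets (which are hash-based) stand in for the hash table.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and drop those below the minimum support count.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})  # 1-itemsets
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical transactions over three items.
baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
found = apriori(baskets, min_support_count=3)
print(sorted("".join(sorted(s)) for s in found))
```

With a minimum support count of 3, all single items and all pairs are frequent, while the triple {A, B, C} appears only twice and is discarded.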
2.2.3 Decision tree induction
A decision tree is a classic data mining method for classification, consisting of a training step
and then a classification step (Han et al., 2001). During the training process, class-labeled
objects with known attribute values are processed to build a decision tree. Each internal node in
the tree represents a test on an attribute value and splits into two or more branches representing
the test outcomes; the tree is traversed until a leaf node representing one class label is reached
(Han et al., 2001). After the decision tree is constructed, the class labels of new objects
can be predicted using the tree model given their attribute values.
Iterative Dichotomiser 3 (ID3) is one of the most classic decision tree methods. It uses
information gain as the attribute selection measure for selecting the attributes for internal tree
nodes. To build a decision tree, it is essential to select the splitting criterion at each
hierarchical tree level. If one attribute can split the instance space containing all the objects
into multiple subspaces and each subspace is associated with one class, this attribute makes a pure
partition. The conceptually ideal splitting criterion would generate mostly pure partitions.
Information gain is one of the most popular attribute selection measures for partition purity: the
more information gain an attribute brings, the purer the resulting partition.
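Information gain can be illustrated with a small calculation. The sketch below is a minimal example under stated assumptions: the function names and the four toy records are hypothetical, and only the attribute selection measure is shown, not the full ID3 procedure.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records: "depth" separates the classes perfectly, "operator" does not.
rows = [
    {"depth": "deep", "operator": "X"},
    {"depth": "deep", "operator": "Z"},
    {"depth": "shallow", "operator": "X"},
    {"depth": "shallow", "operator": "Z"},
]
labels = ["Y", "Y", "N", "N"]
print(information_gain(rows, labels, "depth"))
print(information_gain(rows, labels, "operator"))
```

The attribute that yields a pure partition recovers the full one bit of entropy, while the uninformative attribute yields a gain of zero, so the former would be chosen for the node.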
Compared to neural network algorithms and statistical models, decision trees are more
interpretable because the acquired knowledge for predicting is in a readable form (Lim et al.,
2000). Expert knowledge is easy to integrate into decision tree building, because the internal
nodes can be viewed and altered. Besides interpretability, once the decision tree is trained, it
facilitates fast and accurate predictions because attributes that contribute little information or cause
overfitting are inherently excluded during the tree construction and pruning processes.
2.3 Oil and gas data visualization
The current commercial oil and gas data management systems focus on data management, but have
limited data analysis functionality. In terms of analyzing the large quantities of data in the oil and
gas industry, visualization tools and other digital techniques have helped with exploring data,
making decisions and improving production (Evans et al., 2002). Visualization methods such as
diagrams, charts, and plots are the most common and straightforward ways to summarize datasets.
Data visualization methods like different plots and charts have been used to study shale gas
production for different shale basins over time (Anderson et al. 2010; Baihly et al. 2010; Nobakht
et al. 2012). Some studies analyzed production trends by integrating transient analysis and plot
analysis – log-log plot and square root time plot (Anderson et al. 2010; Nobakht et al. 2012). In
the study by Baihly et al. (2010), shale wells in the same basin were grouped by the years of their
first production, and line charts were used to plot the average daily gas production of different well
groups; the average production plots were compared over basins. The clear distinction in the
production across different shale basins was attributed to the differences in reservoir properties
and completion processes. Therefore, production analysis using data visualization tools is practical
for characterizing wells and evaluating well performances for a single well or a group of wells.
2.4 GIS applications in oil and gas
One of the most popular commercial products for oil and gas data management in Alberta is
GeoCarta (Divestco, 2016). It is primarily an oil and gas data warehouse integrating locations and
distributions of spatial objects, exploration and production histories and all the other relational data
sources. A mapping interface and a connected data management system work interactively in order
to simplify the workflows of querying and retrieving data. For this specific software, ArcMap is
utilized as the GIS platform, and spatial objects can be either located in the intuitive mapping
interface or searched by the industry standard location descriptions in the data management
system, which is attached to ArcMap as an extension tool. Therefore, ArcGIS Desktop must be
installed alongside GeoCarta, and users need to manually update the oil and gas database whenever
GeoCarta releases its regular updates. Other oil and gas data management software has a similar
system design, except that it uses a different GIS platform instead of ArcMap.
There are also oil and gas information systems built by web GIS technology focusing on
spatial data query (Government of Saskatchewan, 2002), oil and gas industry news notification
(PetroFeed Inc., 2015) and so on, which deliver valuable petroleum related information to users.
Web GIS platforms provide convenient access to rich and real-time datasets through the Internet
using different mobile devices, enabling broad users to share and use the online data resources.
Users can easily search and retrieve data and information through user-friendly mapping
interfaces, without having to be highly trained to fully understand every function. Moreover, other
web services provide extra geospatial data analytics and visualization functionalities to web GIS
platforms and make web GIS more integrated and powerful.
CHAPTER THREE: A SYMBOLIC TREE MODEL FOR OIL AND GAS PRODUCTION
PREDICTION USING TIME-SERIES PRODUCTION DATA
3.1 Introduction
Comparing early stage production data of a new well with historical production data of adjacent
wells is common practice for future production prediction. Oil and gas production volumes are
essentially time-series data updated at a fixed rate, for instance, hourly, daily or monthly. In Figure
3- 1, the dashed black line represents the monthly production of a newly developed gas well that
has only been producing for three months. To predict its future production, the monthly production
curves of 30 adjacent gas wells from the same geological area are shown in the red solid lines. The
production durations of these wells range from 32–155 months. Obviously, it is very difficult to
compare the production curves by visual inspection alone.
Figure 3- 1 Production data of a new well relative to existing adjacent wells
In this chapter, we propose a symbolic tree model to predict future production performance
of an early stage well using the production histories of the surrounding wells. A novel workflow
is proposed to summarize this historic production data into a hierarchical tree structure. Pre-
pruning mechanisms are integrated, and a new coverage index is designed to achieve a compact
and informative tree. To demonstrate the feasibility of the proposed method we conduct an
experiment on a production dataset from shale gas wells in Montney-A pool in Canada. The
resulting symbolic tree is visually intuitive, and provides accurate predictions of future
performance on holdout wells.
3.2 Symbolic tree construction for well production prediction
To predict future performance of a newly developed well at a specific point in time, the following
two datasets are required:
1) The time-series production history of the new well;
2) Production histories from multiple analogous and adjacent wells.
Figure 3- 2 illustrates the general procedure for building the symbolic tree model from
time-series production datasets. The first step is to apply a time-series aggregation method to reduce the
dimensionality of production data, followed by a symbolization method to transform the
numerical, aggregated values to categorical data. User-specified settings for aggregation level and
symbol size are required to perform this conversion. Aggregation level determines to what extent
the original time-series data are compressed, while the symbol size determines the number of
different symbols that will be used to represent the aggregated production data. The result is a data
set containing symbolized sequences of aggregated production data. The second step in this
workflow assists the user in selecting appropriate initial settings by building a series of symbolic
tree candidates with different aggregation levels and symbol numbers. A coverage index is
calculated to select a proper-sized symbolic tree. Finally, the target well performance can be
predicted in the third step, using the selected symbolic tree model.
The following sections provide a detailed description of this workflow.
Figure 3- 2 Flowchart of proposed Symbolic Tree Model
3.2.1 Time-series well production data preprocessing
As mentioned, time-series production data can become voluminous over the years. As an intuitive and
efficient method to reduce time-series data dimensionality, the PAA method is extended to transform
the well production data into time-series sequences with fewer data points. First, the aggregation
level needs to be specified by the user. It indicates how many original production data points
should be combined, which decides the length of each aggregated production history. The length
of aggregated production data can be calculated using Equation (1):
$$
w = \begin{cases}
n/m, & n \bmod m = 0 \\
\lfloor n/m \rfloor + 1, & n \bmod m \neq 0
\end{cases}
\tag{1}
$$
where 𝑛 is the number of the production data points in the original production sequence, and 𝑚 is
the specified aggregation level. If the original data length (𝑛) is divisible by the aggregation level
(𝑚), the aggregated data length is the quotient of 𝑛/𝑚. If not, the aggregated data length is the
floor of 𝑛/𝑚 plus one, where the last aggregated value represents the remaining production
data.
To downsize each time-series production sequence with 𝑛 production data points to 𝑤 data
points, every data segment of 𝑚 data points is represented by the average value. Equation (2)
shows the calculation for the aggregated production time-series when 𝑛 is divisible by the
aggregation level 𝑚.
$$
\bar{p}_i = \frac{1}{m} \sum_{j=m(i-1)+1}^{m i} p_j
\tag{2}
$$
where $\bar{p}_i$ represents the 𝑖-th value in the transformed 𝑤-dimensional sequence
($\bar{P} = \bar{p}_1, \dots, \bar{p}_w$), and $p_j$ represents the 𝑗-th value in the original
production data.
When 𝑛 is not divisible by 𝑚, the first 𝑤−1 data points in the aggregated production
sequence are calculated using Equation (2). The last data point is the average of the left-over
production data.
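Equations (1) and (2), including the treatment of the left-over data points, can be sketched as a single function. The name `aggregate` and the sample values are hypothetical.

```python
def aggregate(production, m):
    """Average consecutive blocks of m points; a shorter final block is averaged as-is."""
    blocks = [production[i:i + m] for i in range(0, len(production), m)]
    return [sum(block) / len(block) for block in blocks]

# Seven hypothetical monthly values with aggregation level 3:
# w = floor(7/3) + 1 = 3, and the last value averages the single left-over point.
print(aggregate([10, 20, 30, 40, 50, 60, 70], 3))  # [20.0, 50.0, 70.0]
```

Slicing in steps of 𝑚 produces exactly 𝑤 blocks in both cases of Equation (1), so the two cases do not need separate code paths.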
After reducing the dimensionality, the aggregated time-series production data are still
continuous-valued, so a discretization technique needs to be applied to transform the data into
categorical values. The next step is to symbolize each data point in the aggregated time-series
sequence based on the symbol set. The symbol set S can be defined by users, denoted as 𝑆 =
{𝑠1, 𝑠2, … , 𝑠𝑘}, where each element in the symbol set represents a symbol and 𝑘 is the symbol size.
First, the aggregated production data need to be sorted in an ascending order. Then, based on
symbol size, multiple quantiles are calculated along the production data range to divide the
numeric production data values into equal sized groups. Statistically, quantiles are the data points
that divide the range of a probability distribution into intervals with equal probability. If the symbol
size is 3, two quantile values will be calculated to divide all the production values into three groups.
After deciding the equal-probable groups, the corresponding symbols are assigned to the values in
each group.
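The quantile-based symbolization can be sketched with the Python standard library. This is an illustration under stated assumptions: the function names are hypothetical, `statistics.quantiles` is used with its default interpolation method (the thesis does not state which quantile estimator was used), and a value exactly equal to a breakpoint is assigned to the upper group.

```python
from bisect import bisect_right
from statistics import quantiles

def fit_breakpoints(values, symbol_size):
    """Return the symbol_size - 1 cut points that split values into equiprobable groups."""
    return quantiles(values, n=symbol_size)

def symbolize(sequence, breakpoints, symbols):
    """Map each value to the symbol of the interval it falls in."""
    return "".join(symbols[bisect_right(breakpoints, v)] for v in sequence)

# Hypothetical pooled aggregated values from all wells, symbol size 3.
pooled = list(range(1, 13))
cuts = fit_breakpoints(pooled, 3)
print(cuts)
print(symbolize([2, 6, 11], cuts, "ABC"))  # prints "ABC"
```

With breakpoints such as 1500, 2250 and 2950 and the symbol set {A, B, C, D}, a call like `symbolize([1821.6], [1500, 2250, 2950], "ABCD")` returns "B", because 1821.6 lies between the first and second breakpoints.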
Figure 3- 3 gives an example of how the original monthly production for a well is transformed
with an aggregation level of 3 and a symbol size of 4 (the symbol set S = {A, B, C, D}).
Figure 3- 3(a) shows the monthly production data (solid red line) of a well from January
2010 to September 2012. It is a time series with 33 readings. Production data are aggregated
every three months and shown as the black line. For example, the average production of the first
three months (January to March 2010) is calculated using Equation (2) and is 1821.6. Similarly,
the rest of the aggregated production data values are shown in the graph.
Figure 3- 3(b) displays the distribution of the aggregated well production data of all the
wells used for prediction. Three quantiles, 1500, 2250, and 2950, divide the production data range
of 0 to 4500 into four equi-probable groups, which are respectively assigned four symbols of A,
B, C and D. Based on the symbols with their corresponding data ranges, the production sequence
is transformed into a sequence of chronologically organized symbols, as marked on top of the
black line in Figure 3- 3(a).
Figure 3- 3 Production data aggregation and symbolization (a) Aggregated and symbolized
time-series production data of one example well (b) Data distribution of all aggregated
production data
3.2.2 Symbolic tree construction and evaluation
To summarize the production data in a predictive model, the symbolic tree is proposed and built
on the symbolized production data of all the wells and their classes. The symbolic tree is a
flowchart-like tree structure, where the hierarchical levels in the tree correspond to chronological
time periods and the tree nodes represent production symbols at particular times.
Each symbolic tree node contains a number of wells with their production symbol
sequences stored and their class labels known. The class labels correspond to a certain well
performance criterion. For example, a binary criterion could be whether a well can reach
10,000 m³ of gas production by the end of the first year. If a criterion refers to four different
gas production ranges (i.e. 0-5,000 m³, 5,000-10,000 m³, 10,000-15,000 m³, and 15,000-20,000 m³),
four class labels are created based on the four production ranges. The leaf nodes provide well
classes based on majority voting, while internal symbolic tree nodes provide the probabilities of
wells falling in different classes.
The symbolic tree is constructed in a top-down manner. Since the well production symbols
are chronologically ordered, the hierarchical levels of the symbolic tree follow the same
order. In the degenerate case where all the wells belong to the same class,
the symbolic tree will only have a root node that includes all the well data. If the wells belong to
different classes, the nodes in the first level of the tree represent the distinct symbols of the first
chronological time period in all the well production symbol sequences. Each well is then assigned
to its corresponding tree node. Next, the well production sequences in each first-level tree node
are distributed into child nodes in the second tree level, based on the distinct symbols of their
second chronological time period. The nodes in the following levels grow in the same manner.
If wells have been producing for a long time, the symbolic tree constructed could grow
very deep. In addition, production volumes tend to stabilize after a certain period of time, so
later periods add little variation and provide little additional information
for prediction. A minimum node size threshold is introduced to prevent unnecessary branching
by ensuring that each node contains no fewer than the specified number of wells. Spatial
correlation may also exist in the petrophysical attributes of oil and gas wells from the same
formation (porosity, permeability and oil saturation). This means that adjacent wells, especially
wells producing from the same pool, should have similar production trends. Therefore, the spatial
distribution of wells should contribute to the production prediction. To account for spatial
correlation, a second threshold, called minimum spatial information gain, is introduced. Spatial
information gain is a measure of node purity that combines information gain with spatial
correlation. Minimum spatial information gain ensures that a node does not split unless the gain
in purity exceeds the specified threshold.
The basic algorithm of symbolic tree construction is described in Figure 3- 4. The algorithm
takes the well production data, the production symbol sequence length, symbol set, as well as two
other user specified parameters – minimum node size and minimum spatial information gain- as
inputs. The symbolic tree starts with a root node containing the production symbols and well
classes (line 1). If all the wells in the root node belong to the same class, the node is marked as a
leaf node with that class (line 2). Otherwise, the root node with a tree level of 0 is passed to the
function called Generate_tree_node (line 4). This recursive function takes a current node and a
tree level as arguments. From the current node, a child node 𝑁𝑖 is generated at the next tree
level for each symbol 𝑠𝑖 in the symbol set; it retrieves and stores the wells that have the
symbol 𝑠𝑖 at time period tree_level+1 (lines 8-10). If the tree
level does not reach the length of the production sequence, node size and spatial information gain
are calculated for the node 𝑁𝑖 (line 12). The node will be truncated if its size is zero, marked as a
leaf node if it does not reach either the minimum node size or the minimum spatial information
gain, or passed to the Generate_tree_node function with an incremented tree level (lines 13-15). If
the tree level equals the length of the production sequence, the node is marked as a leaf node,
and its class is determined by majority voting. After all levels of the symbolic
tree are constructed, the resulting symbolic tree is returned.
Figure 3- 4 Basic algorithm for constructing a symbolic tree from well production data
The node size corresponds to the number of wells included in one node. The calculation is
straightforward. The spatial information gain calculation will be given in the next section.
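The recursive construction with the two pre-pruning thresholds can be sketched as follows. This is not a transcription of the algorithm in Figure 3- 4: the dictionary-based node layout and all function names are hypothetical, tree levels are indexed from zero, and plain (non-spatial) information gain is used as a stand-in through the `gain_fn` parameter, which a spatial information gain function could replace.

```python
from collections import Counter
from math import log2

def entropy(classes):
    n = len(classes)
    return -sum(c / n * log2(c / n) for c in Counter(classes).values())

def split_gain(wells, level):
    """Stand-in gain: plain information gain from splitting on the symbol at `level`."""
    groups = {}
    for seq, cls in wells:
        groups.setdefault(seq[level], []).append(cls)
    after = sum(len(g) / len(wells) * entropy(g) for g in groups.values())
    return entropy([cls for _, cls in wells]) - after

def make_node(wells):
    label = Counter(cls for _, cls in wells).most_common(1)[0][0]  # majority vote
    return {"wells": wells, "children": {}, "leaf": False, "label": label}

def grow(node, level, seq_len, symbols, min_size, min_gain, gain_fn=split_gain):
    for s in symbols:
        subset = [w for w in node["wells"] if w[0][level] == s]
        if not subset:
            continue  # an empty child is truncated
        child = node["children"][s] = make_node(subset)
        nxt = level + 1
        # Leaf if the sequence is exhausted or either threshold is not met.
        if nxt >= seq_len or len(subset) < min_size or gain_fn(subset, nxt) <= min_gain:
            child["leaf"] = True
        else:
            grow(child, nxt, seq_len, symbols, min_size, min_gain, gain_fn)

def build_symbolic_tree(wells, symbols, min_size=2, min_gain=0.0):
    """wells: list of (symbol_sequence, class_label) with equal-length sequences."""
    root = make_node(wells)
    if len({cls for _, cls in wells}) == 1:
        root["leaf"] = True  # all wells share one class: root only
    else:
        grow(root, 0, len(wells[0][0]), symbols, min_size, min_gain)
    return root

# Hypothetical symbolized wells with class labels.
wells = [("AAB", "N"), ("AAA", "N"), ("ABB", "N"), ("CCB", "Y"),
         ("CCC", "Y"), ("CBA", "N"), ("CBB", "Y")]
tree = build_symbolic_tree(wells, "ABC")
print(sorted(tree["children"]))  # ['A', 'C']
```

In this toy example the branch for symbol A stops immediately because all of its wells share one class, while the branch for symbol C keeps growing until its child nodes become pure.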
3.2.2.1 Spatial information gain calculation for symbolic tree nodes
Spatial correlations are commonly observed in geological properties and production performance
among wells, especially wells producing from the same reservoir. In other words, the closer two
wells are to each other, the more likely their production performance is similar.
Spatial entropy is an information measure that integrates the influence of spatial distribution on
the non-spatial attributes of spatial objects. The spatial entropy calculation presented by
Claramunt (2005) is employed in this study because it is intuitive and easy to calculate.
First of all, given a well production dataset 𝐷 = {𝐷1, 𝐷2, … , 𝐷𝑝}, where wells are classified
into 𝑝 different classes, two average distance measures, intra-distance and extra-distance, are
defined. The intra-distance, denoted by $d_i^{int}$, is the average distance between all wells in
the class 𝐷𝑖. The extra-distance, denoted by $d_i^{ext}$, is the average distance from wells in 𝐷𝑖
to all wells in the other classes of 𝐷. In Equation (3), when 𝐷𝑖 is empty or only contains one
well, the intra-distance is assigned a constant value 𝛼 to avoid the interference of null values
during the computation. In Equation (4), when 𝐷𝑖 includes all the wells in 𝐷, that is, 𝐷𝑖 is the
only class, the extra-distance is assigned a large constant 𝛽. 𝑑𝑖𝑠𝑡(𝑎, 𝑏) is the distance between
wells 𝑎 and 𝑏.
$$
d_i^{int} = \begin{cases}
\dfrac{1}{|D_i| \times (|D_i| - 1)} \sum\limits_{a \in D_i} \; \sum\limits_{b \in D_i,\, b \neq a} dist(a, b), & |D_i| > 1 \\
\alpha, & |D_i| \le 1
\end{cases}
\tag{3}
$$
$$
d_i^{ext} = \begin{cases}
\dfrac{1}{|D_i| \times |D - D_i|} \sum\limits_{a \in D_i} \; \sum\limits_{b \in D - D_i} dist(a, b), & D \neq D_i \\
\beta, & D = D_i
\end{cases}
\tag{4}
$$
Definition 1. The spatial entropy of dataset 𝐷 based on its partition {𝐷1, 𝐷2, … , 𝐷𝑝} is defined as
(from Claramunt (2005)):
$$
H(D) = -\sum_{i=1}^{p} \frac{d_i^{int}}{d_i^{ext}} \, p(D_i) \log_2 p(D_i)
\tag{5}
$$
In this definition, 𝐻(𝐷) is the spatial entropy, which is calculated by including a weight
factor $d_i^{int}/d_i^{ext}$ in the Shannon entropy formula. The weight factor decreases when either
the intra-distance decreases or the extra-distance increases, so that the spatial entropy measure
takes into account the spatial distribution of the objects, which in this study are oil and gas wells.
Spatial Entropy is a special form of Shannon entropy. It has been known that Shannon entropy of
an even distribution of objects reaches the maximum value and tends to decrease as the
concentration of the distribution increases. Therefore, spatial entropy is a monotonic decreasing
function for local non-spatial attribute similarity and spatial correlation.
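Equations (3) to (5) can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, wells are represented by planar (x, y) coordinates with Euclidean distance, 𝑝(𝐷𝑖) is taken as the share of wells in class 𝐷𝑖, and the values chosen for the constants 𝛼 and 𝛽 are placeholders because the text does not give them.

```python
from math import dist, log2

ALPHA = 1.0    # assumed placeholder for the constant alpha in Equation (3)
BETA = 1.0e6   # assumed placeholder for the large constant beta in Equation (4)

def intra_distance(group):
    """Average distance over all ordered pairs of distinct wells in one class."""
    n = len(group)
    if n <= 1:
        return ALPHA
    total = sum(dist(a, b) for i, a in enumerate(group)
                for j, b in enumerate(group) if i != j)
    return total / (n * (n - 1))

def extra_distance(group, others):
    """Average distance from wells in one class to wells in all other classes."""
    if not others:
        return BETA
    return sum(dist(a, b) for a in group for b in others) / (len(group) * len(others))

def spatial_entropy(classes):
    """classes: dict mapping class label -> list of (x, y) well locations."""
    n_total = sum(len(g) for g in classes.values())
    h = 0.0
    for label, group in classes.items():
        if not group:
            continue  # an empty class contributes nothing
        others = [p for l, g in classes.items() if l != label for p in g]
        p_i = len(group) / n_total
        weight = intra_distance(group) / extra_distance(group, others)
        h -= weight * p_i * log2(p_i)
    return h

# Hypothetical layouts: same class proportions, different spatial arrangement.
clustered = {"Y": [(0, 0), (0, 1)], "N": [(10, 0), (10, 1)]}
mixed = {"Y": [(0, 0), (10, 1)], "N": [(10, 0), (0, 1)]}
print(spatial_entropy(clustered), spatial_entropy(mixed))
```

Both layouts have the same Shannon entropy, but the spatially clustered layout yields a much lower spatial entropy than the interleaved one, which is the behavior the weight factor is meant to capture.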
Spatial information gain is the difference between the spatial entropy of a parent node and the
weighted spatial entropy of its child nodes. It is used as one of the criteria to decide whether a
symbolic tree node should grow child tree nodes at the next tree level. This measure stands for the
reduction of spatial entropy by further growing tree nodes using production symbols at the
next time period. To determine the spatial information gain from splitting the current node
𝑁, the spatial entropy values are calculated respectively before and after potentially generating
child nodes from the current node.
Before generating child nodes, spatial entropy of 𝑁 is the average amount of information
needed to identify the class label of a well in 𝑁. Equation (5) is used to calculate the spatial entropy
before generating child nodes from node 𝑁. The well dataset within node 𝑁 is denoted as 𝐷, and
there are 𝑝 different classes. The log function in the formula is taken as logarithm to the base 2,
because information is encoded in bits. After generating child nodes from node 𝑁, the well
production symbols at the next time interval are provided. Equation (6) calculates the amount
of spatial entropy remaining given the well symbol information and well class data, where 𝑚 is the
symbol size. At node 𝑁, the wells that have one particular production symbol are denoted by 𝐷𝑗,
and their spatial entropy is denoted by 𝐻(𝐷𝑗). $|D_j|/|D|$ represents the proportion of wells in 𝐷𝑗
out of all wells in 𝐷. The total spatial entropy is measured by adding up
$\frac{|D_j|}{|D|} \times H(D_j)$ for all production symbols.
$$
H_N(D) = \sum_{j=1}^{m} \frac{|D_j|}{|D|} \times H(D_j)
\tag{6}
$$
The spatial information gain is defined by subtracting 𝐻𝑁(𝐷) from 𝐻(𝐷). It represents
how much spatial entropy is reduced by growing child nodes to the next tree level from this node.
Therefore, if the spatial information gain is less than the minimum spatial information gain
threshold, the node is restrained from generating child nodes.
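Given the spatial entropy of a node and of its candidate child nodes, the gain and the pre-pruning decision reduce to a short calculation. The function name and all numeric values below are hypothetical.

```python
def spatial_information_gain(h_parent, children):
    """children: list of (number_of_wells, spatial_entropy), one pair per production symbol."""
    total = sum(n for n, _ in children)
    # Equation (6): child entropies weighted by their share of the wells.
    h_after = sum(n / total * h for n, h in children)
    return h_parent - h_after

# Hypothetical node: spatial entropy 0.9, split into children of 6 and 4 wells.
gain = spatial_information_gain(0.9, [(6, 0.2), (4, 0.5)])
min_gain = 0.1  # hypothetical minimum spatial information gain threshold
print(round(gain, 2), gain >= min_gain)  # 0.58 True
```

In this example the weighted child entropy is 0.6 × 0.2 + 0.4 × 0.5 = 0.32, so the gain of 0.58 is above the threshold and the node would be allowed to generate child nodes.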
3.2.2.2 Coverage index for evaluating symbolic trees
For time-series well production data preprocessed using different user-specified aggregation level
and symbol size settings, the resulting symbolic trees have different structures. Tree size is one
of the most important criteria for judging a symbolic tree. Generally, a compact
tree would be more favorable than a larger, more complex tree. Theoretically, complexity reduces
the predictive ability of a tree on independent datasets, even though the tree explains the training
data well (Mingers, 1989); a large tree tends to overfit. Practically, it would be difficult
for engineers to visually interpret a complex symbolic tree and derive information about well
production from the tree structure. Therefore, the size of the tree should be minimized to provide
a compact symbolic tree. Furthermore, a compact tree is easier for users to interpret and analyze
the production histories of all the wells.
In this study, a size-based index 𝐶𝑜𝑣𝑒𝑟𝑎𝑔𝑒 is proposed to evaluate the symbolic tree
derived from symbolic time-series production data. As Equation (7) shows, 𝑥𝑖 stands for the
number of branches at each level from the tree, 𝑠 stands for the symbol size used for representing
production data, and 𝑛 is the tree depth.
$$
Coverage = \frac{\sum_{i=1}^{n} x_i}{\sum_{j=1}^{n} s^{j}}
\tag{7}
$$
The numerator of the index represents the total number of occupied branches in the
symbolic tree, while the denominator corresponds to the total number of possible branches in a
fully-grown tree with a symbol size of 𝑠. The smaller the coverage index value, the more compact
the symbolic tree is. Users can calculate coverage index values for multiple symbolic trees built
from production data sequences created using different combinations of aggregation level and
symbol size; and then select the tree with the lowest coverage index.
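Equation (7) can be evaluated directly from the number of occupied branches at each tree level. The function name and the example branch counts are hypothetical.

```python
def coverage(branches_per_level, symbol_size):
    """Occupied branches divided by the branches of a fully grown tree of the same depth."""
    n = len(branches_per_level)
    possible = sum(symbol_size ** j for j in range(1, n + 1))
    return sum(branches_per_level) / possible

# Hypothetical tree of depth 3 built with symbol size 3:
# 3 + 5 + 6 = 14 occupied branches out of 3 + 9 + 27 = 39 possible ones.
print(round(coverage([3, 5, 6], 3), 3))  # 0.359
```

A fully grown tree gives a coverage of 1, so candidate trees built with different settings can be ranked by how far below 1 their index falls.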
3.2.3 New well production prediction
The symbolic tree model is constructed for the purpose of predicting new well performance. For a
new well with limited production history, the aggregation and symbolization processes first
transform its time-series production data points into a production symbol sequence. Then, the
production symbol sequence of the new well is matched onto the symbolic tree. If this sequence
ends at a leaf node, the new well is labeled as the majority class in the leaf node. If it ends at a
non-leaf node, the well class information embedded in the node indicates the probability of the
well belonging to each class.
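The matching step can be sketched as a walk down the tree. This is a minimal illustration: the nested-dictionary tree, its field names and the function name `predict` are hypothetical, and stopping at the deepest matched node when a symbol has no corresponding branch is an assumption that the text does not address.

```python
from collections import Counter

def predict(tree, symbols):
    """Follow the symbol sequence down the tree and report the class information found."""
    node = tree
    for s in symbols:
        if node["leaf"] or s not in node["children"]:
            break  # assumption: stop at the deepest node that matches
        node = node["children"][s]
    counts = Counter(node["classes"])
    if node["leaf"]:
        return counts.most_common(1)[0][0]  # majority class of the leaf
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}  # class probabilities

# Hypothetical symbolic tree; "classes" lists the class labels of the wells in each node.
tree = {"leaf": False, "classes": ["Y", "Y", "N", "N", "N"], "children": {
    "A": {"leaf": True, "classes": ["N", "N"], "children": {}},
    "C": {"leaf": False, "classes": ["Y", "Y", "N"], "children": {
        "C": {"leaf": True, "classes": ["Y", "Y"], "children": {}},
        "B": {"leaf": True, "classes": ["N"], "children": {}},
    }},
}}
print(predict(tree, "CC"))  # Y
print(predict(tree, "C"))
```

A well with only one symbol of history ends at an internal node and receives class probabilities, while a longer history that reaches a leaf receives a single class label.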
3.2.4 Symbolic Tree Visualization
A visualization template is designed for users to investigate the symbolic tree model in an
interactive and intuitive manner. Different colors are assigned for branches and nodes for better
visualization. In order to present the pruned trees with comparison to fully expanded trees, the
branches of the pruned trees are colored in red, while the uncovered branches in full trees are
colored in grey. The thickness of the branches indicates the number of wells included in the end
nodes of the branches. A thicker branch means more wells, while a thinner branch represents a
smaller number of wells. Leaf nodes are colored differently due to the different classes they
represent. An example will be shown in the case study section.
The following section applies the proposed symbolic tree method to a real dataset of shale
gas wells to predict shale gas production after 12 months.
3.3 Case study: Canadian shale gas production prediction
The majority of the shale gas resources in Canada reside within the Western Canada Sedimentary
Basin (WCSB), which extends across British Columbia, Alberta, Saskatchewan and Manitoba.
Shale gas drilling and production are mainly occurring in British Columbia and Alberta at present.
The proposed symbolic tree model can assist engineers with decision-making regarding the
following three questions.
1) Can we predict if a well will reach certain production at a benchmark time?
2) How early can we accurately predict new well production?
3) Where are the analogous wells with a similar production trend located?
The Montney play, located in British Columbia and Alberta, is considered the most
active drilling area in Canada. A total of 972 wells in the Montney-A pool that have been producing
for more than one year were selected for this case study. The proposed symbolic tree model can help
engineers predict at an early production stage whether a shale well will reach 10,000 m³ of
cumulative gas production within the first 12 months. The wells were classified into one of two
groups: wells that produced more than 10,000 m³ within one year (Class: Y) and wells that did not
reach that much production (Class: N). Also, the wells sharing similar production trends can be
identified on a map.
In the experiment, the whole well dataset is first divided into a training dataset and a testing
dataset. The well data were collected from three data sources: (1) the British Columbia (BC) Oil
and Gas Commission for BC shale gas well identifiers and locations; (2) the Alberta Energy
Regulator for Alberta shale gas wells; (3) Divestco GeoCarta (a commercial software package) for
production data. The training dataset contains 80% of the wells, randomly chosen from three
individual fields in the Montney-A pool: Dahl Field, Heritage Field and Northern Montney Field.
The testing dataset contains the remaining 20% of the wells. Both the training and testing
datasets include wells from all three fields.
First, the production data of all wells are symbolized. The production data are aggregated
monthly, and then four different symbol sizes (i.e. 3, 4, 5, and 6) with equiprobable data
discretization are applied to symbolize the production data. For example, for a symbol size of 3,
the two 3-quantile cut points (i.e. 336.55 and 1127.4) were calculated from all production values
in the training production dataset so that the whole dataset could be cut into three equal-sized
groups. The symbols corresponding to the three data ranges are listed in Table 3-1. Based on the
calculated data ranges for the three symbols, the 12-month production data sequences are
represented by sequences of 12 symbols. Table 3-2 and Figure 3-5 show the first 12 months of
production of two wells and the corresponding symbols.
Figure 3- 5 Monthly well production and symbol sequences for two example wells
Table 3-1 Symbols and ranges for time series symbolization of 3 symbols
Symbol             A            B                 C
Range (m3/month)   0 – 336.55   336.55 – 1127.4   > 1127.4
Table 3-2 Monthly well production (m3) and symbols for the first 12 months of two example wells
UWI: 00/B-014-J/094-B-16/0     Accumulative gas: 14493 m3     Class: Y
Month        1       2       3       4       5       6       7      8      9      10     11     12
Production   2056.8  2084.9  1926.8  1540.3  1266.3  1113.4  919    881.6  790.4  708.2  599.3  606
Symbol       C       C       C       C       C       B       B      B      B      B      B      B
UWI: 00/05-05-078-17W6/0       Accumulative gas: 7793.9 m3    Class: N
Month        1       2       3       4       5       6       7      8      9      10     11     12
Production   854.9   683.6   38.9    906.9   1003.7  818.2   744    609.8  552.3  558.6  497    526
Symbol       B       B       A       B       B       B       B      B      B      B      B      B
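The symbolization step can be illustrated with a minimal Python sketch. It uses the breakpoints of Table 3-1 and the two example wells of Table 3-2; the function names are hypothetical, and two details are assumptions because the text does not state them: the quantile estimator (here the default of `statistics.quantiles`) and the handling of values that fall exactly on a breakpoint.

```python
from bisect import bisect_left
from statistics import quantiles

THRESHOLD_M3 = 10_000  # benchmark: cumulative gas production after 12 months


def equiprobable_breakpoints(training_values, symbol_size):
    """Cut points that split the training values into equal-sized groups."""
    # Assumption: the default estimator of statistics.quantiles is used;
    # the thesis does not say which quantile estimator was applied.
    return quantiles(training_values, n=symbol_size)


def symbolize(monthly_production, breakpoints, alphabet="ABCDEF"):
    """Replace each monthly value with the symbol of the range it falls in."""
    # Assumption: a value equal to a breakpoint goes to the lower range.
    return "".join(alphabet[bisect_left(breakpoints, v)] for v in monthly_production)


def well_class(monthly_production, threshold=THRESHOLD_M3):
    """'Y' if cumulative production exceeds the threshold, otherwise 'N'."""
    return "Y" if sum(monthly_production) > threshold else "N"


# Breakpoints and example wells taken from Table 3-1 and Table 3-2.
BREAKPOINTS_3 = [336.55, 1127.4]
WELL_Y = [2056.8, 2084.9, 1926.8, 1540.3, 1266.3, 1113.4,
          919, 881.6, 790.4, 708.2, 599.3, 606]
WELL_N = [854.9, 683.6, 38.9, 906.9, 1003.7, 818.2,
          744, 609.8, 552.3, 558.6, 497, 526]

for well in (WELL_Y, WELL_N):
    print(symbolize(well, BREAKPOINTS_3), well_class(well))
```

With these inputs the sketch yields the same symbol sequences and class labels as Table 3-2. For symbol sizes 4 to 6, `equiprobable_breakpoints` would supply the additional cut points.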
Figure 3-6 shows the symbolic tree generated by using a symbol size of 3 and pre-pruning
thresholds of a minimum of 10 wells per node and a minimum spatial information gain of 0.1. In
the visualized tree model, the leaf nodes are colored either blue or orange. A blue node indicates
that the wells in this leaf node reach the expected production by the end of the first year, while
an orange node indicates that the wells fail to reach the expected production amount.
As shown in the figure, the depth of the symbolic tree is 5. This implies that whether a well
can achieve more than 10,000 m3 of cumulative gas production within 12 months can be estimated
from its production in the first five months. At each hierarchical level of the symbolic tree, the
branches from left to right correspond to gas production from low to high. The rightmost branch at
the first tree level leads to leaf nodes representing wells that reached the expected performance.
Therefore, gas wells that had relatively high production (>1127.4 m3/month) in the first month are
highly likely to perform well and reach the expected production amount by the end of the first
year.
Figure 3- 6 Symbolic tree built from symbolic gas production time series using symbol size
of 3
Figure 3-7 shows an example of the general process of production data preprocessing and
symbolic tree construction. The production histories of the 777 wells distributed in the Montney-A
pool were retrieved and plotted in the line chart at the upper right. It is difficult to visually
identify any trends in the production from the line chart, as there are too many overlapping
lines. The production data were then aggregated monthly and symbolized with a symbol size of 3, as
shown in the table. A symbolic tree was constructed on the production symbol sequences. Take the
tree nodes representing a production sequence of “B C C” as an example: this trace ends at the
third tree level and stops at a blue leaf node, which means that a new well with “B C C” as the
production symbols for the first three months is predicted to achieve more than 10,000 m3 of
cumulative gas production within 12 months. In the bottom-right line chart, the red lines represent
the production histories of the 116 wells falling in the black tree branches, and the blue markers
in the map show the spatial distribution of these wells. Only 2 of the 116 wells did not reach
10,000 m3 of cumulative gas production within 12 months, so this leaf node represents class ‘Y’.
The dashed black line represents the well that was predicted to reach the production threshold
because its first 3-month production symbols match “B C C”.
Figure 3- 7 Production data preprocessing, symbolic tree construction and well production
prediction
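The lookup described above, following a new well's early symbols down the tree until a leaf is reached, can be sketched as a small prefix tree. This is a simplified illustration and not the thesis implementation: the class and function names are hypothetical, the training data are made up, and the spatial information gain criterion is replaced by a plain purity threshold (an assumption made only to keep the sketch short). Only the minimum node size rule follows the text.

```python
from collections import Counter


class Node:
    def __init__(self):
        self.counts = Counter()  # class label -> number of training wells
        self.children = {}       # symbol -> child Node

    @property
    def label(self):
        """Majority class of the training wells that reached this node."""
        return self.counts.most_common(1)[0][0]


def build_tree(wells, min_wells=10, purity=0.95, depth=0, max_depth=12):
    """wells: list of (symbol_sequence, class_label) pairs."""
    node = Node()
    node.counts.update(label for _, label in wells)
    majority_share = node.counts.most_common(1)[0][1] / len(wells)
    # Stand-in stopping rule (assumption): stop when the node is nearly pure.
    if depth == max_depth or majority_share >= purity:
        return node
    groups = {}
    for seq, label in wells:
        groups.setdefault(seq[depth], []).append((seq, label))
    for symbol, group in groups.items():
        if len(group) >= min_wells:  # pre-pruning on minimum node size
            node.children[symbol] = build_tree(
                group, min_wells, purity, depth + 1, max_depth)
    return node


def predict(tree, symbols):
    """Return (predicted class, number of months actually used)."""
    node, used = tree, 0
    for s in symbols:
        if s not in node.children:
            break
        node = node.children[s]
        used += 1
    return node.label, used


# Made-up training sequences (three months each), for illustration only.
TRAINING = ([("BCC", "Y")] * 12 + [("BCB", "N")] * 10 + [("BAA", "N")] * 10
            + [("CCC", "Y")] * 11 + [("ABA", "N")] * 3)
TREE = build_tree(TRAINING, min_wells=10, max_depth=3)
print(predict(TREE, "BCC"))
```

The second value returned by `predict` corresponds to the idea that the depth at which a trace stops indicates how many months of production are needed before a prediction can be made.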
The coverage index can help users determine the symbol size parameter. In this case study,
coverage indexes were calculated for the symbolic trees constructed from production data
symbolized with different symbol sizes. We continued constructing symbolic trees with symbol sizes
of 4, 5, and 6, and conducted accuracy assessments for each. The coverage values for the symbol
sizes of 3, 4, 5 and 6 are 17.9%, 20.9%, 13.5%, and 8.2% respectively, as listed in Table 3-3.
This means that trimming the original symbolic trees using the minimum node size and the minimum
spatial information gain significantly reduces the tree sizes.
The trees could expand to 12 hierarchical levels because the dataset includes 12 months of well
production. However, by pre-pruning the symbolic trees with a minimum node size of 10 and a
minimum spatial information gain of 0.1, the tree depth is significantly reduced to 4 or 5.
The testing dataset was used to assess the symbolic trees built for predicting whether the
wells could reach an expected production at the benchmark time. Three measurements were
calculated for evaluating the symbolic trees: sensitivity, specificity and accuracy. Wells that
reached 10,000 m3 of cumulative gas production within 12 months are referred to as having good
performance, and wells that failed to meet this expectation as having poor performance.
Sensitivity measures the true positive rate, which is the proportion of the wells with good
performance that are correctly identified as having reached the production expectation.
Specificity measures the true negative rate, which is the proportion of the wells with poor
performance that are correctly identified as not having reached the expected production. Accuracy
measures the proportion of correctly classified wells, with either good or poor performance, out
of all wells.
As shown in Table 3-3, all four symbolic tree models reached an accuracy above 0.9, which means
that over 90% of the wells were correctly classified in terms of their ability to reach 10,000 m3
of gas production by the end of the first year. The sensitivity values are all higher than the
specificity values. Therefore, the four models are more accurate in predicting well-performing
wells than poorly performing wells. Overall, the symbolic tree built from the production data
symbolized into 3 levels has the best accuracy.
Table 3-3 Sensitivity, specificity and coverage calculation for the four symbolic tree
predictive models using different symbol sizes for time-series data symbolization
Symbol size   Coverage   Sensitivity   Specificity   Accuracy
3             0.1791     0.9688        0.8806        0.9325
4             0.2088     0.9583        0.8657        0.9202
5             0.1346     0.9479        0.8806        0.9202
6             0.0824     0.9375        0.8656        0.9080
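The three evaluation measures follow directly from the counts of a confusion matrix, as the short Python sketch below shows. The thesis reports only the resulting rates, not the underlying counts, so the counts used here are illustrative assumptions chosen to be consistent with the symbol size 3 row of the table; the function name is hypothetical.

```python
def evaluate(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: good-performance wells predicted as good (class Y)
    fn: good-performance wells predicted as poor
    tn: poor-performance wells predicted as poor (class N)
    fp: poor-performance wells predicted as good
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of correct predictions
    return sensitivity, specificity, accuracy


# Illustrative counts only (assumption): not reported in the thesis.
SENS, SPEC, ACC = evaluate(tp=93, fn=3, tn=59, fp=8)
print(f"sensitivity={SENS:.4f} specificity={SPEC:.4f} accuracy={ACC:.4f}")
```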
CHAPTER FOUR: DATA VISUALIZATION TOOL DESIGNS FOR SPATIAL WELLS
AND TIME-SERIES OIL AND GAS DATA
4.1 Introduction
Oil and gas data is a strategic asset in the industry. Visualization tools assist petroleum
professionals with data comprehension, information seeking and decision-making. However,
considering the variety and volume of oil and gas data, the information that simple graphics can
deliver is limited.
Time-series data and spatial data are two essential data types in oil and gas datasets. A
system with interactive visualization tools integrated in a Web GIS platform is designed to flexibly
deliver maximum information in single visual representations. The sections below introduce the
data characteristics of different time-series data, the visualization tasks, the Web GIS interface
design and the visual tool designs.
4.2 Oil and gas time-series data characteristics and visualization tasks
4.2.1 Oil and gas time-series data characteristics
To start a visualization process, according to Aigner et al. (2011), the data and the task at hand
are the two aspects that above all else should be taken into account. In terms of the oil and gas
data that have been retrieved from the public domain, each well is associated with textual and
numerical data on well identification, name and affiliations; spatial data indicating the surface
and bottom well locations; and time-series data about well status, well completion, injection and
production. Time-series data make up an important part of oil and gas data, and the updates on
well injection and production are what engineers refer to for decision-making during the
production process.
The data types of the non-time-series data are straightforward, while the time-series data
have varied characteristics for different themes. To characterize the time-series data, four
criteria are used, three of which (i.e. scale of variables, dimensionality (number of variables),
and frame of reference) are drawn from the fundamental design alternatives for time-oriented data
provided by Aigner et al. (2011). The definitions of the different data characteristics in the
context of oil and gas time-series data are given below.
Scale of variables (quantitative vs. qualitative): quantitative variables have numeric data
values, while qualitative variables have nominal or ordinal data. Oil and gas time-series data
that are in numeric form are quantitative, while those with categorical data are qualitative.
Number of variables (univariate vs. multivariate): a univariate time-series records the changes
of one oil and gas related variable over time, while a multivariate time-series consists of
multiple synchronously recorded streams.
Frequency of update (uniformly-sampled vs. irregularly-sampled): a uniformly-sampled time-series
is updated at regular intervals, for instance every second, daily, monthly, or annually, while an
irregularly-sampled time-series is updated only occasionally. Irregularly-sampled data can be seen
as events or changes of states.
Frame of reference (abstract vs. spatial): spatial data describe events or states that take
place at different spatial locations over time, while abstract data do not include a spatial
aspect. In this case, oil and gas time-series data are associated with individual wells. Data that
indicate events or states associated with different locations along one wellbore are spatial,
while data that belong to a well as a whole are defined as abstract time-series.
Table 4-1 shows the characteristics of four types of oil and gas time-series data. Well status
data include the statuses (e.g. well authorization granted, cased, active production, and suspended)
that a well has gone through, and each status is related to a time period of different length.
Well status data indicate changes of well state. Well completion data include the completions (e.g.
open hole, fracture, and perforate) that take place at different depth intervals along the wellbore.
The time aspect of completion data is the date when a completion event happens. Well status and
well completion are both qualitative, univariate and irregularly-sampled. However, the well
completion data include a spatial aspect because well completions happen at different locations
along the well trajectory. Well injection and well production data are both quantitative,
multivariate, uniformly-sampled and abstract, because for each well multiple substances (e.g.
water, steam, and gas) are injected and produced, and the injection and production volumes are
updated regularly.
Table 4-1 Data characteristics of four types of oil and gas time-series data
                  Scale of variables   Number of variables   Frequency of update   Frame of reference
Well status       qualitative          univariate            irregularly-sampled   abstract
Well completion   qualitative          univariate            irregularly-sampled   spatial
Well injection    quantitative         multivariate          uniformly-sampled     abstract
Well production   quantitative         multivariate          uniformly-sampled     abstract
4.2.2 Oil and gas data visualization tasks
With the data characteristics clarified for different oil and gas time-series data, the next step is
to list the visualization tasks. Users usually pursue three basic purposes through data
visualization: exploration, confirmation of information or knowledge, and presentation of analysis
results (Ward et al., 2010).
Based on discussions with petroleum engineers, several points were identified regarding which
aspects of the time-series data the engineers hope to explore. For well status, engineers want to
look into the general lifecycle and the durations of all the statuses each well has gone through.
For completion data, engineers want to examine the completion processes of each well by looking at
the completion types and the corresponding temporal and spatial information. The injection or
production history, which involves multiple variables (i.e. water, oil, gas, other fluids, and
hours), is to be explored for injection or production trends and abrupt changes. The distribution
of oil, gas and water production at different time stamps is also of interest to engineers.
In terms of confirmative analysis, hypotheses are to be tested. The hypothesis examined here is
that completion operations during hydraulic fracturing taking place adjacent to a well will impact
the production of that well. Hydraulic fracturing is an oil/gas recovery technique that has enabled
the large-scale commercial production of shale gas and oil by pressurizing fluid to fracture rocks
and release natural gas and oil to the wellbores.
4.3 Web GIS interface and visualization and interaction controls
After studying the data types, data characteristics and visualization tasks for the available oil
and gas data, user interfaces with visualization and interaction controls are designed to maximize
the information delivered. The Visual Information-Seeking Mantra proposed by Shneiderman (1996) is
a widely acknowledged starting point and general guideline for user interface design. It points
out that the user experience with an interface follows a process: “Overview first, zoom and
filter, then details-on-demand”. This process is iterated as users explore more detailed
information. Therefore, the user interface designs aim to provide an overview of the data objects
on the first page load, interaction controls to accentuate and filter data, and further visual
presentations to demonstrate details.
4.3.1 Web GIS interface
Oil and gas wells are real-world objects and static spatial objects. GIS technology is employed to
provide users with an overview of the oil and gas well locations/distributions and basic well
information. The initial main interface displays wells as red Google Maps markers on a map, which
occupies the whole page, to give users an overview of the well distribution, as shown in
Figure 4-1. Additionally, a button at the bottom of the map opens an attribute table listing the
non-time-series properties of the wells.
The basic information of each well is shown in the information window when the well is clicked
once. Double-clicking a well selects it and highlights it in blue. Besides the general well
information, the information window contains two buttons that open visualization panels for the
production and completion data of the selected well respectively. Moreover, holding the shift
key and pressing the left mouse button draws a polygon box that selects and highlights multiple
wells on the map. Clicking the button in the middle bottom shows the attribute table of the
highlighted wells. All the highlighted wells are correspondingly selected in the table. Table
manipulations such as selecting/unselecting, sorting and filtering are enabled.
Figure 4- 1 Mapping interface with extended attribute table and information window
4.3.2 Injection and production data visualization template
At an overview level, a multi-series line chart is used to demonstrate the trends in the volumes of
the different produced/injected substances and in the production/injection hours. The horizontal
axis linearly represents the time from the beginning of production/injection to the time of the
last data record. Additionally, for users to explore and target specific data to investigate, one
interaction tool is added to reduce the dimensions of the displayed production/injection data. A
toggleable legend is employed so that users can switch items in the legend off or on, and the
corresponding lines disappear or reappear in the multi-series line chart. In Figure 4-2, only the
gas, hours and water legend items are toggled on, so the line chart displays only the lines of
these three items. Moreover, users can zoom in to smaller time ranges for details by specifying a
time range in the time range slider below the main line chart canvas. As shown in Figure 4-2, the
dark grey area in the time range slider shows the chosen time range from 2008 to 2012.
As identified in the user tasks, the distribution of oil, gas and water at a certain time is of
interest to engineers. To relate a chosen month to the continuous timeline, two interaction
controls, a time slider and a time indicator, are employed. A time slider on the line chart canvas
functions as the link between a certain month and the whole timeline. Since the time slider moves
continuously, the time indicator is used to indicate the exact month in which the time slider
falls. According to the indicated month, a pie chart is updated to display the gas, oil and water
distributions at that time. Figure 4-2 shows the layout of the two charts. As the mouse moves over
the line chart, the time slider slides, the pie chart updates, the time indicator at the top right
corner of the line chart canvas shows the current month (June 2008 in this case), and the specific
numbers beside the legend items update. The detailed data are thus dynamically updated with
movements of the time slider.
Figure 4- 2 Interactive chart for viewing production data
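The link between the continuously moving time slider, the month shown by the time indicator and the pie chart can be sketched in a few lines of Python. This only illustrates the logic and is not the system's client-side code: the function names, the record layout and the numeric values are hypothetical, and the slider is assumed to report its position as a fraction between 0 and 1 of the displayed time range.

```python
from datetime import date


def slider_month(start, end, fraction):
    """Snap a slider position in [0, 1] to the month it falls in."""
    total_months = (end.year - start.year) * 12 + (end.month - start.month)
    # Each month owns an equal slice of the slider; clamp the right edge.
    offset = min(int(fraction * (total_months + 1)), total_months)
    year, month0 = divmod(start.year * 12 + (start.month - 1) + offset, 12)
    return date(year, month0 + 1, 1)


def substance_shares(record, substances=("gas", "oil", "water")):
    """Pie-chart shares of the produced substances in one monthly record."""
    total = sum(record[s] for s in substances)
    if total == 0:
        return {s: 0.0 for s in substances}
    return {s: record[s] / total for s in substances}


# Hypothetical monthly record; "hours" is plotted as a line but not in the pie.
RECORD = {"gas": 600.0, "oil": 100.0, "water": 300.0, "hours": 720.0}
MONTH = slider_month(date(2008, 1, 1), date(2012, 12, 1), 0.09)
print(MONTH, substance_shares(RECORD))
```

Snapping to whole months matches the monthly granularity of the production records, so the pie chart only needs to be redrawn when the slider crosses a month boundary.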
4.3.3 Bubble map for visualizing production data of multiple wells
If users look into a group of wells instead of a single one, another visualization tool, the bubble
map, can be used to display production. The bubble map is a thematic map design that focuses on
displaying the distributions of gas, oil and water production of different wells within a specific
geographic area at a specific time. The distributions of gas, oil and water production can be
compared across different wells at the same time. Also, for one or multiple wells, the changes in
the distributions can be viewed by comparing bubble maps created for different times.
As shown in Figure 4-3, the two-tier visualization panel comprises a map at the top and a
multi-line chart at the bottom. At the overview level, pie charts are mapped to show the spatial
distribution of the wells, and the line chart shows the production histories of gas, oil and water.
Each pie chart is centered at the coordinates of a shale well and shows the distribution of the
gas, oil and water production of that well. The multi-series lines display the total production of
gas, oil and water by all the wells shown in the map above. Users can easily spot the wells with
high percentages of water production because different colors represent different substances. To
zoom into a certain time stamp and update the map, the time slider can be moved to update the pie
charts so that users can detect changes in the gas, oil and water distribution of the wells of
interest. To show more detailed information on demand, hovering over a single well of interest
makes the remaining wells transparent, and clicking on the well displays specific information, as
shown in Figure 4-4.
Figure 4- 3 Bubble map for shale gas wells
Figure 4-4 Bubble map for shale gas wells with detailed information expanded on one
specific well
4.3.4 Completion data visualization template
Completion data are irregularly sampled and indicate completion events that happen from the start
of exploiting and operating a well. A continuous linear time model is used to represent the
timeline of a well, and multiple dates along the timeline correspond to the completion dates. The
completion data are categorical and associated with one-dimensional spatial information: depth.
Therefore, a horizontal axis is used to represent the time, a vertical axis represents depth, and
differently colored bars display the completion types. A bar whose length is mapped to the vertical
axis represents a completion that takes place within a certain depth range. The values on the
y-axis increase downwards as the sub-surface depth increases.
As shown in Figure 4-5, the bar chart shows the depths and times of completions such as
perforation, fracturing and open holes that occur within one kilometer of a target shale gas well.
To filter out certain completion types, the corresponding legend items can be toggled so that users
look into only the completions of interest. Moreover, clicking the button in the legend for the
production line adds the production history to the bar chart. With the production history and
completion activities shown in the same plot, changes in production can be compared with the
completion data. As in the production data visualization panel, the time range slider zooms the
main bar chart into a selected time period so that users can obtain detailed information.
Additionally, the detailed information associated with a single completion includes the date,
completion type, length and the formation(s) where the completion took place. To see these
details, users can place the mouse over a specific bar in the bar chart.
Figure 4- 5 Bar chart for viewing shale gas completion data
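Selecting the completions that occur within one kilometer of a target well, optionally restricted to the completion types toggled on in the legend, could be done as in the Python sketch below. The text does not describe how distances are computed in the system, so the haversine great-circle distance, the record fields and the coordinates used here are all assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_completions(target, completions, radius_km=1.0, types=None):
    """Completions within radius_km of the target well, filtered by type."""
    selected = []
    for c in completions:
        distance = haversine_km(target["lat"], target["lon"], c["lat"], c["lon"])
        if distance <= radius_km and (types is None or c["type"] in types):
            selected.append(c)
    return selected


# Hypothetical target well and completion records (depths in metres).
TARGET = {"lat": 56.0, "lon": -121.0}
COMPLETIONS = [
    {"lat": 56.005, "lon": -121.0, "type": "fracture", "top": 2100, "bottom": 2150},
    {"lat": 56.005, "lon": -121.0, "type": "perforate", "top": 2100, "bottom": 2110},
    {"lat": 56.020, "lon": -121.0, "type": "fracture", "top": 2300, "bottom": 2350},
]
print(nearby_completions(TARGET, COMPLETIONS, types={"fracture"}))
```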
4.3.5 Well status data visualization template
Well status is a univariate time-series attribute, so a simple timeline chart is used to represent
the status history of one well, with differently colored bars along the timeline representing the
past and current statuses. A legend on the left side of the status data visualization panel shows
all possible statuses. When the mouse is moved onto a certain status along the timeline, the
corresponding legend item is highlighted while the other legend items fade, and the duration is
shown at the left side of the panel. By working with the interactive timeline, the user can learn
not only the chronological sequence of statuses the well has gone through but also the detailed
period of time corresponding to each status. As shown in Figure 4-6, the light blue legend item
for Drilled and cased is highlighted, and it shows that this well was under this status from
May 27, 2007 to May 16, 2011.
Figure 4- 6 Timeline chart for viewing well status data
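Because well status data record changes of state, the bars of the timeline chart can be derived by pairing each status change with the next one. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, the status that follows "Drilled and cased" and the reference date are made up, and only the two dates of the "Drilled and cased" period come from the example in the text.

```python
from datetime import date


def status_periods(changes, as_of):
    """Turn chronologically sorted (start_date, status) changes into periods.

    Each status lasts until the next change; the last one lasts until as_of.
    Returns a list of (status, start, end, duration_in_days) tuples.
    """
    periods = []
    following = changes[1:] + [(as_of, None)]
    for (start, status), (end, _) in zip(changes, following):
        periods.append((status, start, end, (end - start).days))
    return periods


CHANGES = [
    (date(2007, 5, 27), "Drilled and cased"),   # dates from the text's example
    (date(2011, 5, 16), "Active production"),   # hypothetical next status
]
PERIODS = status_periods(CHANGES, as_of=date(2016, 6, 1))  # hypothetical date
for period in PERIODS:
    print(period)
```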
CHAPTER FIVE: A WEB-BASED STEAM ASSISTED GRAVITY DRAINAGE DATA
VISUALIZATION AND ANALYTICAL SYSTEM
5.1 Introduction
In this chapter, a data visualization and analysis system built with Web GIS technology is proposed
and implemented with data from SAGD projects.
In the late 1970s, a steam-based in-situ process, SAGD, was developed and introduced as an
oil recovery technology for the abundant Canadian heavy oil and bitumen (oil sands) (Butler, 2013).
SAGD employs a horizontal well pair configuration with an upper injection well and a lower
production well drilled in parallel. High-temperature steam is injected through the injector to
heat up the reservoir and form a chamber, and the heated bitumen at the chamber edge then drains
down and flows through the producer (Alberta Energy Regulator, n.d.). SAGD is now widely used as a
thermal production technology to extract bitumen from Alberta’s subsurface oil sands deposits.
Projects using SAGD technology are becoming more common: the number of commercial SAGD projects in
Alberta reached 16 by 2013, compared to fewer than 5 before the year 2000 (Alberta Energy, n.d.).
With the expansion of SAGD projects, huge and ever-growing quantities of SAGD-related data
have been accumulated, involving the various domains that the oil and gas industry touches, such
as geophysics, geology, petroleum, business and administration. Applications that assist in
storing and managing the voluminous and complex SAGD datasets are in demand. A data application
targeted at SAGD data should be able to accommodate the spatial characteristics of SAGD wells,
provide users with access to integrated SAGD-related data and offer spatial exploration and
analysis functionalities.
The contributions of this application are as follows. Firstly, by integrating GIS, a GIS
mapping interface and the database management system can work interactively as users explore
the SAGD spatial and attribute data. Different spatial layers and flexible spatial queries help
users efficiently target SAGD wells and then apply the visualization and analysis functions to
those wells. Secondly, the web GIS platform is accessible to a broad audience. Users can access
the GIS system and make use of the mapping and analytic functionalities through web browsers.
Thirdly, public and proprietary SAGD data are collected and archived in a specially designed
database. Intuitive and interactive data visualization methods such as an attribute table,
histograms and a time-series data viewer, as well as data mining techniques such as clustering and
ARM, are implemented in the system for users to further comprehend SAGD data and make decisions.
The remainder of the chapter is organized as follows. The second section introduces the
web-based system structure and the database design. The third section presents the Web GIS user
interface, while the fourth section focuses on the data visualization and data mining functionalities.
5.2 Design of the SAGD data visualization and analysis system
A system with an integrated modular base needs to preserve a GIS environment focusing on SAGD
wells and adopt new designs and implementations to perform the following functions: (a) provide
a Web GIS platform, making the system accessible to users through web browsers; (b) render
archived SAGD data searchable by location or attribute, and search results exportable; (c)
visualize attribute and time-series data in the form of tables, interactive charts and graphs;
(d) apply clustering and ARM techniques and visualize mined spatial patterns in the interface.
This section introduces the system design and presents the employed technologies and the SAGD
database.
5.2.1 System design
The system design is illustrated in Figure 5-1. The web-based application consists of four main
components: the SAGD database, the data processing server, the web server and the user interface.
The four components communicate with one another to deliver and present information to users
according to their requests.
Figure 5- 1 System design of the proposed web-based GIS for a SAGD dataset
This system is built upon HTML5 and CSS3, which respectively structure and style
webpages, and can flexibly modify and adjust webpage elements. JavaScript, an object-oriented
programming language, is used in developing the system, since it can be executed on the client side
to avoid excessive communication with the web server and reduce the processing time. Moreover,
a rich set of third-party JavaScript libraries, plugins and modules can be used to accelerate
development. The other important technologies and open source libraries used in this system
include PostgreSQL, the Google Maps API and Node.js.
The web server was developed using Node.js. Node.js is a JavaScript runtime that can run
on different operating systems and optimizes the scalability of input/output processes (Node.js
Foundation, 2016). PostgreSQL is an open-source object-relational database that can run on all
major operating systems (PostgreSQL Global Development Group, 2016). It is a well-maintained,
powerful and reliable database. The Google Maps API provides the web mapping service, which
includes mapping spatial wells, spatial selection, and map controls such as zoom controls and view
controls (Google, 2016).
5.2.2 SAGD database
5.2.2.1 Database structure
Besides the huge quantity of data in an ongoing SAGD project, the data types vary: there are
static and dynamic data, numerical and categorical data, and first-hand and derived data. The
dynamic injection and production processes in SAGD operations generate time-series data on the
injected and produced substances and their amounts. As for the geospatial characteristics of SAGD
wells, multiple SAGD well pairs (injector and producer pairs) are drilled in units of well pads;
the surface locations of the horizontal wells (heels) are aggregated at the pad centers, while the
horizontal wells spread out underground and reach the end locations (toes). A relational database
is designed and deployed that integrates basic well information, geographic coordinates, well
status, and injection and production records. Figure 5-2 shows an overview of the database
structure.
Figure 5- 2 SAGD database structure
The Unique Well Identifier (UWI) is a standard well identification containing 16 characters
in four sequential components (Cenovus, n.d.). The primary purpose of the UWI is to differentiate
every single well. Therefore, in the database, a primary key corresponding to the UWI is defined in
the root table and is associated with the primary keys in the other tables.
The well status, injection and production histories of the SAGD wells are stored in three
separate tables. The well status indicates the general phases, with the corresponding periods of
time, that an individual well has undergone, e.g. observation, drilled and cased, abandoned. The
injection and production tables store the monthly records of each well. For a producer, there
might be some injection during start-up for warm-up purposes; there might be small amounts of
produced substances from an injector as well.
A key SAGD performance measure, the Steam Oil Ratio (SOR), is the amount of steam used to
produce a barrel of oil (Cenovus, n.d.). Small SOR values, around 2, indicate efficient SAGD
operations. The Cumulative Steam Oil Ratio (CSOR) is the cumulative amount of injected steam
divided by the cumulative amount of produced oil, and measures the efficiency since the well
pair began operating. In the database management system, SOR and CSOR are calculated for each
well pair. Statistical measures such as the minimum, maximum, average and standard deviation
are calculated for SOR and CSOR as well as for injected steam, operation hours and produced oil
amount, and stored in the statistics table.
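The two ratios and the summary statistics can be sketched as follows. This is a minimal
illustration rather than the system's implementation: the function names are hypothetical, steam
and oil are assumed to be monthly volumes in the same unit, months with no produced oil are
assumed to yield an undefined SOR, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev


def sor_series(steam, oil):
    """Monthly SOR = injected steam / produced oil (None when no oil was produced)."""
    return [s / o if o else None for s, o in zip(steam, oil)]


def csor_series(steam, oil):
    """CSOR = cumulative injected steam / cumulative produced oil, month by month."""
    result = []
    cum_steam = cum_oil = 0.0
    for s, o in zip(steam, oil):
        cum_steam += s
        cum_oil += o
        result.append(cum_steam / cum_oil if cum_oil else None)
    return result


def summarize(values):
    """Minimum, maximum, average and standard deviation of the defined values."""
    vals = [v for v in values if v is not None]
    return {"min": min(vals), "max": max(vals),
            "mean": mean(vals), "std": pstdev(vals)}


if __name__ == "__main__":
    # Made-up monthly volumes for one well pair, for illustration only.
    steam = [6000, 9000, 9000]
    oil = [1000, 3000, 4500]
    print(sor_series(steam, oil))
    print(csor_series(steam, oil))
    print(summarize(sor_series(steam, oil)))
```

Because CSOR accumulates from the first month, it reacts more slowly than SOR to a recent change
in operating efficiency.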
5.2.2.2 Data collection and preprocessing
Compared with the quantity of data that SAGD projects generate, publicly available data are
limited and scattered. Some SAGD in-situ and surface facilities collect real-time data, which
are compiled and distributed only within the organizations. Annual reports on the in-situ
performance of each SAGD project are accessible through the Alberta Energy Regulator. In the
annual reports, summary information on geology and geophysics, drilling and well
instrumentation, seismic, and operation performance is displayed in the form of maps and graphs.
Alternatively, data can be obtained through commercial software platforms, which purchase oil
and gas data from oil and gas companies or specialized data companies. In this study, data on
wells in Alberta SAGD projects are collected from the Alberta Energy Regulator and Divestco
GeoCarta. With the acquired UWIs and a template of data attributes, SAGD data are collected
from the GeoCarta data explorer and trimmed to fit the database structure in order to populate
the database.
5.3 The Web GIS user interface for SAGD
The Web GIS user interface consists of four main components: (a) an interactive map representing
the current objects; (b) a status bar indicating the selected map layers and the numbers of selected
and highlighted objects on the map; (c) a table displaying the basic information of current objects;
(d) manipulation tools leading to advanced functions – search, search by location, export, data
visualization and data mining.
Figure 5-3 shows the complete Web GIS user interface with the panels expanded for map
manipulation, searching for a particular SAGD project and highlighting wells of interest,
together with an attribute table of the searched wells (bottom left), map navigation (top left),
and the manipulation tool bar (top right). Pointing at one bottom well location is the auxiliary
window, which displays basic well information and a button that gives users access to the
time-series visualization of the history data.
Figure 5-3 The Web GIS user interface of the system
One of the assets of a web-based cartographic user interface is the flexibility and
interactivity provided to users. The initial setting of the system is a map filling the whole
browser window with no auxiliary windows or tabs open, only clickable icons placed over the map
for expanding the manipulation tool bar and the attribute table. Mouse controls are used for
spatial navigation such as zooming and panning the web-based map. Users can compose the map
contents according to their intentions. In addition, the map can automatically zoom in to the
geographic area where the searched wells are located.
To organize the three spatial components of SAGD wells, separate spatial layers are used to
store the heels, toes and derived lines. The derived line connects the heel and toe locations
and stands for the well trajectory. Users can view and manipulate the spatial layers separately
in the system. In terms of the map symbols, drop-shaped markers represent toes, while relatively
small circle markers represent heels, as heels are more aggregated than toes. All the markers
are clickable; by clicking the marker of interest, users can explore the well history with a
series of interactive graphs and charts.
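The three layers can be pictured with a short sketch. GeoJSON is used here only as a convenient,
widely supported encoding; the thesis does not state which format the system uses, and the
function name build_layers, the input layout and the coordinates are all hypothetical.

```python
def build_layers(wells):
    """Split wells into heel, toe and trajectory layers (GeoJSON FeatureCollections).

    `wells` is an iterable of dicts with 'uwi', 'heel' and 'toe', where heel and
    toe are (longitude, latitude) tuples. This input layout is an assumption.
    """
    def feature(geom_type, coords, uwi):
        return {
            "type": "Feature",
            "geometry": {"type": geom_type, "coordinates": coords},
            "properties": {"uwi": uwi},
        }

    heels, toes, lines = [], [], []
    for w in wells:
        heel, toe = list(w["heel"]), list(w["toe"])
        heels.append(feature("Point", heel, w["uwi"]))
        toes.append(feature("Point", toe, w["uwi"]))
        # The derived line joins heel and toe and stands for the well trajectory.
        lines.append(feature("LineString", [heel, toe], w["uwi"]))

    return {
        name: {"type": "FeatureCollection", "features": feats}
        for name, feats in (("heels", heels), ("toes", toes), ("trajectories", lines))
    }


if __name__ == "__main__":
    # Coordinates are made up for illustration.
    demo = [{"uwi": "02/08-11-095-06W4/0", "heel": (-110.80, 57.20), "toe": (-110.79, 57.21)}]
    layers = build_layers(demo)
    print(sorted(layers))
```

Keeping the three geometries in separate collections lets a map client toggle and style each
layer independently, which matches the separate viewing and manipulation described above.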
Creating interactive spatial queries is one of the most basic yet essential applications of a
geographic information system. Two kinds of search, property search and location search, are
implemented in the Web GIS system. Property search, an expandable tab under the tool bar,
filters the wells down to the set satisfying a search query on attributes such as UWI, operating
company, and well status. Search queries can also be composed in the attribute table. Search by
location is another tab under the tool bar. Adjacent wells can be enclosed by a polygon, and the
enclosed wells are highlighted with blue markers. The data of the searched wells are updated
accordingly in the attribute table, where detailed information can be investigated.
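A polygon-based location search reduces to a point-in-polygon test for each well. The sketch
below uses the standard ray-casting test; it is not necessarily how the system performs the
search, it treats coordinates as planar, it does not treat points lying exactly on the boundary
specially, and the function names and sample coordinates are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def search_by_location(wells, polygon):
    """Return the identifiers of wells whose location falls inside the polygon.

    `wells` maps a well identifier to an (x, y) location.
    """
    return [uwi for uwi, loc in wells.items() if point_in_polygon(loc, polygon)]


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    wells = {"A": (2, 2), "B": (5, 2), "C": (1, 3)}
    print(search_by_location(wells, square))
```

The returned identifiers are what a client would use both to recolor the markers and to refresh
the attribute table.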
5.4 The data visualization and data mining user interface
This section presents the features of the data visualization and data mining functions additional to
the Web GIS user interface. How data visualization can help users explore SAGD operation history
data is illustrated with examples and case studies.
5.4.1 Data visualization
Though the collected SAGD data have been archived in the database, information would be missed
if users simply retrieved and examined the data in the database tables. Information graphics
can help users interpret patterns and trends embedded in the datasets. The web-based data
visualization and analytics system implements different visualization methods for users to
conduct comprehensive analyses of the SAGD projects and wells they are investigating. Generally,
users are able to interpret: (a) the history of well status, (b) the injection and production
history of a well pair, and (c) the overall operation of selected wells. The additional value of
the information graphics in this system is the interactivity of the graphic components.
Templates of the data visualization methods are given below, followed by a case study on
visualizing the time-series data of a specific well pair.
5.4.1.1 Data visualization templates
When a specific well is selected on the interactive map, an auxiliary window visualizing its
time-series data (i.e. status, injection, production) is displayed. The well statuses are
plotted in the visualization template introduced in Chapter 4.
The injection and production parameters and the corresponding SOR/CSOR provide engineers with
the main evidence for decisions related to operation and oil production. Interactive graphs
visualizing the time-series data are available in the system. The graph to be viewed (injection,
production or SOR) is chosen on the left side of the auxiliary window. In each graph, there are
four main sections that users can modify to update the graph. The layout of an example graph is
shown in Figure 5-4(b). Firstly, users can select single or multiple attributes to be displayed
by clicking them in the attribute list in the legend. Secondly, users can choose among four
chart types: area chart, bar chart, line chart and scatterplot. Thirdly, when data are missing
on some dates, users can choose either to ignore the missing data or to interpolate them in the
line chart. Finally, when users are interested in an unusual trend in the graph, they can zoom
in to the corresponding period in the time bar below the graph, and the graph is updated
accordingly.
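The two ways of handling missing records can be sketched as follows. The thesis does not state
which interpolation the line chart applies, so linear interpolation between the nearest known
neighbours is an assumption here, as are the function names, the evenly spaced monthly records
and the sample values.

```python
def drop_missing(series):
    """'Ignore' option: keep only the records that have a value."""
    return [(t, v) for t, v in series if v is not None]


def interpolate_missing(series):
    """'Interpolate' option: fill interior gaps linearly by record position.

    Records are assumed to be evenly spaced (e.g. monthly). Gaps before the
    first or after the last known value are left as None.
    """
    values = [v for _, v in series]
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return [(t, v) for (t, _), v in zip(series, filled)]


if __name__ == "__main__":
    # Made-up monthly oil volumes with two missing months and a trailing gap.
    oil = [("2012-01", 2000.0), ("2012-02", None), ("2012-03", None),
           ("2012-04", 5000.0), ("2012-05", None)]
    print(drop_missing(oil))
    print(interpolate_missing(oil))
```

Ignoring gaps keeps the chart faithful to the recorded data, whereas interpolation gives a
continuous line at the cost of showing values that were never measured.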
If a group of wells is selected using search by attribute or location, an overview of the wells
can be presented using bar charts for numerical attributes or pie charts for categorical ones.
Clicking a single bar displays additional information on the corresponding well on the right
side of the visualization window. Likewise, clicking any slice of the pie chart highlights the
corresponding legend entry, and vice versa. Figure 5-5(c) and Figure 5-5(d) show examples of the
bar chart and pie chart visualizations. Users can gain a straightforward perception of the data
distribution from the bar chart and pie chart.
Figure 5-4 Data visualization templates (a) Timeline of a producer well
(UWI 02/08-11-095-06W4/0) in Suncor Firebag (b) Time-series visualization for the producer
well in Suncor Firebag
Figure 5-5 Data visualization templates (c) Bar chart of well total depth for Suncor
Firebag and Husky Tucker wells (d) Pie chart of status for Suncor Firebag wells
5.4.1.2 Case study on SAGD injection and production history
The following are examples of data visualization for a specific well pair (Figures 5-6 and 5-7).
Based on the trends shown in the graphs, it can be concluded that the performance of this well
pair has been moderate and improving.
a) The paired injector well injected steam from May 2011 to August 2013 with a generally
increasing trend;
b) The production history of the selected producer well is plotted with the production-related
attributes: gas, oil and water. From October 2011 to August 2013, the oil production
(represented by the pink line) ranged from about 2000 to 6000 m³ per month, and reached
peaks in April and June 2013. The water flowing out of the producer was about 10000 to
20000 m³ per month, with almost no gas produced;
c) The injected steam, produced oil, SOR and CSOR of the well pair are plotted. Both
injection and production show increasing trends;
d) The SOR and CSOR of the well pair are zoomed in to the period from July 2011 to July 2012;
during this period the SOR reached a peak value of 19.94, then declined and settled at a
relatively steady level of around 3.
Figure 5-6 Time-series data visualization on a specific well pair
(Injector UWI 05/08-11-095-06W4/0; producer UWI 02/08-11-095-06W4/0) (a) Injection steam
history in a bar chart (b) Produced water, oil and gas in a line chart
Figure 5-7 Time-series data visualization on a specific well pair
(Injector UWI 05/08-11-095-06W4/0; producer UWI 02/08-11-095-06W4/0) (c) Injected steam and
produced oil in a line chart (d) CSOR and SOR in a line chart
5.4.2 Data mining
By using data mining techniques, users can discover the hidden patterns in the SAGD wells.
Classification for numerical and categorical attributes, k-means clustering and association
rule mining (ARM) are implemented in the web-based GIS system. Moreover, the mapping interface
displays the spatial patterns, not only to communicate the mined results but also to provide
users with exploratory capability. The mined patterns associated with wells are shown in the
map with an interactive map legend. The map legend explains the cartographic symbols, and
clicking a symbol displays the corresponding wells. Case studies are given below on using the
data mining tools for different data analysis goals.
Users are allowed to map classified SAGD well attributes. For categorical attributes, such as
current well status, well type and pad, wells belonging to different categories are represented
by symbols in different colors. By using categorical classification for well pads, wells in
different pads are displayed in different colors, as shown in Figure 5-8(a). In numerical
classification, wells can be classified either by equal interval or by quantile for examining
the distribution of attribute values. Figure 5-8(b) presents an example of quantile
classification on average oil production. Users can observe the distribution of wells in each
class, such as the aggregation of wells with high or low production. Clustering is an
unsupervised learning technique that groups similar items and separates dissimilar ones.
Figure 5-9(c) displays the result of applying k-means clustering to SOR and oil production.
Wells with similar production amount and efficiency are grouped, and different groups of wells
are shown in the map with different symbols.
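The difference between the two numerical classification schemes can be seen in a short sketch.
The function names, the convention that a value equal to a break falls into the upper class, the
simple rank-based quantile rule and the sample values are all assumptions made for illustration.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """k classes of equal width between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """k classes holding (roughly) the same number of values each."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def classify(value, breaks):
    """Class index of a value: the number of breaks that are <= value."""
    return bisect_right(breaks, value)


if __name__ == "__main__":
    # Made-up, strongly skewed average oil production values.
    avg_oil = [10, 20, 30, 40, 50, 60, 70, 800]
    for make_breaks in (equal_interval_breaks, quantile_breaks):
        breaks = make_breaks(avg_oil, 4)
        print(make_breaks.__name__, breaks, [classify(v, breaks) for v in avg_oil])
```

With skewed data, equal intervals place almost every well in the lowest class, while quantiles
spread the wells evenly across classes, which is why the choice of scheme changes the map.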
Figure 5-8 Examples of categorical, numerical classification and k-means clustering of
wells in Suncor Firebag project (a) Categorical classification of well pads (b) Numerical
classification using quantile of well average oil production
Figure 5-9 Examples of categorical, numerical classification and k-means clustering of
wells in Suncor Firebag project (c) K-means clustering of SOR and oil production
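For readers unfamiliar with k-means, the grouping of wells by SOR and oil production can be
sketched with a plain Lloyd's iteration. This is a generic illustration, not the implementation
used in the system: starting the centroids at the first k points, the fixed iteration count and
the sample values are assumptions, and in practice the two attributes would need to be rescaled
to comparable ranges before clustering.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on tuples of numbers.

    Centroids start at the first k points (a simplification chosen so that
    the sketch is deterministic). Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Made-up (SOR, oil production in thousand m3) pairs for six wells.
    wells = [(2.5, 5.0), (6.0, 1.5), (2.8, 5.5), (6.5, 1.0), (3.0, 4.8), (5.8, 2.0)]
    labels, centroids = kmeans(wells, 2)
    print(labels)
    print(centroids)
```

Each resulting label would correspond to one map symbol, so that wells with similar efficiency
and production appear with the same symbol.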
Figures 5-10 and 5-11 show an ARM example for the Suncor Firebag project, mining the
relationship between injection (hours and steam amount) and production (oil production and
SOR). There are five significant rules displayed in the system. For example, as shown in
Figure 5-10(a), the first rule states that if a well has a high average of injection hours with
a low standard deviation and a low average of injected steam with a low standard deviation,
the well may have a good SOR average with a low SOR standard deviation. There are 18 out of 20
wells matching the rule. By clicking the rule in the legend, the map is updated using three
colors (dark blue, light blue and black) to represent wells matching different parts of the
rule, as shown in Figure 5-10(b). Dark blue wells satisfy both the IF and THEN statements,
while light blue wells satisfy only the IF part but not the THEN part. When the corresponding
symbol in the legend is clicked, the wells in that category appear, with the other wells
displayed transparently, as shown in Figure 5-11(c). Through this interaction with the map,
users can explore the mined rules in detail. The mined rules can be referred to when new wells
are to be developed near the existing wells. Engineers can plan the injection operations and
estimate the expected production efficiency of the new wells according to the rules that their
adjacent wells have matched.
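How wells are sorted against a single rule can be sketched as below, using the usual definitions
of support and confidence for an association rule. The item labels, function names and sample
wells are hypothetical, and mapping the "other" group to the black symbols is an assumption, as
the text only defines the dark blue and light blue groups.

```python
def match_rule(items, antecedent, consequent):
    """Classify one well against a rule IF antecedent THEN consequent."""
    if not antecedent <= items:
        return "other"            # IF part not satisfied (assumed: black)
    if consequent <= items:
        return "if_and_then"      # both parts satisfied (dark blue)
    return "if_only"              # IF part only (light blue)


def rule_summary(wells, antecedent, consequent):
    """Group wells by match type and compute the rule's support and confidence."""
    groups = {"if_and_then": [], "if_only": [], "other": []}
    for uwi, items in wells.items():
        groups[match_rule(items, antecedent, consequent)].append(uwi)
    n_both = len(groups["if_and_then"])
    n_if = n_both + len(groups["if_only"])
    support = n_both / len(wells)
    confidence = n_both / n_if if n_if else 0.0
    return groups, support, confidence


if __name__ == "__main__":
    # Made-up discretized attributes for four wells.
    wells = {
        "W1": {"inj_hours=high", "steam=low", "sor=good"},
        "W2": {"inj_hours=high", "steam=low", "sor=poor"},
        "W3": {"inj_hours=low", "steam=low", "sor=good"},
        "W4": {"inj_hours=high", "steam=low", "sor=good"},
    }
    print(rule_summary(wells, {"inj_hours=high", "steam=low"}, {"sor=good"}))
```

The three groups map directly onto legend entries, so clicking an entry only needs the list of
identifiers in the matching group.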
Figure 5-10 ARM example for Suncor Firebag project (a) ARM result map legend (b) Map
interface of association rule 1
Figure 5-11 ARM example for Suncor Firebag project (c) Map interface of wells fully
satisfying association rule 3
CHAPTER SIX: CONCLUSIONS AND FUTURE WORK
The following two sections respectively draw the conclusions and list the future work.
6.1 Conclusions
This thesis focuses on three aspects of oil and gas data. The first concerns predicting oil and
gas production using time-series production data. Chapter Three has presented a method to
transform time-series well production data into symbolic sequences and further build a symbolic
tree with pre-pruning mechanisms for achieving an informative and predictive model. Firstly,
the aggregation method reduces the dimensionality of the time-series production data, and
time-series symbolization brings efficiency and effectiveness in speeding up numeric
computation, handling data noise and preserving production data features. Secondly, the
symbolic tree model is fitted to chronologically ordered symbol sequences and generates a
trimmed tree. Additionally, different settings for partitioning and data discretization during
the symbolization process can be evaluated by the coverage index for their effect on building
proper-sized symbolic trees. Overall, an intuitive and practical approach is developed for
predicting well production based on the production histories of analogous wells.
The proposed well production prediction approach relies on production data, which are
more accessible than geological/log-derived datasets or well operation and completion datasets.
Users can select the wells for training the predictive model using different criteria. For
example, the wells used to build the symbolic tree may be limited to those in the immediate
vicinity, or those with similar drilling methods and completion histories, or those operated by
a particular company. In addition, the symbolic tree model is intuitive for engineers to
comprehend and interpret because the production data used for prediction are laid out in a
hierarchical tree structure. With an estimated well performance, engineers can adjust the oil or
gas recovery methods to enhance recovery and make profits. Otherwise, wells are to be remediated
to prevent loss.
The second problem that this thesis has tackled is the design of interactive visualizations for
well production/injection, well completion and well status data. With their data characteristics
clarified and visualization tasks identified, visualization tools with interactive controls are
designed following the Visual Information-Seeking Mantra: “Overview first, zoom and filter, then
details-on-demand” (Shneiderman, 1996). Generally, based on the data characteristics, suitable
simple graph types, such as a multiple-line chart, a bar chart and a timeline, are chosen to
provide overviews of the time-series data. To deliver extra information on demand, interactive
tools, such as toggleable legends, a time range slider and a pie chart panel, zoom into more
detailed data. The designed time-series visualization tools accommodate different data
characteristics and user demands, and deliver rich information. The approach of first analyzing
data characteristics and user tasks and then designing the overview and additional interactive
controls has proved an efficient way of designing visualization tools.
Chapter Five has presented a Web GIS system prototype for mapping SAGD wells in
Alberta, providing access to their proprietary data, visualizing user-selected data and providing
analytical functions. Datasets of Alberta SAGD projects have been collected, archived and
successfully exploited in the Web GIS system. Meanwhile, three workflows have been shown to be
feasible: from the selection of examined wells to the update of the interactive web-based map,
from the selection of an information graphic type to the display of well temporal and attribute
data in an auxiliary visualization window, and from the selection of advanced data mining
techniques to the update of mined patterns in the web map.
The most important additional value of this platform is the implementation of data mining
algorithms in the web system. To gain a view of wells falling in different categories, the
classification methods targeted at different data types can be applied to selected well attributes.
Furthermore, the display in the form of maps with symbols in different colors representing
different categories and the assisted interactive legends facilitate a spatial overview of the
classification results for the users. Two data-driven models, k-means clustering and ARM,
generate inherent data patterns regarding similarity and discrepancy in a single attribute or a
combination of attributes, as well as frequent rules in the data. The patterns are also visualized in
the map for users to investigate the spatial distribution of the patterns.
Overall, in this thesis, data mining techniques and interactive visualization tools have
proved to be effective and efficient in delivering information to petroleum professionals. The web-
based system provides a mapping interface for the spatial objects, establishes access to SAGD
data, and includes visualization tools and analytical data mining tools. Therefore, the system
prototype accommodates convenient data access, strong oil and gas data analysis and efficient
information delivery in user-friendly interfaces.
6.2 Future work
The suggestions for future work are listed as follows.
1) Future work on the symbolic tree model firstly includes proposing an incremental algorithm
for updating a symbolic tree model when new wells are added to the tree.
2) During the symbolic tree building process, the current criteria stopping the tree from growing
further are minimum node size and minimum spatial information gain. Other pruning
mechanisms can be integrated to achieve more accurate and compact symbolic trees.
3) The coverage index implies whether the constructed symbolic tree is a proper-sized tree
without branches contributing little information. In order to indicate not only a compact tree
but also a significant and trustworthy predictive model, the approach to combine the prediction
accuracy and the coverage index can be further studied.
4) Besides the interactive data visualization tools designed for exploration and confirmation
analysis purposes, other tools can be designed for representing analysis results, from simple
statistical analyses to complex data mining models.
5) Interactive visualization tools can be designed for other oil and gas data types, such as seismic
data, geological data and so on. For different data types, the proper visualization tools are to
be studied and further developed for specific datasets. For example, three-dimensional
visualization technologies can be employed for spatial datasets.
6) To further develop the Web GIS system, access to real-time SAGD data can be established.
Real-time data processing and data analysis techniques are to be employed. Moreover, more data
mining techniques, such as neural networks and outlier detection, can be added to the system.
7) Oil and gas data are essentially of high volume, variety and velocity. New approaches that
store and analyze big datasets are to be studied and integrated. Cloud computing platforms can
contribute to handling real-time data from multiple sources. With distributed computing
platforms like Hadoop/MapReduce, computational processes can be significantly accelerated.
APPENDIX: PUBLICATION DURING THE PROGRAM
Published:
Wei, B., Silva, R., & Wang, X. (2015). A web-based steam assisted gravity drainage (SAGD) data
visualization and analytical system. In Web and Wireless Geographical Information
Systems (pp. 89-103). Springer International Publishing.
Submitted:
Wei, B., Pinto, H., & Wang, X. (2016). A symbolic tree model for oil and gas production prediction
using time-series production data. Submitted to 3rd IEEE International Conference on Data
Science and Advanced Analytics.
REFERENCES
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases.
In Proceedings of the 20th international conference on very large data bases, 487-499.
Morgan Kaufmann Publishers Inc.
Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases,
69-84. Springer Berlin Heidelberg.
Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented
data. Springer Science & Business Media.
Alberta Energy Regulator (AER). (n.d.). Retrieved from http://www.aer.ca. Last Accessed on May
13, 2016.
Alberta Energy. (n.d.). Retrieved from http://www.energy.alberta.ca. Last Accessed on May 13,
2016.
Anderson, D. M., Nobakht, M., Moghadam, S., & Mattar, L. (2010). Analysis of production data
from fractured shale gas wells. In SPE unconventional gas conference. Society of
Petroleum Engineers.
Aulia, A., Keat, T. B., Maulut, M. S., El-Khatib, N., & Jasamai, M. (2010). Smart oilfield data
mining for reservoir analysis. International Journal of Engineering & Technology, 10(06),
78-88.
Baihly, J. D., Altman, R. M., Malpani, R., & Luo, F. (2010). Shale gas production decline trend
comparison over time and basins. In SPE annual technical conference and exhibition.
Society of Petroleum Engineers.
Butler, R. (1998). SAGD comes of age!. Journal of Canadian Petroleum Technology, 37(07).
Cai, Y., Wang, X., Hu, K., & Dong, M. (2014). A data mining approach to finding relationships
between reservoir properties and oil production for CHOPS. Computers & Geosciences,
73, 37-47.
Cenovus. (n.d.). Retrieved from http://www.cenovus.com. Last Accessed on May 13, 2016.
Chan, K. P., & Fu, A. W. C. (1999). Efficient time series matching by wavelets. In Proceedings of
15th International Conference on Data Engineering, 126-133. IEEE.
Claramunt, C. (2005). A spatial form of diversity. In Proceedings of International Conference on
Spatial Information Theory, 2005, 218-231. Springer Berlin Heidelberg.
Divestco. (2016). Retrieved from http://www.divestco.com. Last Accessed on May 13, 2016.
Esmaili, S., & Mohaghegh, S. D. (2013). Using Data-Driven Analytics to Assess the Impact of
Design Parameters on Production from Shale. In SPE Annual Technical Conference and
Exhibition. Society of Petroleum Engineers.
Evans, F., Volz, W., Dorn, G., Fröhlich, B., & Roberts, D. M. (2002). Future trends in oil and gas
visualization. In Proceedings of the conference on Visualization'02, 567-570. IEEE
Computer Society.
Fruhwirth, R. K., Thonhauser, G., & Mathis, W. (2006). Hybrid simulation using neural networks
to predict drilling hydraulics in real time. In SPE Annual Technical Conference and
Exhibition. Society of Petroleum Engineers.
Google. (2016). Retrieved from https://developers.google.com/maps/. Last Accessed on May 13,
2016.
Government of Saskatchewan. (2002). Retrieved from http://www.infomaps.gov.sk.ca. Last
Accessed on May 13, 2016.
Han, J., Kamber, M., & Pei, J. (2001). Data mining: concepts and techniques. Morgan Kaufmann,
San Francisco.
Keogh, E., Chakrabarti, K., Pazzani, M., & Mehrotra, S. (2001). Dimensionality reduction for fast
similarity search in large time series databases. Knowledge and information Systems, 3(3),
263-286.
Korn, F., Jagadish, H. V., & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large
datasets of time sequences. ACM SIGMOD Record, 26(2), 289-300.
Lafollette, R., Holcomb, W. D., & Aragon, J. (2012). Practical data mining: analysis of Barnett
shale production results with emphasis on well completion and fracture stimulation. In SPE
Hydraulic Fracturing Technology Conference. Society of Petroleum Engineers.
Lim, T. S., Loh, W. Y., & Shih, Y. S. (2000). A comparison of prediction accuracy, complexity,
and training time of thirty-three old and new classification algorithms. Machine learning,
40(3), 203-228.
Lin, J., Keogh, E., Wei, L., & Lonardi, S. (2007). Experiencing SAX: a novel symbolic
representation of time series. Data Mining and knowledge discovery, 15(2), 107-144.
Liu, S., & Xue, L. (2008). The Application of Fuzzy Clustering to Oil and Gas Evaluation.
In Proceedings of Fifth International Conference on Fuzzy Systems and Knowledge
Discovery, 644-647. IEEE.
Ma, Z., Leung, J. Y., Zanon, S., & Dzurman, P. (2015). Practical implementation of knowledge-
based approaches for steam-assisted gravity drainage production analysis. Expert Systems
with Applications, 42(21), 7326-7343.
Marroquín, I. D., Brault, J. J., & Hart, B. S. (2008). A visual data-mining methodology for seismic
facies analysis: Part 1—Testing and comparison with other unsupervised clustering
methods. Geophysics, 74(1), 1-11.
McCormick, B. H., DeFanti, T. A., & Brown, M. D. (1987). Visualization in Scientific
Computing. Computer Graphics, 21(6).
Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction.
Machine learning, 4(2), 227-243.
Mohaghegh, S. (2000). Virtual-intelligence applications in petroleum engineering: Part 1—
Artificial neural networks. Journal of Petroleum Technology, 52(9), 64-73.
Mohaghegh, S. D., & Gaskari, R. (2009). An intelligent system’s approach for revitalization of
brown fields using only production rate data. International Journal of Engineering, 22(1),
89-106.
Mohaghegh, S. D., Hutchins, L. A., & Sisk, C. (2008). Building the foundation for Prudhoe Bay
oil production optimisation using neural networks. International Journal of Oil, Gas and
Coal Technology, 1(1-2), 65-80.
Node.js. (2016). Retrieved from https://nodejs.org/en/. Last Accessed on May 13, 2016.
Nobakht, M., Mattar, L., Moghadam, S., & Anderson, D. M. (2012). Simplified forecasting of
tight/shale-gas production in linear flow. Journal of Canadian Petroleum Technology,
51(06), 476-486.
PetroFeed Inc. (2015). Retrieved from https://www.petrofeed.com/maps. Last Accessed on May
13, 2016.
PostgreSQL Global Development Group. (2016). Retrieved from https://www.postgresql.org/.
Last Accessed on May 13, 2016.
Serapiao, A., Tavares, R. M., Mendes, J. R. P., & Guilherme, I. R. (2006). Classification of
petroleum well drilling operations using Support Vector Machine (SVM). In Proceedings
of 2006 International Conference on Computational Intelligence for Modelling, Control
and Automation and International Conference on Intelligent Agents, Web Technologies
and Internet Commerce, 145-145. IEEE.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information
visualizations. In Proceedings of 1996 IEEE Symposium on Visual Languages, 336-343.
IEEE.
Siirtola, P., Koskimäki, H., Huikari, V., Laurinen, P., & Röning, J. (2011). Improving the
classification accuracy of streaming data using sax similarity features. Pattern Recognition
Letters, 32(13), 1659-1668.
Spence, R. (2007). Information Visualization: Design for Interaction. Prentice-Hall, Inc., Upper
Saddle River, NJ, USA, 2nd edition.
Sproule. (2015). Type curve analysis for landmen. Retrieved from http://landman.ca/wp/wp-
content/uploads/2014/07/Sept-18-2015-Sproule.pdf. Last Accessed on May 13, 2016.
Strecker, U., & Uden, R. (2002). Data mining of 3D poststack seismic attribute volumes using
Kohonen self-organizing maps. The Leading Edge, 21(10), 1032-1037.
Ward, M. O., Grinstein, G., & Keim, D. (2010). Interactive data visualization: foundations,
techniques, and applications. CRC Press.
Zhong, M., Schuetter, J., Mishra, S., & Lafollette, R. F. (2015). Do Data Mining Methods Matter?:
A Wolfcamp Shale Case Study. In SPE Hydraulic Fracturing Technology Conference.
Society of Petroleum Engineers.
Zoumboulakis, M., & Roussos, G. (2011). Complex event detection in extremely resource-
constrained wireless sensor networks. Mobile Networks and Applications, 16(2), 194-213.