IMPACT OF A DATA WAREHOUSE MODEL FOR IMPROVED DECISION-MAKING
PROCESS IN HEALTHCARE
Pubudika Kumari Mawilmada BBus (IT Management), MIT
Submitted in fulfilment of the requirements for the degree of
Master of Information Technology (Research)
Computer Science Discipline
Faculty of Science and Technology
Queensland University of Technology
October 2011
Keywords
Cardiology, Clinical Decision Support Systems, Data marts, Data warehouse, Decision-making, Information systems, Healthcare, Star schema, Snowflake schema.
Abstract
The health system is one sector dealing with a deluge of complex data. Many
healthcare organisations struggle to utilise these volumes of health data effectively
and efficiently. Many also still rely on stand-alone systems that are not
integrated for the management of information and decision-making.
There is therefore a need for an effective system to capture, collate and distribute
this health data, and implementing the data warehouse concept in healthcare is
potentially one of the solutions for integrating it. Data warehousing has been
used to support business intelligence and decision-making in many other sectors such
as the engineering, defence and retail sectors.
The research problem addressed is: “how can data
warehousing assist the decision-making process in healthcare?” To address this
problem, the investigation was narrowed to a cardiac surgery unit, using the
cardiac surgery unit at The Prince Charles Hospital (TPCH) as the case study.
The cardiac surgery unit at TPCH uses a stand-alone
database of patient clinical data, which supports clinical audit, service management
and research functions. However, much of the time, the interaction between the
cardiac surgery unit information system and those of other units is minimal. There is
only a limited and basic two-way interaction with other clinical and administrative
databases at TPCH which support decision-making processes. The aims of this
research are to investigate what decision-making issues are faced by the healthcare
professionals with the current information systems and how decision-making might
be improved within this healthcare setting by implementing an aligned data
warehouse model or models. As part of the research, the researcher proposes and
develops a suitable data warehouse prototype based on the cardiac surgery unit's needs
and integrating the Intensive Care Unit database, the Clinical Costing unit database
(Transition II) and the Quality and Safety unit database [electronic discharge summary
(e-DS)]. The goal is to improve the current decision-making processes. The main
objectives of this research are to improve access to integrated clinical and financial
data, providing potentially better information for decision-making for both improved
management and patient care, and to provide greater efficiency in supporting
current similar processes.
The methodology used to conduct this research consisted of five stages. The
first stage reviewed the literature to define the background knowledge about data
warehousing, identify different data warehouse models, factors leading to model
selection and application of the data warehouse concept in the healthcare
environment. In the second stage of the methodology, a survey was conducted to
gather information on the current data repositories, current decision-making process,
current decision-making issues and data warehouse prototype development
requirements. The main survey methods used were a questionnaire and unstructured
interviews. A total of ten questionnaires were distributed to stakeholders in the
cardiac surgical decision-making processes. The questionnaire consisted of twelve
questions producing data for four categories of inquiry, namely current data
repositories, the decision-making process, current issues, and data storage and analysis needs.
An 80% response rate was achieved (8 out of 10). Although 30% (3 of 10) did not
wish to participate further, 70% (7 of 10) contributed to subsequent unstructured
interviews used to clarify and extend the survey results. These were analysed
thematically and a number of decision-making knowledge gaps were ascertained. The
survey and literature review data were then integrated to select a model. Thirdly, the
model prototype was developed and fourthly the integrated data was analysed and
information products created. Finally, the information products were reviewed by the
hospital staff and feedback obtained to evaluate the warehouse prototype utility.
According to the survey conducted in this research it is apparent that end users
(clinicians, unit manager, data managers from cardiac surgery, ICU, quality and
safety and clinical costing units) have limited access to data repositories other than
their own database. For instance, most of the time clinicians or unit managers have to
contact data custodians to extract and collate the information from other data
repositories. They then have to integrate the data manually prior to analysis and
reporting. This limits the interaction between the ICU, cardiac surgery
(CARPIA), quality and safety (e-DS) and clinical costing unit databases.
All these issues create inefficiencies in the decision-making process. After
analysis of further data from the questionnaire, the user requirements were
summarised for the data warehouse prototype development. Using analysed results
from the questionnaire and by referring to the literature, a centralised data
warehouse model was identified as appropriate for the cardiac surgery unit at this stage. A
centralised data warehouse model addresses current needs and can also be upgraded
to an enterprise-wide warehouse model or federated data warehouse model, as
discussed in many of the publications consulted. The data warehouse prototype was
developed using SAS enterprise data integration studio 4.2 and the data was
analysed using SAS enterprise edition 4.3. In the final stage, the data warehouse
prototype was evaluated by collecting feedback from the end users. This was
achieved by using output created from the data warehouse prototype as examples of
the data desired and possible in a data warehouse environment. According to the
feedback collected from the end users, implementation of a data warehouse was seen
to be a useful tool to inform management options, provide a more complete
representation of factors related to a decision scenario and potentially reduce
information product development time.
However, this research was subject to many constraints. These included
technical issues such as data incompatibilities and the integration of the cardiac surgery
database and e-DS database servers, Queensland Health information
restrictions (Queensland Health information related policies, patient data
confidentiality and ethics requirements), limited availability of support from IT
technical staff, and time restrictions. These factors have influenced the process for the
warehouse model development, necessitating an incremental approach. This
highlights the presence of many practical barriers to data warehousing and integration
at the clinical service level. Limitations included the use of a small convenience
sample of survey respondents, and a single site case report study design.
As mentioned previously, the proposed data warehouse is a prototype and was
developed using only four database repositories. Despite this constraint, the research
demonstrates that by implementing a data warehouse at the service level, decision-
making is supported and data quality issues related to access and availability can be
reduced, providing many benefits. Output reports produced from the data warehouse
prototype demonstrated usefulness for the improvement of decision-making in the
management of clinical services, and quality and safety monitoring for better clinical
care. In the future, the centralised model selected can be upgraded to an
enterprise-wide architecture by integrating additional hospital units’ databases.
Table of Contents
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgments
Dedication
CHAPTER 1: INTRODUCTION
1.1 Research background
1.2 Problem
1.3 Research questions
1.4 Significance, Scope and Definitions
1.5 Thesis outline
CHAPTER 2: LITERATURE REVIEW
2.1 Review methodology
    2.1.1 Literature search sources
    2.1.2 Information search strategies
2.2 Background theory
    2.2.1 The data warehouse concept
    2.2.2 Main components of the data warehouse
    2.2.3 Data warehouse modelling
    2.2.4 Data warehouse methodologies
    2.2.5 Data warehouse lifecycle
    2.2.6 Operational systems vs data warehouses
    2.2.7 Data marts
2.3 Different types of data warehouse models
    2.3.1 Centralised data warehouse
    2.3.2 Independent data marts
    2.3.3 Federated architecture
    2.3.4 Hub and spoke architecture
    2.3.5 Data mart bus architecture
2.4 Data warehouse architecture/model selection factors
2.5 Health information management
    2.5.1 Healthcare decision-making
    2.5.2 Healthcare information systems and decision-making
2.6 Data warehousing and healthcare
    2.6.1 Data warehouse implementation examples
    2.6.2 Data warehouse implementation challenges
2.7 Summary and implications
CHAPTER 3: RESEARCH DESIGN
3.1 Methodology and Research Design
    3.1.1 Methodology
    3.1.2 Research Design
3.2 Participants
3.3 Instruments
3.4 Procedure and Timeline
3.5 Analysis
3.6 Ethics and Limitations
3.7 Intellectual Property Rights
3.8 Health and safety
CHAPTER 4: RESULTS ANALYSIS
4.1 Current decision-making process
4.2 Decision-making issues
4.3 Application development requirements analysis
CHAPTER 5: DATA WAREHOUSE PROTOTYPE DEVELOPMENT
5.1 Business intelligence tools
    5.1.1 SAS/Warehouse Administrator 4.3
    5.1.2 SAS data integration studio
5.2 Data analysis tools
    5.2.2 SAS enterprise guide
5.3 Cardiac surgery data warehouse prototype selection and development
    5.3.1 Model selection Rationale
    5.3.2 Development process
5.4 Data analysis using the data warehouse prototype
5.5 Data warehouse prototype evaluation
CHAPTER 6: DISCUSSION
6.1 Limitations of the study
CHAPTER 7: CONCLUSION
7.1 Recommendations and future directions
BIBLIOGRAPHY
APPENDICES
    Appendix A: Questionnaire
    Appendix B: Design of data warehouse fact and dimension tables
List of Figures
Figure 1: Components of the data warehouse
Figure 2: Multidimensional data
Figure 3: Star schema data model
Figure 4: Snowflakes schema data model
Figure 5: Data warehouse system life cycle
Figure 6: Data warehouse architectural types
Figure 7: Different types of data warehouse architectures
Figure 8: Results of the survey
Figure 9: The distribution of the architectures
Figure 10: Research model for data warehouse architecture selection
Figure 11: An integrated model for data warehouse architecture selection
Figure 12: Decision-making levels within an organisation
Figure 13: Timelining Health Information Systems Evaluation
Figure 14: Advantages and disadvantages of data integration architectures
Figure 15: Current use of BI/CI by healthcare organisations
Figure 16: Top 3 barriers to the use of business/clinical intelligence applications
Figure 17: Top 3 IT challenges to implementing/deploying business intelligence applications
Figure 18: Current support from the IS’s for decision making
Figure 19: Decision-making issues with current IS’s
Figure 20: Data quality issues in current decision-making process
Figure 21: Security and privacy concerns for DW prototype development
Figure 22: VHA corporate data warehouse visual architecture
Figure 23: Medical federated data warehouse model
Figure 24: CDW architecture for traditional Chinese medicine
Figure 25: Proposed data warehouse model for the TPCH Cardiac surgery unit
Figure 26: Risk score star schema
Figure 27: Cost star schema
Figure 28: Cardiac Surgery unit data warehouse model
Figure 29: Comparison of risk scores – group by PREDMORT
Figure 30: Interaction of risk scores
Figure 31: The actual expenditure per episode of care according to the certain clinical group
Figure 32: Cost of reoperation for bleeding as an example of post operational complications
Figure 33: Costs associated with the DRGs according to cardiac surgery unit admission status
List of Tables
Table 1: Literature search sources
Table 2: Comparison of data warehouse with OLTP systems
Table 3: Differences between data mart and data warehouse
Table 4: Combined reasons for data warehouse failure
Table 5: Methodology stages
Table 6: Decisions/ Problems would like to address by end users
Table 7: Dimension Tables
List of Abbreviations
BI Business Intelligence
CDSS Clinical Decision Support Systems
CI Clinical Intelligence
CIO Chief Information Officer
CMS Centers for Medicare & Medicaid Services
DM Data Marts
DW Data Warehouse
e-DS Electronic Discharge Summary
FED Federated Data warehouse
HBCIS Hospital Based Corporate Information System
ICU Intensive Care Unit
IDM Independent Data Marts
IS Information Systems
IT Information Technology
ITI Information Technology Infrastructure
OIPT Organizational Information Processing Theories
OLAP Online Analytical Processing
OLTP Online Transaction Processing
RCT Randomised Controlled Trials
TPCH The Prince Charles Hospital
VHA Veterans Health Administration
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the
best of my knowledge and belief, the thesis contains no material previously published
or written by another person except where due reference is made.
Signature: _________________________
Date: _________________________
Acknowledgments
I would like to thank my principal supervisor Dr. Tony Sahama for guidance
and support given to me in conducting this research. Also, I would like to thank my
associate supervisor Craig Huxley for his guidance and advice. I would like to
acknowledge my associate supervisor Susan Smith for advice and support given
throughout this project. I appreciated her assistance and guidance, and her
patience in answering my questions. To The Prince Charles Hospital cardiac
surgery unit data managers Gai Harris, Lesley Drake and Kay Watson, Clinical costing
manager Allan Rowe, Senior costing officer Diana Lal, Applications infrastructure
manager Brad Day and ICU health information manager Lynette Munck: I would like
to thank you for supporting my research project in many ways.
Dedication
This thesis is dedicated to my parents
U.B. Mawilmada and N.K. Mawilmada
who have supported me all the way since the beginning of my studies.
Chapter 1: Introduction
This chapter outlines the background (section 1.1), the research problem to
be addressed by the research project (section 1.2) and the research questions (section 1.3).
Section 1.4 describes the significance and scope of this research. Finally, section 1.5
includes an outline of the remaining chapters of the thesis.
1.1 RESEARCH BACKGROUND
“Healthcare is an information intensive business generating huge volumes of
data from hospitals, primary care surgeries, clinics and laboratories” (Grimson,
Grimson, & Hasselbring, 2000, p. 49). According to Sahama and Croll (2007), data
acquisition and distribution of information create a challenging situation for people
engaged in the medical sector. Information Technology (IT) today plays a major role
in healthcare through the introduction of systems such as electronic health records
and telemedicine. Integration of stand-alone systems would benefit
health organisations. However, there are many healthcare organisations which still
have stand-alone Information Systems (IS) (de Mul, Alons, van der Velde, Konings,
Bakker, & Hazelzet, 2010). Integrating stand-alone systems will become a more
complex task as stored data is increasingly used for decision-making in clinical care,
quality assurance, research and management (de Mul et al., 2010). Jani, Davis and
Fox (2007) stated that, although there have been recent advances in database development,
their impact is limited because there are few opportunities to link these databases.
Although many clinical ISs have been designed or are available, most benefit the area
of hands-on care for individual patients in transactional systems rather than
supporting the analysis of data (de Mul et al., 2010; Sanders & Protti, 2008). As
stated by Albert, Walter, Arnrich, Hassanein, Rosendahl, Bauer and Ennker (2004, p.
312), “Clinicians are encouraged to improve their methods of investigation and
analysis of outcomes, which still tend to be underdeveloped in comparison to
methods available in industry”.
1.2 PROBLEM
The problem addressed is: “how does data warehousing
assist the decision-making process in healthcare?” To address this problem, the
scope of the research was narrowed to an investigation focusing on a cardiac surgical unit.
Arigon et al. (2007) describe the data used in cardiac surgery as consisting of alphanumeric
data, images and signals, which may come from a number of data repositories. The
analysis environment for such data must include processing methods in order to
compute or extract the knowledge embedded in the raw data (Arigon et al., 2007). A
data warehouse is a potential solution which may provide a better environment for
analysing these data.
All clinical care units are accountable for providing quality of care. There have
been many models of quality measures developed (de Mul et al., 2010). However,
these sometimes require complex queries to analyse data, which is a time-consuming
process. Moreover, as stated by Albert et al. (2004), predictive models often cannot
consider all patient characteristics, and do not include non-patient-related factors.
Therefore, there is a need for a system to analyse cardiac data from different
perspectives. However, most of the time cardiac information systems such as those
for cardiac surgery have minimal interaction with other units. By combining the
cardiac surgery unit data repository with those of other units such as the Intensive Care Unit
(ICU), anaesthesia and financial units, clinicians could gain greater benefit. The
implementation of a data warehouse concept is one potential solution to
facilitate efficient analysis of data (de Mul et al., 2010).
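The idea of combining repositories can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that the cardiac surgery and ICU extracts share a patient identifier; the field names (`patient_id`, `risk_score`, `icu_hours`) and all values are hypothetical and are not taken from the TPCH databases.

```python
# Hypothetical extracts; every field name and value here is invented.
cardiac_surgery = [
    {"patient_id": 101, "procedure": "CABG", "risk_score": 2.4},
    {"patient_id": 102, "procedure": "Valve repair", "risk_score": 5.1},
    {"patient_id": 103, "procedure": "CABG", "risk_score": 1.2},
]
icu = [
    {"patient_id": 101, "icu_hours": 22},
    {"patient_id": 103, "icu_hours": 40},
]


def link_records(left, right, key="patient_id"):
    """Left-join two lists of dicts on a shared key.

    Rows in `left` without a match in `right` are kept, with the
    right-hand fields set to None.
    """
    right_by_key = {row[key]: row for row in right}
    right_fields = sorted({f for row in right for f in row if f != key})
    linked = []
    for row in left:
        match = right_by_key.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            merged[field] = match.get(field)
        linked.append(merged)
    return linked


linked = link_records(cardiac_surgery, icu)
for row in linked:
    print(row)
```

Keeping unmatched surgical records (rather than dropping them) is a design choice made here so that missing ICU data stays visible to the analyst.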
Finally, although most clinicians believe that the use of the data warehouse
concept in a cardiac surgery unit can lead to efficient decision-making, high quality of
patient care and safer processes, only a small proportion of this technology has been
adopted (de Mul et al., 2010).
This research has used the cardiac surgery unit at The Prince Charles Hospital
(TPCH) as a case study. The cardiac surgery program at The Prince Charles Hospital
uses a stand-alone database of patient clinical data, which supports Clinical Audit,
Service Management and Research functions. There is a limited two-way interaction
with other clinical and administrative databases at TPCH to support these decision-
making processes. This research aims to propose a suitable data warehouse model for
the cardiac surgery unit at TPCH, in order to improve the decision-making process.
The main databases employed to develop a data warehouse prototype are the cardiac
surgery register database (CARPIA), the ICU database, a quality and safety unit
database and the enterprise clinical costing unit database. The cardiac surgery register
database stores cardiac surgical patients’ demographic data, patient history,
preoperative data, procedural (surgical) data, post-operative outcomes data, test
results, diagnoses, risk scores and so on. The data for this database is derived from
several sources; however, most data are collected and entered manually into the
system by trained clinical data managers. Some basic patient information is derived
from the Hospital Based Corporate Information System (HBCIS) which is the enterprise
hospital patient administration system. Also, the pathology system and main theatre
information system provide information for the CARPIA database. The Quality and
Safety unit database of interest is known as the electronic Discharge Summary
database (e-DS). This database contains hospital wide discharge summaries of all
patients; It is a small transactional database deriving information from HBCIS and
clinician entry.
The Clinical Costing unit already employs the state-level enterprise data
warehouse known as Transition II. The main data sources for this database are
HBCIS and the other management feeder systems, such as the Emergency Department
Information System (EDIS), the Operating Room Management Information System
(ORMIS), the enterprise pathology results information system (Auslab) and the Trendcare
system (patient-nurse dependency). The Transition II database manages data at three
levels: the financial level, the departmental level and the patient level, although little
actual clinical data are captured. The ICU database contains data on patients admitted
to the ICU. Manually entered data are the main source of information for this
database and include patient clinical data such as morbidity scores, risk scores,
procedural data and physiological measurements.
1.3 RESEARCH QUESTIONS
One of the main aims of this research is to develop background knowledge of
data warehousing and its application to healthcare. Data warehousing plays a major
role in businesses today in contributing to improved decision-making. As in other
businesses, the data warehouse concept is also becoming popular in the healthcare
industry, as making appropriate, well-informed decisions is the basis of effective
healthcare, which will lead to improvements in the quality of service and reduce the
costs of healthcare. However, there are still many healthcare organisations which
have disparate information systems that are not integrated and do not support
improved decision-making processes. Therefore, it is important to identify the
issues with the current information systems that impede better
decision-making, and the potential for improving it. Hence, the first question asked is:
“What decision-making issues exist or are faced by healthcare professionals with the
current information systems?”
Several alternative data warehouse architectures are available, which
support various decision-making structures and purposes. Therefore, it is important
to consider the selection of a suitable data warehouse model, which will facilitate quality
decisions in the cardiac surgical context. This is the key to the next question:
“How might decision-making be improved within healthcare services by
implementing a more aligned data warehousing model or models?”
This research will develop a suitable data warehouse model for the cardiac
surgery unit at The Prince Charles Hospital, in order to improve decision-making
processes.
1.4 SIGNIFICANCE, SCOPE AND DEFINITIONS
This research presents four different outcomes. As discussed above, a data
warehouse prototype will be developed for the Cardiac surgery unit at the Prince
Charles Hospital. This will:
• improve access to administrative, financial and clinical information.
• potentially improve decision-making for the management of the clinical
services.
• potentially improve quality and safety monitoring to assist healthcare accountability
and better clinical care.
• provide data for clinical effectiveness and evaluation research.
1.5 THESIS OUTLINE
Chapter 1 provides details about the research background, the research problem, its
purpose and the outcomes of the research. Four outcomes are highlighted as part of the
completion of this project.
Chapter 2 presents the review of literature on data warehousing, including
different data warehouse architectural types, how a data warehouse differs from
operational systems and data marts, data warehouse modelling, and data warehouse
model selection factors. Furthermore, this chapter provides details on healthcare
information management, healthcare decision-making issues and application of the data
warehouse concept in healthcare with some examples.
Chapter 3 describes the research design of this research project. It covers research
methodology, research design, participants and instruments used in the research. The
research methodology consists of five stages and each stage is explained in detail.
Chapter 4 presents the analysis of the survey findings. It covers the current
decision-making process, issues related to the current decision-making process and also
identifies the user requirements for data warehouse prototype development.
Chapter 5 presents the cardiac surgery data warehouse prototype development.
Firstly, it briefly describes the business intelligence tools used to develop the data
warehouse prototype and the benefits of those tools. The next section explains the data
warehouse development steps using the SAS Data Integration Studio 4.2 software.
Chapter 6 provides a discussion of survey results analysis. This chapter contains a
full discussion and evaluation of the results with reference to the literature and the
limitations.
Chapter 7 concludes the thesis by providing information on the research process,
its benefits to the TPCH cardiac surgical unit, the constraints and limitations faced during the
project, and recommendations and future directions.
Chapter 2: Literature Review
This chapter reviews the literature on the following topics. The first section
(2.1) gives a brief introduction to the review methodology, covering the literature
search sources (2.1.1) and information search strategies (2.1.2). The second section
(2.2) discusses the background theory of data warehousing in general and gives
detailed information about data warehouse components, data warehouse modelling,
data marts and how data warehouses differ from operational systems. The third
section (2.3) discusses different types of data warehouse models
and the issues related to data warehouse selection. The fourth section (2.4) identifies
the data warehouse model selection factors. The fifth section (2.5) reviews the literature
on health information management. This covers decision-making and issues
related to healthcare and healthcare information systems. The following section (2.6)
discusses the data warehouse concept in healthcare and some real examples of
data warehouse implementation and its benefits. Section 2.7 studies the
implications from the literature and develops a framework for the research.
2.1 REVIEW METHODOLOGY
2.1.1 LITERATURE SEARCH SOURCES
Many information sources were used to search the literature widely. The
primary literature search sources used were publisher databases. The publisher
databases provide information from many formal sources such as journal articles,
research papers, and conference papers. They also provide a major source of
traditional academic information. Most of the information sources from the
Queensland University of Technology (QUT) library are stored as books, magazines
and e-books. Moreover, general web search engines such as Google Scholar,
Google, Scirus and Infopeople provide important e-books and peer-reviewed articles, as
well as non-peer-reviewed industry and ‘grey’ literature related to the research
field. The following table shows the information sources that were used.
Search material | Source type | Main information source
--- | --- | ---
Journal articles and conference papers | Databases | ScienceDirect; Web of Science; ACM portal; SpringerLink; ProQuest; IEEE Xplore; CiteSeerX; EBSCO host; Elsevier; JAMA
Books | Libraries | Queensland University of Technology
Books | Online providers | Google books
Web sites; Case studies; Australian Digital Thesis | Web search engines | www.google.com; http://scholar.google.com; www.scirus.com; www.infopeople.org; http://au.search.yahoo.com; http://www.bing.com; http://au.altavista.com; http://www.webwombat.com; http://www.dwinfocenter.org/getstart.html
Web groups | Web search engines | http://www.technologyreview.com/blog/; http://blog.kalido.com/; http://tdwi.org/; http://www.information-management.com/; http://www.sas.com/; http://www.bi-dw.info/; http://www.dwaa.org.au/layout-8.html
Table 1: Literature search sources
2.1.2 INFORMATION SEARCH STRATEGIES
Many strategies were used to search widely for information related to the
research topic and research questions. The search terms “data warehouse”, “data
warehousing” and “data integration” were used to find the basic articles about data
warehouses. These searches returned a number of articles. The next step was to
combine the initial terms with other terms such as “healthcare”, “decision-making”
and “models” to narrow down the search. Search strategies included the use of
Boolean operators and proximity operators, such as W/nn¹ (ScienceDirect,
ProQuest) and the Near operator (N) in EBSCO Host, which helped to narrow down
the search results. Abstracts were reviewed and if certain criteria (e.g. related to the
research questions) were identified in the abstract then the full paper was included in
the literature review. Citation indexes were used to search for related publications.
Also, citation indexes helped to identify the latest research trends and helped to
obtain the broadest approach to addressing the research topic. Moreover, the citation
indexes were useful in gathering information about authors, journal articles and
specialised areas of publications.
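As an illustration only, the behaviour of a proximity operator of the W/nn kind can be sketched in Python. The function name `within` and the tokenisation rule are assumptions made for this sketch; the publisher databases each implement proximity search in their own way, and this is not their actual algorithm.

```python
import re


def within(text, first, second, max_gap):
    """Return True if `first` and `second` occur with at most `max_gap` words between them."""
    # Hypothetical tokenisation: lower-case runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    # The number of words strictly between positions i and j is |i - j| - 1.
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
    )


title = "Data warehouse models for healthcare decision-making"
close_match = within(title, "data", "healthcare", 3)
too_far = within(title, "data", "healthcare", 2)
```

Here three words separate "data" and "healthcare", so the pair matches when the allowed gap is 3 but not when it is 2.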
2.2 BACKGROUND THEORY
2.2.1 THE DATA WAREHOUSE CONCEPT
Data warehousing technology aims to structure the data in an appropriate way
so that the data can be accessed and used in an efficient and effective manner (Dias, Tait,
Menolli, & Pacheco, 2008). As stated by Kerkri, Quantin, Allaert, Cottin, Charve,
Jouanot and Yétongnon (2001), the data warehouse is responsible for the
consistency of information. The integration of tools such as query tools, reporting
tools and analysis tools provides the opportunity to handle the coherence of information.
The aim of data warehousing is to organise the gathering of a wide range of data and
store it in a single repository (Kerkri et al., 2001). Currently, data warehousing plays
a major role in the business community at large. It is also relevant to healthcare as
mentioned in del Hoyo-Barbolla and Lees (2002, p. 43), “in a competitive climate, if
healthcare organisations are to keep their customers, knowing and managing
information about them is essential and organisations realized that it is crucial to
access viable and timely data.” Furthermore, integrating data from the different
sources and converting them into valuable information is a way to obtain
competitive advantage (del Hoyo-Barbolla & Lees, 2002).
¹ W/nn: W represents "within", and nn represents the maximum number of words between the terms.
Data warehousing is “a collection of decision support technologies aimed at
enabling the knowledge worker (executive, manager, analyst) to make better and
faster decisions” (Chaudhuri & Dayal, 1997, p. 1). According to Inmon (2005, p.
29), a data warehouse is a “subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management decisions”. March and Hevner (2007)
argued that the three components of intelligence, namely understanding, adaptability
and profiting from experience, are important considerations when designing the data
warehouse. These authors also mentioned that the data warehouse should allow
managers to gather information that helps them identify and understand different
situations and the reasons for their occurrence. Further, they argued that the
data warehouse should “enable a manager to locate and apply the relevant
organizational knowledge and to predict and measure the impact of decision over
time” (March & Hevner, 2007, p. 1035). However, as mentioned by March and
Hevner (2007), these arguments form the challenges that need to be considered
when implementing a data warehouse.
2.2.2 MAIN COMPONENTS OF THE DATA WAREHOUSE
According to Kimball and Ross (2002), several components together
form the data warehouse environment (Figure 1). Each component of the data
warehouse provides a specific function. The main components are:
• Operational source system
• Data Staging Area
• Data Presentation Area
• Data Access Tools
Operational Source Systems
The operational source system is mainly concerned with processing
performance and availability. Generally, the source system maintains a small amount
of historical data. The queries against source systems are narrow,
one-record-at-a-time queries which operate as part of the normal
transaction flow and are restricted in their demands on the operational system
(Kimball & Ross, 2002).
Figure 1: Components of the data warehouse (Kimball & Ross, 2002, p. 7)
Data Staging Area
The data staging area is the place that keeps the data in temporary storage
(Kimball & Ross, 2002). This area is also known as the extract-transformation-load
(ETL) area because it is where data extraction, transformation and loading are carried out. In
other words, the data staging area can be referred to as everything between the
operational source systems and the data presentation area (Kimball & Ross, 2002).
The first process in transferring data to the data warehouse is extraction. During this
process it is important to read and understand the source data and copy them to the
staging area of the data warehouse for further management. After the data have been extracted
to the staging area, many alterations take place, such as cleansing the data (correcting
misspellings, resolving domain conflicts, dealing with missing elements, or parsing
into standard formats), combining data from multiple sources, deduplicating data,
and assigning warehouse keys (Kimball & Ross, 2002). The data are then loaded
into the presentation area of the data warehouse (Kimball & Ross, 2002).
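The staging steps described above (cleansing, combining sources, deduplicating and assigning warehouse keys) can be illustrated with a minimal Python sketch. The record layout, the field names and the functions `cleanse` and `stage` are hypothetical; they are not taken from Kimball and Ross (2002) or from any TPCH system.

```python
from itertools import count


def cleanse(record):
    """Standardise one extracted record: trim text, normalise case, flag gaps."""
    return {
        "patient_id": record["patient_id"].strip().upper(),
        "surname": record.get("surname", "").strip().title(),
        # Missing elements are made explicit rather than silently dropped.
        "sex": (record.get("sex") or "UNKNOWN").strip().upper(),
    }


def stage(*sources):
    """Combine extracts from several sources, deduplicate, assign warehouse keys."""
    next_key = count(start=1)
    staged = {}
    for source in sources:
        for raw in source:
            row = cleanse(raw)
            # Deduplicate on the natural key; the first occurrence wins.
            if row["patient_id"] not in staged:
                staged[row["patient_id"]] = {"warehouse_key": next(next_key), **row}
    return list(staged.values())


# Hypothetical extracts from two operational source systems.
surgery_extract = [{"patient_id": " p001 ", "surname": "smith", "sex": "f"}]
icu_extract = [
    {"patient_id": "P001", "surname": "SMITH", "sex": "F"},
    {"patient_id": "p002", "surname": "jones", "sex": None},
]

rows = stage(surgery_extract, icu_extract)
```

In this sketch the rows returned by `stage` would be what is loaded into the presentation area; a real staging area would also need rules for resolving conflicting values between sources, which are omitted here.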
Data Presentation Area
The data presentation area is the place where data is organized, stored, and
made available to the users. In addition, the data presentation area is the place where
business communities see data and gain access using data access tools. As stated by
Kimball and Ross (2002), this area can be referred to as a series of integrated data marts.
Each of these data marts presents the data from a single business process (Kimball &
Ross, 2002).
Data Access Tools
The data access tools element is the final element of the data warehouse. This
element provides many capabilities for business users to query the presentation
area for analytic decision-making. Generally, a data access tool can be as simple as a
query tool or as complex as a data mining application (Kimball & Ross, 2002).
2.2.3 DATA WAREHOUSE MODELLING
Generally, data warehouse modelling is used to:
• Identify the data warehouse, data mart, and decision support system data and
information requirements
• Represent the data warehouse view
• Design the data warehouse schema according to the information
requirements. (Borysowich, 2007)
In the data warehouse, after the business queries and subject area have been
identified, the information to be stored in the data warehouse/data mart is designed
(Borysowich, 2007). Designing the data warehouse/data mart structure is different
from designing operational systems. According to Mohania, Samtani, Roddick
and Kambayashi (1999), operational systems consist of simple pre-defined queries.
On the other hand, in data warehousing environments, queries join more tables,
need more computation time and are more ad hoc (Mohania et al., 1999). This has led to the
emergence of a new view of data modelling design. As a result, the
multidimensional model, or data cube, has become the suitable data model for the data
warehousing environment. As stated by Chaudhuri and Dayal (1997), a
multidimensional view of the data is important for front-end tools,
database design and query engines for online analytical processing (OLAP).
Figure 2: Multidimensional data (Chaudhuri & Dayal, 1997, p. 4)
As stated by Ramakrishnan and Gehrke (as cited in Tan, 2006, p. 876), “Online
analytical processing (OLAP) is a term that describes a technology that uses a
multidimensional view of aggregate data to provide quick access to strategic
information for the purposes of advanced analysis”. Generally, OLAP supports
queries and data analysis by collecting, managing and processing multidimensional
data (Tan, 2006). In multidimensional data modelling, data are stored as facts and
dimensions. Facts can be numerical or factual data and can represent the activity
which is specific to the business. On the other hand, “a dimension represents a single
perspective of the data” (Mohania et al., 1999, p. 44), and each dimension is
characterised by its attributes. For instance, a customer dimension can consist of the
name of the customer, address, and the city. Figure 2 shows the multidimensional
data view. Two modelling techniques, the star schema and the snowflake schema, are
used to represent multidimensional data.
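The idea of facts located by dimensions can be illustrated with a small Python sketch that aggregates a measure along chosen dimensions, in the manner of an OLAP roll-up. The data, the dimension names and the function `roll_up` are hypothetical and serve only to illustrate the concept; they do not represent any particular OLAP product.

```python
from collections import defaultdict

# Each fact holds its dimension coordinates plus one numeric measure.
facts = [
    {"product": "A", "city": "Brisbane", "year": 2009, "sales": 10},
    {"product": "A", "city": "Sydney", "year": 2009, "sales": 7},
    {"product": "B", "city": "Brisbane", "year": 2010, "sales": 5},
    {"product": "A", "city": "Brisbane", "year": 2010, "sales": 3},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure over every dimension that is not listed in `dimensions`."""
    totals = defaultdict(int)
    for fact in facts:
        # The cell of the cube is identified by the retained dimension values.
        cell = tuple(fact[d] for d in dimensions)
        totals[cell] += fact[measure]
    return dict(totals)


by_product = roll_up(facts, ["product"])
by_city_year = roll_up(facts, ["city", "year"])
grand_total = roll_up(facts, [])
```

Retaining fewer dimensions gives a coarser summary; retaining none collapses the cube to a single grand total.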
Star schema
The star schema consists of a central table (the fact table) and other
tables which link directly to it. These tables are known as dimension tables.
According to Chaudhuri and Dayal (1997), the star schema is used in most data
warehouses to represent the multidimensional data model.
Figure 3: Star schema data model (ExecutionMih, 2010, p. 2)
In general, the fact table contains the keys and measurements. For example,
the sales fact table in Figure 3 can be seen to contain keys such as
time_key, item_key, branch_key and location_key and measures such as units_sold,
dollars_sold and avg_sales. In addition, the dimension tables are related to the sales
fact table by the time, branch, item and location fields. Each of these dimension tables
contains the attributes related to each dimension (ExecutionMih, 2010).
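A minimal sketch of such a star schema, using Python's built-in sqlite3 module, is given below. The key and measure names follow the sales example described above; the dimension attributes (year, quarter, item_name, branch_name, city) and the sample rows are assumptions made for illustration, and avg_sales is omitted because it can be derived from the other measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 2010, "Q1"), (2, 2010, "Q2")])
conn.executemany("INSERT INTO item_dim VALUES (?, ?)", [(1, "Item A"), (2, "Item B")])
conn.executemany("INSERT INTO branch_dim VALUES (?, ?)", [(1, "Branch 1")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?)", [(1, "Brisbane"), (2, "Sydney")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10, 100.0), (1, 2, 1, 1, 5, 75.0), (2, 1, 1, 2, 4, 40.0)],
)

# Every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
```

The query summarises the measures by city and quarter, which is the kind of question the fact and dimension layout is intended to answer.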
Snowflake schema
The snowflake schema is a more complex data warehouse model than the star
schema. Like the star schema, the snowflake schema consists of fact tables and
dimension tables. However, the dimension tables of the snowflake schema are normalised,
so a dimension table may itself link to further dimension tables (Chaudhuri & Dayal, 1997).
Figure 4: Snowflake schema data model (ExecutionMih, 2010, p. 2)
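The effect of normalising a dimension can be sketched with Python's built-in sqlite3 module. In this illustration a location dimension is split so that city attributes are held in a separate table; all table names, column names and sample rows are hypothetical and are not taken from Figure 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised dimension: city attributes move to their own table.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO city_dim VALUES (?, ?, ?)",
                 [(1, "Brisbane", "QLD"), (2, "Sydney", "NSW")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "Rode Road", 1), (2, "Ann Street", 1), (3, "George Street", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 30.0)])

# Reaching the state attribute needs two joins: fact -> location -> city.
sales_by_state = conn.execute("""
    SELECT c.state, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN city_dim AS c ON c.city_key = l.city_key
    GROUP BY c.state
    ORDER BY c.state
""").fetchall()
```

The city and state values are stored once rather than repeated for every street, at the cost of an additional join in each query that uses them.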
2.2.4 DATA WAREHOUSE METHODOLOGIES
There are two basic methodological approaches for data warehouse
design. These are the top-down approach and the bottom-up approach (Golfarelli &
Rizzi, 2009). In the top-down approach, the overall user requirements are analysed, and the
data warehouse is planned, designed and implemented as a whole. However, this approach has many problems, such as
high costs, the difficulty of analysing and collecting all sources, the difficulty of
collecting the specific needs of all the organisational departments and a longer
development time. In the bottom-up approach, the data warehouse is built incrementally as
several data marts are created in turn. This method takes a partial picture of the whole
application; therefore, there is a risk involved with this method (Golfarelli & Rizzi,
2009). Nevertheless, the bottom-up approach is the accepted method of most users. Moreover, List
et al. (2002) have identified three data warehouse methodologies: data-driven
methodologies, goal-driven methodologies and user-driven methodologies.
Data-driven methodologies
As stated by List, Bruckner, Machaczek and Schiefer (2002), “Bill Inmon, the
founder of data warehousing argues that data warehouse environments are data
driven, in comparison to classical systems, which have a requirement driven
development life cycle”. Also, as mentioned by Inmon (as cited in List et al., 2002),
user requirements are the last thing to be considered in the decision support system life
cycle.
Goal-driven methodologies
List et al. (2002) discussed the Semantic Object Model (SOM) process
modelling technique presented by Böhnlein and Ulbrich-vom Ende. The first
stage of the technique identifies the company's goals and services. The SOM
schema is then applied to analyse the business processes. This helps to track the
company’s customers and their business transactions; at the next stage, these
transactions are transformed into existing dependencies that refer to information
systems. The final step identifies the measures and dimensions (according to
transactions and dependencies) (List et al., 2002).
User-driven methodologies
According to Westerman (as cited in List et al., 2002), the user-driven
methodology is a Wal-Mart approach. This approach mainly focuses on
implementing a business strategy. “The methodology assumes that the company goal
is the same for everyone and the entire company will therefore be pursuing the same
direction” (List et al., 2002, p. 205). The first prototype is developed according to the
business needs. Firstly, business people set goals and then identify and prioritise the
business questions that support the business goals. The most important
questions are then defined in terms of data elements.
Moreover, many development methodologies have been introduced by
different authors and organisations. As stated by Kimball et al. (1998) (as cited
in Golfarelli & Rizzi, 2009), the business dimensional life cycle is used to design, develop and
implement data warehouse systems. The rapid warehousing methodology is another
approach to managing data warehousing projects. This approach was introduced
by the SAS Institute, a leader in the statistical analysis industry. The rapid
warehousing methodology consists of seven phases: assessment; requirements;
design; construction and final test; deployment; maintenance and administration; and
review (Golfarelli & Rizzi, 2009).
2.2.5 DATA WAREHOUSE LIFECYCLE
The data warehouse life cycle plays a major role when developing a data
warehouse. The following figure shows the basic phases of the data warehouse life
cycle. This life cycle takes the bottom-up approach (Figure 5). The main phases of
this life cycle are setting goals and planning, designing infrastructures, and designing
and developing data marts (Golfarelli & Rizzi, 2009). The first phase involves a
feasibility study. In this phase many activities take place, such as setting system goals
and estimating the costs of building the data warehouse. The next phase analyses
and compares the architecture solutions for the data warehouse design (Golfarelli &
Rizzi, 2009). Moreover, the designer must consider the available tools and
technologies when designing the plan. The final step involves designing and developing
the data marts. In this phase, new data marts are created and added to the data
warehouse system (Golfarelli & Rizzi, 2009).
Figure 5: Data warehouse system life cycle (Golfarelli & Rizzi, 2009, p. 46). Phases shown:
setting goals and planning; designing infrastructures; designing and developing data marts.
2.2.6 OPERATIONAL SYSTEMS VS DATA WAREHOUSES
There are many differences between operational systems and the data
warehouse. The primary difference is that operational systems are designed to support
online transaction processing (OLTP), whereas data warehousing systems are designed to support online
analytical processing (OLAP). The users of operational systems deal with one
record at a time and perform the same operational tasks repetitively. On the
other hand, a data warehouse is capable of handling large volumes of data at a time
and helps to make decisions in a timely and consistent manner with accurate and
up-to-date information (Kimball, 2002).
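The contrast between the two access patterns can be sketched with Python's built-in sqlite3 module. The admission table, its columns and its rows are hypothetical, and a single table is used for both queries only to keep the example short; in practice the analytical query would run against the warehouse rather than the operational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admission ("
    "admission_id INTEGER PRIMARY KEY, unit TEXT, length_of_stay_days INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO admission VALUES (?, ?, ?)",
    [(1, "Cardiac surgery", 7), (2, "ICU", 3), (3, "Cardiac surgery", 9), (4, "ICU", 5)],
)

# Operational (OLTP) style: fetch one record at a time by its key.
one_record = conn.execute(
    "SELECT unit, length_of_stay_days FROM admission WHERE admission_id = ?", (3,)
).fetchone()

# Analytical (OLAP) style: scan many records and summarise them.
summary = conn.execute(
    "SELECT unit, COUNT(*), AVG(length_of_stay_days) "
    "FROM admission GROUP BY unit ORDER BY unit"
).fetchall()
```

The first query touches a single row through the primary key, while the second reads every row to produce one summary line per unit.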
The following table shows the differences between an online transaction
processing (OLTP) system and a data warehouse.
Table 2: Comparison of data warehouse with OLTP systems (Kadlec, 2005)
According to Inmon (2005), many challenges exist in the use of
current information systems. These include a lack of data credibility, issues with
productivity and an inability to transform data into information. The lack of credibility
occurs for many reasons, such as time discrepancy, algorithmic differences, level
of data extraction, problems with external data and no common source of data from
the beginning (Inmon, 2005). This leads to many incompatibilities in the reports
generated by the different departments of an organisation. On the other hand,
productivity becomes a major issue when an organisation needs to analyse the same
data across all its departments (Inmon, 2005). This is because many programs must
be written and there are many technological barriers to overcome (Inmon, 2005).
2.2.7 DATA MARTS
A data mart and a data warehouse have different architectural structures. On
some occasions there is a need to perform standardised data analysis and to organise
data to identify simple usage patterns. As a result, the data warehouse is
arranged into small units called data marts (Bonifati, Cattaneo, Ceri, Fuggetta, &
Paraboschi, 2001). As mentioned by Inmon (1999), “a data mart is a collection of
subject areas organised for decision support based on the needs of a given
department”. Therefore, each department has its own way of understanding how the
data mart should look. Each data mart is designed according to the department’s
needs (Inmon, 1999). The following table shows the structure and the differences
between the data marts and the data warehouse.
Data Mart | Data Warehouse
--- | ---
Departmental | Corporate
High level of granularity | Low level of granularity
Star join structure | Star join/Snowflake structure
Modest amount of historical data | Robust amount of historical data
Technology optimal for access and analysis | Technology optimal for holding and managing massive volumes of data
Each department has a different structure | Structure suits corporate understanding of data
Table 3: Differences between data mart and data warehouse (Inmon, 1999, p. 2)
2.3 DIFFERENT TYPES OF DATA WAREHOUSE MODELS
Different types of data warehouse models can be identified. Ponniah (2010)
describes the basic data warehouse architectural types available (Figure 6) and
introduces five different data warehouse architectural designs: the centralised
data warehouse architecture, independent data marts (IDM), the federated architecture
(FED), the hub and spoke architecture and the data mart bus architecture. As mentioned by
Ariyachandra and Watson (2010), these are reference architectural types which
provide guidance when creating a new design.
2.3.1 CENTRALISED DATA WAREHOUSE
The centralised data warehouse model considers enterprise-level information
requirements. The warehouse contains atomic-level data which is maintained in
third normal form; sometimes, summarised data is also stored. There are no
separate data marts developed in this architecture (Ponniah, 2010).
2.3.2 INDEPENDENT DATA MARTS
The independent data marts are developed to meet the needs of individual
organisational units (Ariyachandra & Watson, 2005). However, these data marts do
not provide a ‘single version of the truth’. As stated by Marco (2000), several
features can be identified in the independent data marts architecture. These features
include:
- Each data mart is sourced directly from the operational systems.
- In general, data marts are built independently from one another by autonomous
teams (independent teams will typically deploy their own tools, software, hardware and
processes).
Also, inconsistent data definitions and the use of different dimensions and measures across
IDMs prevent analysis of the data across the data marts (Ariyachandra & Watson,
2005). Moreover, Marco (2000) identified problems such as redundant data,
redundant processing, scalability and non-integration in this architecture.
2.3.3 FEDERATED ARCHITECTURE (FED)
As stated by Ariyachandra and Watson (2010, p. 13), “this architecture leaves
existing decision support structures (e.g., operational systems) in place”. The data in
the warehouse are integrated logically or physically using different methods such as shared
keys, global metadata and distributed queries. According to Jindal and Acharya (as
cited in Ariyachandra & Watson, 2010), this architecture is more suitable for
firms that have pre-existing, complex decision support systems.
2.3.4 HUB AND SPOKE ARCHITECTURE
This architecture is similar to the centralised architecture. It contains atomic
(detail) level data which are normalised into third normal form. There are
dependent data marts attached to this centralised data warehouse. The dependent
data marts acquire data from the centralised data warehouse. The centralised data
warehouse acts as a hub and the dependent data marts act as spokes. The
data marts are developed for different purposes of the organisation (Ponniah,
2010).
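The relationship between the hub and its spokes can be illustrated with a short Python sketch in which summarised data marts are derived from a single store of atomic rows. The row layout, the sample values and the function `build_mart` are hypothetical and are not drawn from Ponniah (2010).

```python
from collections import defaultdict

# Hub: atomic (detail-level) rows held once in the central warehouse.
warehouse = [
    {"patient_id": "P001", "unit": "Cardiac surgery", "cost": 1200.0, "complication": False},
    {"patient_id": "P002", "unit": "Cardiac surgery", "cost": 1500.0, "complication": True},
    {"patient_id": "P003", "unit": "ICU", "cost": 800.0, "complication": False},
]


def build_mart(rows, measure, combine):
    """Spoke: derive a summarised data mart from the hub, grouped by unit."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["unit"]].append(row[measure])
    return {unit: combine(values) for unit, values in grouped.items()}


# Two marts fed from the same hub, so both rest on the same underlying rows.
finance_mart = build_mart(warehouse, "cost", sum)
quality_mart = build_mart(warehouse, "complication", sum)  # True counts as 1
```

Because both marts are computed from the same atomic rows, their figures cannot drift apart in the way that separately sourced marts can.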
2.3.5 DATA MART BUS ARCHITECTURE
The data mart bus architecture is designed according to the business
requirements of the organisation (Ponniah, 2010). Initially, a data mart
is designed with dimensions and measurements; later on,
further data marts are added to it. The data marts consist of atomic and
summarised data and are organised in star schemas (Ponniah, 2010).
Figure 6: Data warehouse architectural types (Ponniah, 2010, p. 33)
Figure 7: Different types of data warehouse architectures (Sen & Sinha, 2005, p. 80)
Moreover, Sen and Sinha (2005) discussed some other types of
data warehouse architecture (Figure 7). Some of these data warehouse architectural
types are extended versions of the above-mentioned architectural types. Examples
include the enterprise warehouse with an operational data store and the hub and spoke data mart
architecture.
2.4 DATA WAREHOUSE ARCHITECTURE/MODEL SELECTION FACTORS
According to a survey conducted by Forrester (as cited in Agosta, 2005) among 213
practitioners at the Data Warehousing Institute conference in San Diego in
August 2004, most respondents selected the “Hub and Spoke” data warehouse
architecture as the most suitable architecture (see Figure 8).
Agosta (2005) stated that “the survey did not ask about data modelling
philosophy, and this survey is perfectly consistent with practitioners implementing
dimensional models in different architectures - centralised, hub-and-spoke, as well as
"conformed" designs”. However, Agosta (2005) argued that there is no right or
wrong data warehousing architecture in itself, because practitioners have been
successful with alternative architectures (models).
Figure 8: Results of the survey (Agosta, 2005)
Another survey conducted by Ariyachandra and Watson (2005) among 454
participants, on data warehouse architecture selection among companies, showed that
39% selected the hub and spoke architecture and only a small percentage selected the
federated architecture (Figure 9).
Figure 9: The distribution of the architectures (Ariyachandra & Watson, 2005, p. 24)
According to Ariyachandra and Watson (2010, p. 1), “data warehouse selection
decision is a subset of IT infrastructure (ITI) design”. However, little research has
been conducted on ITI design and most findings are derived from case studies or
recommendations which are developed from observation or indications. As stated by
Ariyachandra and Watson (2010), most of the research does not address the factors
that influence the data warehouse design. Ariyachandra and Watson (2010) have
introduced a research model for data warehouse architecture selection, which is
shown in Figure 10.
Figure 10: Research model for data warehouse architecture
selection (Ariyachandra & Watson, 2010, p. 4)
Their research on this model and further analysis shows that there is a
combination of several factors affecting the selection of data warehouse architecture.
They have introduced an overall model for data warehouse selection. The model has
been created according to the selection factors that were chosen as most important.
As stated by Ariyachandra and Watson (2010), based on organizational information
processing theory (OIPT), information processing needs arise from a combination
of interdependence and task routineness. Both sponsorship level and
information processing needs shape the strategic view of the
warehouse (Ariyachandra & Watson, 2010). Moreover, resource
constraints, the perceived ability of IT staff and urgency (facilitating conditions) also
influence the warehouse architecture selection (Ariyachandra & Watson, 2010)
(Figure 11).
Figure 11: An integrated model for DW architecture selection (Ariyachandra & Watson, 2010, p. 11)
2.5 HEALTH INFORMATION MANAGEMENT
As mentioned by Johns (2002), information management is defined in several
ways by different authors. Synott and Gruber (as cited in Johns, 2002) state that the
information management function provides control and management over
information resources. Scheyman (as cited in Johns, 2002, p. 4) states that
information management “refers to information characteristics such as information
ownership, content, quality and appropriateness”. The information management tasks
traditionally performed in healthcare organisations are highly quantitative
and departmentally focused (Johns, 2002). The role of the health information
manager includes responsibility for managing health information in the given
context. The traditional activities of the health information manager include
planning, developing and implementing systems designed to carry out tasks such as
controlling, monitoring, storing and retrieving data on a departmental basis (Johns, 2002).
Today, the tasks of the information manager are changing alongside the increasing
complexity of information in healthcare, and information managers act as brokers of
information services such as information engineering, retrieval and analysis.
2.5.1 HEALTHCARE DECISION-MAKING
The following figure (Figure 12) shows the decision-making levels of an
organisation. The top level of decision-making involves strategic decision-making
(Johns, 2002). At this level managers make decisions about the overall goals of the
organisation. For instance, decisions made at this level include which
services need to be provided (such as acute, ambulatory or long-term care) and at
which geographical location to operate (such as local, state or national) (Johns, 2002).
The second level concerns tactical decision-making. The decisions made at this level
relate to the tactical units of the organisation such as patient care services and
marketing (Johns, 2002). The third level concerns the day-to-day decisions of the
organisation such as hiring employees, ordering supplies and medications, and processing
bills (Johns, 2002).
Figure 12: Decision-making levels within an organisation (Johns, 2002, p. 36)
2.5.2 HEALTHCARE INFORMATION SYSTEMS AND DECISION-MAKING
The importance of information technology to healthcare services can be seen
differently from the perspectives of patients, professionals and government and
funding agencies. A patient expects easy access to personal information, knowledge
to provide self care, timely access to their healthcare professionals, privacy and up to
date care. On the other hand, professionals’ expectations of Information Technology
(IT) are different from those of the patients. As professionals they expect focused
information, support for effective use of IT, decision support tools and new education
and training. From government’s or funders’ perspectives accountability, efficiency,
sustainability and scalability are expected through the implementation of IT to
healthcare services (B. Barraclough, personal communication, March 31, 2009).
Therefore, the important issue is how to meet these needs by
integrating IT with healthcare services. As stated by Lenz and Reichert (2007), to
offer IT support effectively it is vital to understand the characteristics of
healthcare processes.
According to Johns (2002), healthcare information systems were paper based
for more than a century. The first use of computers in healthcare was reported to be
between the 1960s and early 1970s. Evolution of healthcare information systems is
shown in Figure 13. There are many information system applications used in
healthcare today. As stated by Johns (2002), most of these applications are clinically
oriented systems such as patient monitoring systems, nursing information systems,
laboratory information systems and so on. Also, there are applications which are
supportive for the operational activities or managerial activities of a healthcare
institution such as accounting information systems, human resource management
information systems and materials management. On the other hand, some of the
information systems are external to the organisation. As stated by Johns (2002), the
information manager of an institution should understand the components of information
systems and how the system affects the organisation and others outside the organisation.
In the late 1980s, hospitals started to implement many systems to support
strategic decision-making, managerial decision-making and quality improvement
(Johns, 2002). According to Grimson et al. (2000), previously, healthcare
organisations consisted of individual units which were operated independently from
one another and the need for information sharing was seen as less of a priority than it
is today. However, the inability to share information across systems and
organisations creates major barriers to progress on shared care as well as cost
containment (Grimson et al., 2000). Moreover, as mentioned by Johns (2002),
although the transactional databases contain a wealth of information it is impossible
to extract information for high level decision-making. Also, absence of integrated
healthcare leads to risks of medical treatment errors, lack of coordination, multiple
examinations and increased therapy costs (Stolba & Schanner, 2007). Furthermore,
according to Kerr, Norris and Stockdale (2007, p. 1017), “in the healthcare sector
lack of data quality has far-reaching effects. Planning and delivery of services rely on
data from different sources such as clinical, administrative and management
sources”. Therefore, higher-quality data helps to retrieve better
information (Kerr et al., 2007).
Figure 13: Timelining Health Information Systems Evaluation (Johns, 2002, p. 61)
As mentioned by Landrum, Peachey, Huscroft and Hall (2008), there are many
technological advances in use or under development to improve decision-making in
the healthcare industry, such as Decision Support Systems (DSS). These systems help the
hospital operate efficiently by reducing medical or prescription errors, organizing
staff and patients, reducing patients’ waiting time and facilitating effective
diagnosis of patients’ symptoms (Landrum et al., 2008). Some of the common
DSS in healthcare are marketing systems, cost accounting systems and case-mix
systems (Johns, 2002). These systems consist of tools that support the manipulation of
data and “what if” scenario analysis for strategic decision-making (Johns, 2002).
According to Arigon et al. (2007), Clinical Decision Support Systems (CDSS) were
introduced to assist decision-making in healthcare. However, the scope of CDSS is
limited when compared to clinical data warehousing (Arigon et al., 2007).
Also, as stated by Rajan and Ramaswamy (2010), because health data are
derived from different environments there is a significant probability of errors and
uncertainty. Moreover, many factors such as poor data quality, inconsistent
representation and complicated domain knowledge cause clinical decision-making
to be a labour-intensive and error-prone task (Zhou, Chen, Liu, Zhang, Wang,
Li, Guo, Zhang, Gao, & Yan, 2010). Therefore, effectively integrating health data
from different sources is becoming recognised as a crucial factor (Shams & Farishta,
2001).
There are a number of technologies available to integrate data. These include
data warehouses, database federations, database federation with mediated schemas
and peer data management systems (Louie, Mork, Martin-Sanchez, Halevy, &
Tarczy-Hornoch, 2007). As mentioned earlier, data warehouses integrate data from
different sources to a single repository. In a database federation, integration of
disparate sources is effected by using software programs that interface with the
sources (Louie et al., 2007). Database federations with mediated schemas
address problems faced by database federations when integrating data from
different sources. They use mediated schemas, which act as middleware in a database
federation. In a peer data management system (PDMS), “each data sources provides
semantic mapping to either one or a small set of other data sources or peers” (Louie et
al., 2007, p. 8). Each of these data integration technologies has advantages and
disadvantages, as shown in Figure 14. In the above-mentioned data integration
architectures, data are integrated by using data and knowledge formalisms such as
relational schemas, semi-structured data and ontologies (Louie et al., 2007).
Figure 14: Advantages and disadvantages of data integration architectures (Louie et al., 2007, p. 6)
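To make the idea of a database federation with a mediated schema more concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not an implementation taken from Louie et al. (2007): the two in-memory "sources", their field names, the mapping table and the function names are all hypothetical. Each source keeps its own local field names, and a query is answered at request time by translating every source into the shared (mediated) vocabulary.

```python
# Hypothetical sketch of a mediated schema over two federated sources.
# All data, field names and function names are invented for illustration.

# Two stand-in sources that use different local field names.
CARDIAC_DB = [{"pt_id": 1, "op": "CABG"}, {"pt_id": 2, "op": "Valve repair"}]
ICU_DB = [{"patient": 1, "icu_hours": 36}, {"patient": 3, "icu_hours": 12}]

# The mediated schema: each source maps its local fields to shared names.
MAPPINGS = {
    "cardiac": {"pt_id": "patient_id", "op": "procedure"},
    "icu": {"patient": "patient_id", "icu_hours": "icu_hours"},
}


def to_mediated(source, rows):
    """Rename the fields of every row from a source into the mediated schema."""
    mapping = MAPPINGS[source]
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


def query_patient(patient_id):
    """Answer a query at request time by consulting every source in turn."""
    merged = {"patient_id": patient_id}
    found = False
    for name, rows in (("cardiac", CARDIAC_DB), ("icu", ICU_DB)):
        for row in to_mediated(name, rows):
            if row["patient_id"] == patient_id:
                merged.update(row)
                found = True
    return merged if found else None


if __name__ == "__main__":
    print(query_patient(1))
```

In this sketch the data stay in their sources and are combined only when a query arrives, whereas a data warehouse would copy and store the integrated records in a single repository beforehand.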
However, data governance is also an important factor to consider when implementing
a data integration project. “Data governance refers to the overall management of the
availability, usability, integrity, and security of the data employed in an
enterprise” (Federal Student Aid, 2007). Effective data governance improves data
consistency in decision-making, improves data security, decreases regulatory fines and
assigns accountability for data quality (Delgado, 2011). Although there are standards and
security and compliance frameworks available for the healthcare industry, healthcare
organisations should incorporate privacy programs into their data governance programs
(Delgado, 2011). To implement an effective privacy program, basic elements such as a
formal policy governance structure, written policies, funding and procedures to
handle complaints need to be addressed (Delgado, 2011).
2.6 DATA WAREHOUSING AND HEALTHCARE
“In recent years, medical professionals are witnessing an explosive growth in
data collected by various organisations and institutions” (Kerkri et al., 2001). Hence,
there should be effective systems to manage healthcare data. As mentioned before,
OLTP systems are not designed to support ad hoc queries. The reason
is that, although transaction systems are rich in information, it is very difficult to
obtain appropriately linked and analysed information for higher decision-making levels
such as managers and executives. One solution that many organisations turn to is
implementing the data warehouse concept (Scheese, 1998). According to Wah and Sim
(2009, p. 530), “data warehousing is becoming an indispensible component in data
mining process and business intelligence”. The increasing quantity of healthcare data is
not the only problem; healthcare expenditure is another. Healthcare
expenditure is increasing and is a burden for both individuals and governments
(Yan & Jianli, 2005). For instance, the U.S. allocates a trillion dollars annually for
healthcare expenditure (Berndt, Fisher, Hevner, & Studnicki, 2001). Therefore, there
is a need for a strategy to reduce healthcare expenditure and to improve quality of
care.
In the context of the hospital systems, healthcare data comes from disparate
sources such as hospital administration systems, clinical databases and financial
systems and appears in many forms such as spreadsheets, published books and other
data formats (Berndt et al., 2001). The data warehouse provides an opportunity to
integrate these separate systems and support efficient decision-making.
According to a survey conducted by Health Industry Insights in the U.S.A.
(Figure 15) among 36 healthcare provider chief information
officers (CIOs), roughly 40% indicated that their current use of
business and clinical intelligence is limited to deployment of data marts or data cubes
(Holland, 2009). Also, 35% indicated limited use of business intelligence
(BI)/clinical intelligence (CI) tools that are incorporated into their packaged software
applications (e.g. electronic medical records (EMR), financial applications).
Figure 15: Current use of BI/CI by healthcare organisations (Holland, 2009, p.9)
2.6.1 DATA WAREHOUSE IMPLEMENTATION EXAMPLES
According to Winter (2007), data warehousing concepts in the healthcare
environment have been implemented successfully in the private sector as well as in
some government agencies in the USA. He describes real examples of successful
data warehousing implementations in the healthcare sector, such as in hospitals
and among commercial healthcare providers. As stated by Winter (2007), most of
these healthcare organisations gained benefits by implementing data warehousing.
Some of these examples are outlined below.
The Midwestern Health Insurance Company in USA uses their data warehouse
to identify and encourage optimal practices. The company found that the mortality
rate in cardiac surgery was lower for some healthcare providers. Subsequently, the
mortality rate for bypass surgery among this insurer’s
members declined by 75%, from 4% to 1%. Another example involves commercial
pharmacy savings of forty million dollars achieved in one six-month period with
their data warehouse-based program (Winter, 2007).
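The 75% figure quoted above is a relative decline, while the change from 4% to 1% corresponds to an absolute reduction of three percentage points. The short Python sketch below simply reproduces this arithmetic from the two rates reported by Winter (2007); the function names are illustrative assumptions only.

```python
def relative_decline(before, after):
    """Relative decline as a fraction of the starting rate."""
    return (before - after) / before


def absolute_decline(before, after):
    """Absolute decline in percentage points."""
    return before - after


if __name__ == "__main__":
    # Mortality rates in percent, as reported in the text (4% falling to 1%).
    print(relative_decline(4.0, 1.0))   # fraction of the original rate
    print(absolute_decline(4.0, 1.0))   # percentage points
```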
The Veterans Health Administration (VHA) in the USA is another institution that
gains benefits from its data warehouse. The aims of its data warehouse are to
improve the quality, efficiency and safety of its medical care; measure the
effectiveness of the care it offers; and facilitate medical research. The VHA has
saved millions of dollars on an annual basis through better decision-making (Winter,
2007). The New South Wales Department of Health (NSW Health) in Australia
is another data warehousing success story. NSW Health is responsible
for many services such as a State-wide ambulance service, mental health services,
drug and alcohol services and a network of community health centres (Sybase,
2010). The improvement of its data warehouse with Sybase provided an
opportunity to enhance its benefits in several ways. Some of these benefits are:
• Reducing data loads by 76 percent
• Achieving a data compression rate of over 70 percent
• Simplifying administration and reduce overhead costs
• Delivers queries 85 percent faster (Sybase, 2010, p. 1)
However, many findings show that certain factors are important for the
success of data warehousing. Winter (2007) introduces eleven critical factors that
should be addressed for successful data warehousing in healthcare services. These
factors include:
• The Enterprise approach
• Support for complex data structure
• Support for complex queries
• Large data volumes
• Concurrent and timely use
• Flexibility
• Support and education
• High availability
• Privacy and security
• Data quality and standards
• High performance
Facilitating the enterprise approach to data warehousing provides the greatest
benefits to health services. Health data flows from multiple different areas to the data
warehouse. These data can flow from both internal as well as external sources.
Therefore, the integration of all these data for relevant decision-making is essential
(Winter, 2007). Concurrently, end users of the data warehouse need different views
of the data. For example, a doctor needs a complete picture of a patient’s history of
tests, physical examinations, symptoms, etc. to make a clinical decision.
Alternatively, an insurer requires a complete picture of a hospital when providing
their services or its price structure. Likewise, every user (physicians, payers,
regulators, and researchers) needs the same data filtered into different views.
Healthcare systems deal with large volumes of data, which are growing
rapidly day by day (Winter, 2007). Therefore, the increasing volume of data is a
challenge for data warehousing in healthcare services. It is important to
manage these high volumes of data in an efficient and effective
manner. When implementing a data warehouse, quality of data plays a major role.
According to Leitheiser (2001, p. 1), “healthcare organisations data is central to both
effective healthcare and to financial survival”. Therefore, data quality must be high to
provide reliable and dependable information for decision support. According to
Winter (2007), flexibility is another critical factor when implementing a data
warehouse. In other words, the data warehouse should be able to adapt to changes
which can occur due to variation in regulations, technology advances and fluctuations
in consumer expectations. The changes which occur may be simple or complex. For
example, new data types continue to grow with increasing use of images, text and
audio and must be accommodated.
The privacy and security of health related information also plays a major role
when implementing a data warehouse (Winter, 2007). As a data warehouse consists
of data derived from multiple sources, it is important to provide security for this data.
Especially in healthcare, patients require their health information to be kept
secure. Providing privacy for health records means that only authorised persons have
access to the data, subject to the patient’s permissions (Winter, 2007). The
requirement of securing data in the data warehouse is becoming more complex with
the extent of data that has to be dealt with. This will be a major challenge for the
health sector in the future.
According to Winter (2007), the healthcare data model should be able to
support complex relationships among the tables. For example,
the outcome of medical tests may range from a single value to voluminous and
complicated output structures. In the future, use of more information may lead to
further increases in the size and complexity of medical data. Therefore, a data warehouse
must be able to support this complexity of the data model (Winter, 2007).
Support for complex queries is another related issue to be considered. Healthcare
data warehousing involves joining complex data from many different tables.
Therefore, sometimes complex queries must be written to get the required data from
the system. Hence, the writing of queries may involve handling large non-collocated
joins on multiple large tables (Winter, 2007).
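As a small illustration of the kind of multi-table join described above, the following Python sketch uses the standard `sqlite3` module with an in-memory database. The table names, column names and rows are hypothetical and are not drawn from Winter (2007) or from any hospital system; a production warehouse would hold far larger tables, which is where the performance concerns arise.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with three related tables (invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
        CREATE TABLE surgery (surgery_id INTEGER PRIMARY KEY,
                              patient_id INTEGER, procedure_name TEXT);
        CREATE TABLE lab_test (test_id INTEGER PRIMARY KEY,
                               patient_id INTEGER, test_name TEXT, result REAL);
        INSERT INTO patient VALUES (1, 1950), (2, 1962);
        INSERT INTO surgery VALUES (10, 1, 'CABG'), (11, 2, 'Valve repair');
        INSERT INTO lab_test VALUES (100, 1, 'creatinine', 90.0),
                                    (101, 1, 'creatinine', 110.0),
                                    (102, 2, 'creatinine', 70.0);
        """
    )
    return conn


def mean_result_by_procedure(conn, test_name):
    """Join three tables and average one test result per surgical procedure."""
    sql = """
        SELECT s.procedure_name,
               COUNT(DISTINCT p.patient_id) AS patients,
               AVG(t.result) AS mean_result
        FROM surgery AS s
        JOIN patient AS p ON p.patient_id = s.patient_id
        JOIN lab_test AS t ON t.patient_id = p.patient_id
        WHERE t.test_name = ?
        GROUP BY s.procedure_name
        ORDER BY s.procedure_name
    """
    return conn.execute(sql, (test_name,)).fetchall()


if __name__ == "__main__":
    for row in mean_result_by_procedure(build_demo_db(), "creatinine"):
        print(row)
```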
Concurrent and timely use is an important issue because the data
warehouses being implemented today and in the coming era align with and support
many activities and strategic goals of healthcare enterprises. Therefore, providing data
concurrently to many clients in a timely manner supports an effective and efficient
healthcare system. Similarly, high availability is another important issue to
consider (Winter, 2007). Availability of up-to-date information at the time required by
decision makers supports an effective and efficient healthcare system. However,
there is a great challenge in providing a data warehouse on
a large scale that is continuously updated, complex and heavily used. Nevertheless,
with the support of new technology many commercial organizations are already using
these solutions. As with the other issues, support and education are also vital when
implementing healthcare data warehousing. Providing better support and education
encourages users, and the importance of this in change management is well
recognised.
Finally, according to Winter (2007), the high performance point has three
basic meanings for a data warehouse: complete simple queries quickly; complete
large, complex queries efficiently and scalably; and load new data into the data
warehouse in a timely manner.
2.6.2 DATA WAREHOUSE IMPLEMENTATION CHALLENGES
However, as indicated by Winter (2007), it is a challenge to achieve all
these factors on a single platform. This is because meeting all these requirements at
once requires a suitable architecture, organisation, readily usable applications and
executive support (Winter, 2007). As stated by Lindsey and Frolick (2003), data
warehouse failures may have multiple causes. Table 4 shows
some of the reasons for data warehouse failure.
Table 4: Combined reasons for data warehouse failure (Lindsey & Frolick, 2003)
In general, the most common factors are weak management support and
inadequate user involvement. As mentioned by Lindsey and Frolick (2008), data
warehousing success may be obtained by avoiding a small number of critical factors
for failure rather than attempting to achieve all critical factors for success. Also,
according to the table, the main reasons for data warehouse project failure in the
healthcare industry are insufficient funding, organisational politics and weak
sponsorship and management support. According to a survey conducted by Health
Industry Insights in the USA with a sample of 33 healthcare provider CIOs
(Figure 16), the top three barriers to the use of business or clinical
intelligence applications at their organisations are lack of funding, lack of staff
resources, and data quality and inconsistent data standards (Holland, 2009).
Figure 16: Top 3 barriers to the use of business/clinical intelligence applications (Holland, 2009,p.11)
According to Holland (2009), another survey, conducted by IDC and InfoWorld in
2008 among 516 end users and system integrators, showed that 45% of respondents
selected data quality as the main challenge (Figure 17). The next highest results,
each selected by an equal percentage of respondents (29%), were real-time data
integration and integrating BI software with the existing IT portfolio (Holland, 2009).
Figure 17: Top 3 IT challenges to implementing/deploying BI applications (Holland, 2009,p.12)
2.7 SUMMARY AND IMPLICATIONS
The aim of data warehousing is to organise the gathering of a wide range of
data and store it in a single repository. The main components of the data warehouse
are the operational source systems, data staging area, data presentation area and data
access tools. Many authors have introduced different data warehouse modelling
methods such as the ER modelling approach, the dimensional modelling approach and
the object-oriented approach. Among the data warehousing approaches mentioned, the most
frequently used is the multi-dimensional or data cube approach. Two
modelling techniques, the star schema and the snowflake schema, are used to
represent multidimensional data. There are two basic methodological
approaches used to develop a data warehouse design: the top-down approach
and the bottom-up approach.
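A star schema can be sketched in a few lines to show how a central fact table references surrounding dimension tables. The Python example below uses the standard `sqlite3` module; the table names, columns and rows are hypothetical assumptions for illustration and do not represent the schema developed in this research. A snowflake schema would differ by normalising a dimension further, for example by moving a procedure category into its own table referenced from `dim_procedure`.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_procedure (procedure_key INTEGER PRIMARY KEY, procedure_name TEXT);
CREATE TABLE fact_surgery (
    date_key INTEGER REFERENCES dim_date(date_key),
    procedure_key INTEGER REFERENCES dim_procedure(procedure_key),
    length_of_stay_days INTEGER
);
"""


def load_star_schema():
    """Build the schema in memory and load a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20100101, 2010, 1), (20100201, 2010, 2)])
    conn.executemany("INSERT INTO dim_procedure VALUES (?, ?)",
                     [(1, "CABG"), (2, "Valve repair")])
    conn.executemany("INSERT INTO fact_surgery VALUES (?, ?, ?)",
                     [(20100101, 1, 7), (20100101, 1, 9), (20100201, 2, 5)])
    return conn


def stay_by_procedure_and_month(conn):
    """Aggregate the fact table along two dimensions (a simple cube slice)."""
    sql = """
        SELECT d.year, d.month, p.procedure_name,
               COUNT(*) AS surgeries,
               AVG(f.length_of_stay_days) AS mean_stay
        FROM fact_surgery AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_procedure AS p ON p.procedure_key = f.procedure_key
        GROUP BY d.year, d.month, p.procedure_name
        ORDER BY d.year, d.month, p.procedure_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in stay_by_procedure_and_month(load_star_schema()):
        print(row)
```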
The data warehouse is different from operational systems in many ways. The
primary difference between operational systems and data warehousing systems is that
operational systems are designed to support transaction processing (OLTP) and data
warehousing systems are designed to support online analytical processing (OLAP).
Moreover, differences can be seen in use of data, users, database sizes, transactions,
and data entry when compared with OLTP systems. There are many different
architectural types that can be identified: the centralised data warehouse
architecture, independent data marts (IDM), federated architecture (FED), hub and
spoke, and data mart bus architecture. Many factors contribute to the
selection of a data warehouse architecture, and it is important to consider
these factors when implementing the data warehouse.
As mentioned before, data warehousing technology predominantly aims to
structure the data in a summarised way which supports improved access to the data
and use of it in an efficient and effective manner. Currently, data warehousing plays a
major role in commercial businesses. The healthcare system is one sector dealing
with large amounts of data derived from many different sources. Therefore, there is a
need for a very effective system to capture, collate and distribute health data. From
the literature, it can be seen that decision-making with current information systems is
a very difficult task. This is because many issues, such as lack of resources to
integrate data, lack of data quality, and health data privacy and confidentiality
standards, hinder effective decision-making. Integrating healthcare IS with new
technology paves the way to obtain a number of benefits such as improved access to
data, evidence-based decision-making and provision of quality services. There
are many data integration mechanisms available, each with advantages as well
as disadvantages. However, from the
literature it can be seen that implementing the data warehouse concept is one of the
best potential solutions available for strategic and tactical decision-making
in healthcare.
Chapter 3: Research Design
This chapter describes the design adopted by this research to achieve the
aims and objectives stated in section 1.4 of Chapter 1. Section 3.1 discusses the
methodology and research design used in the study, including the stages by which the
methodology was implemented. Section 3.2 details the
participants in the study and section 3.3 lists all the instruments used in the
study such as the questionnaire and the face to face interviews. Section 3.4
outlines the timeline for the research project and section 3.5 discusses how the
data was analysed. Finally, sections 3.6, 3.7 and 3.8 discuss the ethical
considerations, intellectual property rights and health and safety issues of the
research project.
3.1 METHODOLOGY AND RESEARCH DESIGN
3.1.1 METHODOLOGY
The methodology used in the survey consists of five stages. The
following table shows the five stages and the research methods.
Stage  Description                                      Research Methods
1      Review data warehouse models                     Case studies
                                                        Literature Review
2      Study the cardiac surgery, ICU, quality and      Questionnaire
       safety and clinical costing units data           Unstructured Interviews
       repositories, decision-making processes and      Documentation review
       identify the issues
3      Select or propose a suitable architecture        Case studies
                                                        Literature Review
                                                        Interview/Collaboration
4      Develop and analyse the data product outputs     Data analysing tool
       of the model                                     Interview
5      Analysis of the benefits of the model            Feedback collected from the end users
Table 5: Methodology stages
Stage 1 – Review the data warehouse models
The first stage involves reviewing the data warehouse models available.
There are a number of data warehouse models which have been introduced in
fields such as healthcare, telecommunication and marketing. Therefore, literature
review and case studies provide valuable information as a first step for the
current research.
The literature review and case studies are important in order to:
• Develop background knowledge about data warehousing and its models and how
it differs from operational systems
• Identify how the data warehouse models are applied in different fields
• Study how the data warehouse concept is applied in the healthcare field
• Analyse what factors lead to selection of the optimal model
Literature search sources and information search strategies are covered in the
literature review methodology part (section 2.1).
Stage 2 – Study the data repositories, decision-making process and issues
As the second step, it is important to study the data repositories available in the
cardiac surgery, ICU, quality and safety and clinical costing units at The Prince
Charles Hospital. This will help to identify the databases and operational data stores
available and currently used for decision-making. To investigate the data repositories,
it is necessary to obtain assistance from the staff of the cardiac surgery, ICU, quality and
safety and clinical costing units and also the hospital IT department (Business
Solution Unit).
After identifying potential data repositories that are significant, it is necessary
to study the decision-making process itself. A questionnaire was developed as an
instrument to determine this. The questionnaire is further described in the
Instruments section below.
The sample of stakeholders identified was essentially a convenience sample of
a cross-section of roles related to the selected database sources. The sample included
clinical data managers, unit managers and clinicians either directly or indirectly
involved in decision-making processes based on the selected data sources. This is
further described in the Participants section below. The questionnaire was then given
to the identified end users and stakeholders involved in facilitating or making
decisions with the current information systems to identify the current issues in the
decision-making process and requirements for the data warehouse prototype design.
Furthermore, unstructured interviews have been conducted to gather more detailed
information and identify barriers to the development of the optimal prototype design.
The responses to the survey will be thematically analysed to develop sample
clinical questions that are to be addressed by a warehouse model and to provide
information products for assessment; to identify issues in current data management,
integration and analysis; and to identify any potential issues for selection or
development of a warehouse model.
Stage 3 - Propose a suitable data warehouse model and develop the data
warehouse prototype
As a result of the previous two steps, stage 3 involves recommending a suitable
data warehouse model for the cardiac surgery and associated clinical units. The data
warehouse models that are potentially going to be used are described in section 3.2 of
the literature review. The data warehouse model will be selected following
integration of the literature review and analysis of the information gathered from the
questionnaire and observation of the data and decision-making processes at the
cardiac surgery unit with appropriate feedback and consultation with end-users. Five
sample decision intelligence problems have been selected to guide the table structure
development of the data warehouse prototype. The data warehouse prototype will be
developed using SAS Data Integration Studio, which is a standard tool provided for
student use through QUT.
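The integration step at the heart of such a prototype can be sketched independently of any particular tool. The Python example below is not SAS Data Integration Studio code and does not reproduce the prototype's table structure; it is a minimal sketch, under the assumption of two hypothetical unit extracts with invented field names, of how records from separate units might be standardised on a patient identifier and combined into one integrated record per patient.

```python
from collections import defaultdict


def normalise_id(raw_id):
    """Standardise a patient identifier so that extracts can be matched."""
    return str(raw_id).strip().upper()


def integrate(cardiac_rows, icu_rows):
    """Combine per-unit extracts into one record per patient.

    The field names ('patient_id', 'procedure', 'hours') are hypothetical.
    """
    integrated = defaultdict(lambda: {"procedures": [], "icu_hours": 0.0})
    for row in cardiac_rows:
        pid = normalise_id(row["patient_id"])
        integrated[pid]["procedures"].append(row["procedure"])
    for row in icu_rows:
        pid = normalise_id(row["patient_id"])
        integrated[pid]["icu_hours"] += float(row["hours"])
    return dict(integrated)


if __name__ == "__main__":
    cardiac = [{"patient_id": " p1 ", "procedure": "CABG"}]
    icu = [{"patient_id": "P1", "hours": "12.5"}, {"patient_id": "p2", "hours": 3}]
    print(integrate(cardiac, icu))
```

The normalisation function stands in for the data cleansing that typically precedes loading; in practice the matching rules would depend on how each unit records patient identifiers.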
Stage 4 – Develop and analyse the information product outputs and benefits of
the model
Stage four involves analysis of the information product benefits of the data
warehouse prototype. The SAS Enterprise Guide data analysis tool is used to analyse
the integrated data. To conduct this analysis, the integrated data for five clinical
intelligence decision-making problems are analysed and the results will be displayed
in report format for evaluation by the stakeholders (clinicians, unit managers and data
managers from the ICU, cardiac surgery, clinical costing and quality and safety).
Stage 5 – Evaluation of DW model
The information products (outputs generated from the data warehouse prototype)
will be presented to the relevant stakeholders for clinical interpretation and
evaluation of the usefulness of the data warehouse model prototype. Generally,
Return On Investment (ROI) is used to measure the success of a data warehouse
(Threshold Consulting Services, 2005). Other methods that can be used to
evaluate a data warehouse model are usage measurement, surveys, response time and
availability (Threshold Consulting Services, 2005). As stated by Shcherbatykh,
Holbrook, Thabane, & Dolovich (2008), the randomised controlled trial (RCT) is another
methodology that can be used to assess the benefits, harms and costs of health
informatics interventions and to evaluate validity. However, implementing RCTs is
challenging in health informatics. This is because health informatics trials involve
‘complex interventions’ (multifaceted) and multiple targets such as
clinicians and patients. Another challenge is that some of the features of RCTs are not
always feasible in electronic health technologies (Shcherbatykh et al., 2008). A lack of
methodological guidelines for conducting health informatics trials is a further challenge
(Shcherbatykh et al., 2008).
In real-world situations it will take some time (one to three years or more)
to measure the actual benefits of the data warehouse, where changes to clinical care
models through quality improvement initiatives guided by warehouse information
products and management decisions may ultimately improve the health service.
According to de Mul et al. (2010), the testing phase of the ICU data warehouse
development at the Erasmus Medical Center, Rotterdam, took almost two years to
complete. Because the evaluation of a data warehouse is a complex and time-consuming
task, it is outside the scope of this project. Therefore, this research project will use
feedback collected from relevant end users to evaluate the potential usefulness of a data
warehouse based on the proposed warehouse prototype model.
3.1.2 RESEARCH DESIGN
The research design is essentially a case study incorporating qualitative
research methods to collect data from information managers and end users in the
cardiac surgery, ICU, quality and safety and clinical costing units. Two methods were
used for data gathering. Firstly, a questionnaire was provided to end users to collect
information on the current decision-making process, on the issues and gaps in the
current decision-making process and to identify the data warehouse prototype
development requirements. Secondly, interviews were held after analysis of the data
collected from the questionnaire. The reason for conducting the subsequent
interviews was to clarify the information provided in the questionnaire and gather
more detail as required. The interviews were held only with people who agreed to
participate further as indicated in the appropriate response on the questionnaire.
Questionnaires and unstructured interviews were analysed thematically to identify the
decision-making issues, current decision-making process and user requirements.
Finally, qualitative analysis will be carried out to analyse the benefits after
developing the prototype and distributing the integrated analytical information
product.
3.2 PARTICIPANTS
The participants of this study are end users in the cardiac surgery, ICU, clinical
costing and quality and safety units. The end users are mainly clinicians, data
managers and unit managers of the above-mentioned units who are directly involved in
decision-making or in supporting the decision-making process in healthcare practice.
Clinicians and unit managers are key decision makers for the units. Also, data
managers at the different units support the clinicians’ information needs as required
and are involved in facilitating the process of transforming data into useful
information. The questionnaire was given to a sample of ten participants
from the cardiac surgery, ICU, clinical costing and quality and safety units. The
small sample size of the survey was a result of the availability of end users and the time
limitation of the project. As there are only a few staff members involved in informatics
in clinical services, the participants enrolled represent a good cross section of
relevant staff related to cardiac surgical decision-making processes.
3.3 INSTRUMENTS
The instruments used in the survey are the questionnaire (Appendix A)
and the unstructured interviews. The questionnaire consisted of twelve questions
producing data for four categories of inquiry, namely: current data repositories,
decision-making process, current issues, and data storage and analysis needs. The
questionnaire was designed by referring to related questionnaire design theories and a
sample questionnaire designed by another researcher (Mathew, 2008), with
advice from my supervisory team and from data warehousing literature providing
examples of similar surveys, such as the book published by Golfarelli and Rizzi
(2009). The questionnaire was given to the end users to identify the current
decision-making process at the cardiac surgery, ICU, clinical costing and quality and
safety units. This helped to gather the information in a structured way, to identify
the issues in the decision-making process and to identify the main technical
requirements for the data warehouse prototype development. After analysing the data
collected from the questionnaire, unstructured face-to-face interviews were
conducted to collect further information and to clarify and define the desired warehouse
information output. The subsequent interview questions were developed
based on the answers given in the questionnaire.
3.4 PROCEDURE AND TIMELINE
At the first stage of the research, the researcher identified and observed
database diagrams and held discussions with end users of the ICU, cardiac surgery,
clinical costing and quality and safety units. This process took three weeks. At the
second stage, the questionnaire was designed after referring to the literature and with
the help of my supervisory team. It was reviewed by my supervisory team and the
whole process took about two weeks. At the next stage, the hard copy of the
questionnaire was given to the end users of the ICU, cardiac surgery, clinical costing
and quality and safety units. The time allowed for answering the questionnaire was
twenty minutes. Completed questionnaires were collected within one week. All the data
from the completed questionnaires were entered into a Microsoft Excel spreadsheet
for analysis. Analysis of the survey results took one week and, as the next stage,
unstructured interview questions were designed. Finally, unstructured interviews were
conducted only with end users who agreed to participate further as indicated on the
questionnaire. Interviews were conducted to collect further information required for
data warehouse prototype development. The length of each interview was
approximately twenty minutes. Interviews were conducted with six participants
and the data collected from the interviews were recorded in written form. Both
questionnaire distribution and interviews were conducted at the Prince Charles
Hospital under the supervision of the Cardiac Surgical Registry Coordinator. All the
collected data (questionnaire and interview responses) were kept in a secure place and
treated confidentially.
3.5 ANALYSIS
The results of both the questionnaire and the unstructured (face-to-face) interviews
were analysed in three sections: firstly, the current decision-making
process; secondly, the issues related to the current decision-making process at the
cardiac surgery unit, ICU, clinical costing unit and quality and safety unit; and thirdly,
information on the technical details of the
data warehouse prototype development. Following the analysis of the questionnaire,
five sample decision problems that could potentially be resolved by the decision
makers using warehouse information products were selected and put into
a table (as shown in Table 6).
3.6 ETHICS AND LIMITATIONS
As mentioned in the research methodology, this research involved collecting
data by questionnaire and unstructured interviews. The participants were selected
from the cardiac surgery, ICU, quality and safety and clinical costing units at
TPCH. The participants are data and information end users (clinicians, data
managers, unit managers, directors) of these units. The questionnaire did not include
any individually identifiable data unless the participants indicated willingness to be
interviewed and provided their name and contact details.
Also, unstructured interviews were conducted after analysis of the data from the
questionnaire. Interviews were conducted with end users to clarify details and the
warehouse application requirements. Although unstructured interviews were held
with individuals, no sensitive data likely to have any negative effect on the
individuals was collected. All these data were kept in a secure place and treated
confidentially.
There is a need to access some of the identifiable clinical data in the data
repositories when testing the data warehouse prototype and developing the
information product outputs. Therefore, to ensure the confidentiality of such data and
provide accountability and responsibility, documents were signed with the Prince
Charles Hospital to make the researcher an honorary employee, and therefore bound by
the Queensland Health Code of Conduct. This preventive measure ensures a)
authorised on-site access to data but no removal or transfer of data outside Qld Health
premises, for example by storing data on a personal computer,
USB drive or CD; and b) prohibition of disclosure or discussion of patients’ personal
information with others. Ethics clearance was received from
both the QUT and the Prince Charles Hospital ethics committees.
3.7 INTELLECTUAL PROPERTY RIGHTS
This research project was commenced as part of the Masters of Research
course at the Queensland University of Technology (QUT). As part of the research,
some work has to be carried out in collaboration with the Prince Charles
Hospital. The final report of the study may provide commercial value to both QUT
and the Prince Charles Hospital. Approval has been given by QUT for the IP rights
process.
3.8 HEALTH AND SAFETY
This research project does not involve working with any kind of biomedical,
biochemical or biological materials. However, this research involved interviews and
therefore the researcher applied for an assessment of the work and received approval.
Chapter 4: Results Analysis
This chapter provides analysis of results collected from the survey. Data
collected from the survey instruments was analysed according to three sections.
Section 4.1 explains the current decision-making process and section 4.2 identifies
the current decision-making issues. The final section (section 4.3) summarises the
data warehouse prototype development requirements.
A total of ten questionnaires were distributed to stakeholders in the cardiac
surgical decision-making processes. An 80% response rate was achieved (8 out of
10), although 30% (3 of 10) did not wish to participate in further interviews. Only ten
questions (out of 12) were analysed because too few responses were returned for
questions 9 and 10. Questions 1-4 were analysed to identify the current decision-making
process, questions 6-8 and question 11 were analysed to identify current issues in the
decision-making process, and questions 5 and 12 were analysed to identify the
user requirements for data warehouse prototype development.
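The survey figures above can be checked with a short calculation. The following is only an illustrative sketch in Python: the counts and question groupings are taken from the paragraph above, while the variable names and group labels are hypothetical and not part of the study's tooling.

```python
# Figures reported in the paragraph above.
distributed = 10          # questionnaires handed out
returned = 8              # questionnaires completed
declined_interview = 3    # did not wish to be interviewed further

response_rate = 100 * returned / distributed
declined_rate = 100 * declined_interview / distributed

# Question groupings as described in the text; labels are illustrative.
question_groups = {
    "current decision-making process": [1, 2, 3, 4],
    "decision-making issues": [6, 7, 8, 11],
    "prototype requirements": [5, 12],
}
excluded = {9, 10}  # too few responses were returned for these questions

analysed = sorted(q for group in question_groups.values() for q in group)

# Every question from 1 to 12 is either analysed exactly once or excluded.
covers_all = set(analysed) | excluded == set(range(1, 13))
no_overlap = len(analysed) == len(set(analysed))

print(f"Response rate: {response_rate:.0f}%")
print(f"Declined further interview: {declined_rate:.0f}%")
print(f"Questions analysed: {len(analysed)} of 12")
```

The check confirms that the three groups of questions account for the ten analysed questions without overlap.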
4.1 CURRENT DECISION-MAKING PROCESS
According to the questionnaire responses, 87.5% of respondents use data from
outside their own data repositories for the decision-making process. Also, the cardiac
surgery unit shares or would like to share information with other hospital units,
especially the ICU, the quality and safety unit and the clinical costing unit. However,
limited interaction between these databases creates inefficiencies in decision-making.
Moreover, results from the questionnaire as well as the unstructured interviews
indicated that there is limited or no access to some of the data repositories. According to
the results of the questionnaires, some aspects of the current decision-making
processes involving multiple sources of data are as follows:
• If clinicians working in the quality and safety unit require information from
other databases (including ICU, CARPIA, e-DS), they have to contact the
data custodians in the other departments to extract and provide specific data.
• If clinicians from the ICU need information from e-DS, CARPIA or
Transition II, they again need to contact data custodians from the specific unit
and possibly also from the IT department to collect the specific data.
• A unit manager from the cardiac surgery unit has direct access to CARPIA. However,
the unit manager needs to contact data custodians of the other units to collect specific
data.
When considering the end users’ decision-making process, it can be seen that
there is currently a high degree of repetitive manual processing related to data access
and acquisition. Clinicians or unit managers collect the data separately by contacting
data custodians and individually integrate and assemble the data for analysis through
laborious and time-consuming linking processes.
4.2 DECISION-MAKING ISSUES
Participants identified that there are many issues related to their current
decision-making processes. According to the questionnaire responses, the majority of
participants (75%) were not satisfied with the support provided by the current
information systems for decision-making. The following figure (Figure 18) shows the
response rate.
Figure 18: Current support from the ISs for decision-making
According to further findings from the questionnaire and unstructured
interviews, end users from the clinical costing unit mentioned that systems need
to be integrated and should provide easier access to data. Also, end users from
the cardiac surgery unit pointed out the difficulty of reusing data already held in
other data repositories to combine with cardiac surgical data to inform quality
improvement studies. Another respondent from the cardiac surgery unit mentioned
that comprehensive data need to be available at all stages of the point of care
and that current systems do not support this. Furthermore, a detailed response from a
quality and safety unit end user stated that “there is lack of support available for
the current decision-making process from the current information systems and
need a centralised data management (process) to improve decision-making”.
Analysis of the question regarding current decision-making issues revealed that
most of the end users (75%) selected integration of data from other data
repositories as the main problem for current decision-making (Figure 19). End
users such as clinicians and unit managers frequently know which information
they require and where the information is available, but they do not have
effective methods to integrate the data. As mentioned before, the clinicians or unit
managers contact the data custodians to collect the required data separately. This is a
time-consuming and often complex process for both parties. For instance, from the
clinicians’ point of view, they have to analyse the data collected from separate
units (for example ICU, CARPIA, e-DS); from the data managers’ point of view, it
takes some time to obtain and integrate information for complex ad hoc queries, or
they may have to contact the IT department or research staff to gather or extract
some of the data.
Figure 19: Decision-making issues with current ISs
Limited accessibility to data and lack of data availability was the next main
problem pointed out by the end users (63%). As mentioned before, there is limited
access to databases, or some end users may have difficulty obtaining authority to
access data repositories. According to further unstructured interviews held with
end users, the main reasons identified are security and confidentiality issues or
Queensland Health information-related policies. With a rate of 50%, respondents
also selected lack of efficient reporting tools and lack of time and resources to
undertake analysis as two other problems. According to the data collected from
the questionnaire, the analysis tools employed by the units are SPSS, Microsoft Excel
and QI Macros. However, further interviews indicated that there is a need
to implement better data analysis tools.
Figure 20: Data quality issues in current decision-making process
Figure 20 shows that the data quality issue most often indicated on the
questionnaire was a lack of data completeness (more than 60%). A lack of
consistency was selected by 50% of respondents as the next highest data
quality issue. Respondents also indicated that a lack of data accuracy (38%) and
a lack of relevance (25%) were data quality issues faced by them in the
decision-making process.
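Because only eight questionnaires were returned, each percentage reported in sections 4.1 and 4.2 corresponds to a small whole number of respondents. The sketch below converts the reported percentages back to implied counts. It assumes that all eight respondents answered every question, which the text does not state, and the labels and function name are illustrative only.

```python
RESPONDENTS = 8  # questionnaires returned, as reported in this chapter

# Percentages as reported in sections 4.1 and 4.2 (labels abbreviated).
reported = {
    "use data from outside own repository": 87.5,
    "not satisfied with current IS support": 75,
    "data integration is the main problem": 75,
    "limited accessibility / availability": 63,
    "lack of efficient reporting tools": 50,
    "lack of data accuracy": 38,
    "lack of relevance": 25,
}

def implied_count(percent, n=RESPONDENTS):
    """Nearest whole number of respondents matching a reported percentage.

    Assumption: all n respondents answered the question.
    """
    return round(percent * n / 100)

for label, percent in reported.items():
    count = implied_count(percent)
    exact = 100 * count / RESPONDENTS
    print(f"{label}: {percent}% is about {count} of {RESPONDENTS} ({exact:.1f}%)")
```

Under this assumption, rounded values such as 63% and 38% correspond to five and three respondents (62.5% and 37.5%), which is a reminder that each respondent shifts a result by 12.5 percentage points.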
4.3 APPLICATION DEVELOPMENT REQUIREMENTS ANALYSIS
To develop the cardiac surgery data warehouse prototype, a sample of
clinical decisions or analysis processes made by the end users/stakeholders was
first summarised from the questionnaire and interview responses. Five
significant clinical decision problems were selected from all the responses with
assistance from the cardiac surgical unit coordinator. Table 6 shows
the analysis of the user requirements according to the sample decision-making
problems and identifies the data sources that need to be integrated
to resolve each decision-making problem. Further analysis of the data sources,
together with discussion with relevant stakeholders, ensured that the correct
records and data items from the sources were included for later analysis and
information product reporting.
No | Problems/decisions/analysis | Data repositories | Users
1 | What are the clinical risk scores according to certain group? (e.g. according to procedure, ventilation time) | CARPIA, ICU | Clinician, Unit Manager
2 | What is the expenditure per episode of care according to procedure, ventilation time? | CARPIA, Transition II | Unit Manager, Clinician
3 | What is the rate of e-discharge summaries sent to GPs according to clinical guidelines for the cardiac surgical patients according to operative data, surgical consultant? | CARPIA, e-DS | Unit Manager, Clinician
4 | What is the cost of various post-operative complications according to the morbidity groups captured by the cardiac surgical registry? | CARPIA, Transition II | Unit Manager, Clinician
5 | Audit data sources to verify that costings data include high-cost procedures appropriately | CARPIA, Transition II | Unit Manager, Clinician
Table 6: Decisions/problems that end users would like to address
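The mapping in Table 6 can also be expressed as data in order to derive which repositories the prototype has to integrate. The following Python sketch is illustrative only: the repository names come from Table 6, while the variable names are hypothetical.

```python
from collections import Counter

# Table 6 expressed as data: problem number -> repositories it draws on.
DECISION_PROBLEMS = {
    1: {"CARPIA", "ICU"},
    2: {"CARPIA", "Transition II"},
    3: {"CARPIA", "e-DS"},
    4: {"CARPIA", "Transition II"},
    5: {"CARPIA", "Transition II"},
}

# All repositories needed to answer the five sample problems.
required_sources = set().union(*DECISION_PROBLEMS.values())

# How many of the sample problems rely on each repository.
usage = Counter(src for srcs in DECISION_PROBLEMS.values() for src in srcs)

# Repositories needed by every problem.
common_sources = set.intersection(*DECISION_PROBLEMS.values())

print("Repositories to integrate:", sorted(required_sources))
for source, count in usage.most_common():
    print(f"{source}: used by {count} of {len(DECISION_PROBLEMS)} problems")
print("Needed by every problem:", sorted(common_sources))
```

Read this way, the table shows that four repositories need to be integrated and that CARPIA is involved in every sample problem, so it is the natural point to which the other sources are linked.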
Figure 21 shows the analysis results for the question regarding security
and privacy concerns for data warehouse development. According to the figure,
50% of respondents indicated that they have no concerns regarding
incorporating data security and information privacy in the data warehouse
development. It is also important to note that more than 35% did not answer and
less than 20% indicated that data security and information privacy should be
incorporated into the data warehouse development.
Figure 21: Security and privacy concerns for DW prototype development
Chapter 5: Data warehouse prototype development
This chapter outlines the data warehouse prototype development for TPCH.
Section 5.1 briefly explains the business intelligence tools used to develop the data
warehouse. This section also gives some details and benefits of SAS Data
Integration Studio and the SAS/Warehouse Administrator tool, which are used for this
research. Section 5.2 provides details on the data analysis tools and section 5.3
explains the cardiac surgery data warehouse prototype selection and development
process step by step. Section 5.4 shows and discusses the information product output
result of the data warehouse prototype. The final section (section 5.5) shows the
feedback that was gathered from the end users to evaluate the data warehouse
prototype.
5.1 BUSINESS INTELLIGENCE TOOLS
Sen and Sinha (2005, p. 81) compare 15 available data warehouse
methodologies. These methodologies are
grouped into three categories: core technology vendors, infrastructure vendors and
information modelling companies.
Core technology vendors
The core technology vendors are those who sell the database engines. “These
vendors use data warehousing schemes that take advantage of the nuances of their
database engines” (Sen & Sinha, 2005, p. 82). The methodologies categorised under
core technology vendors by Sen and Sinha are NCR’s Teradata-based methodology,
Oracle’s methodology, IBM’s DB2-based methodology, Sybase’s methodology, and
Microsoft’s SQL Server-based methodology.
Infrastructure vendors
As mentioned by Sen and Sinha (2005), infrastructure vendors are those
involved in the data warehouse infrastructure business. The infrastructure tools
have mechanisms to manage metadata repositories and to extract, transform and load
data into the data warehouse. These infrastructure tools are also able to work
with other database engines (Sen & Sinha, 2005). Some examples in this category
are SAS’s methodology, Informatica’s methodology, Computer Associates’ Platinum
methodology, Visible Technologies’ methodology, and Hyperion’s methodology.
Information modelling companies
This category includes Enterprise Resource Planning (ERP) vendors such as
SAP and PeopleSoft, business consulting companies such as Cap Gemini Ernst &
Young, and IT/data-warehouse consulting companies such as Corporate Information
Designs and Creative Data (Sen & Sinha, 2005).
As mentioned before, the data warehouse prototype in this study was developed
using SAS Data Integration Studio 4.2. A constraint of this study was the availability
of data warehouse technology. As SAS was able to provide this technology to QUT for
student use, it was an expedient choice. However, as SAS is the third largest
business intelligence vendor worldwide (Vesset, 2010), it is considered a reasonable
choice for demonstrating a data warehouse prototype development in this clinical
environment. For this research project, SAS software was used in two stages: at the
back end, SAS Data Integration Studio was used to develop the data warehouse
prototype, and at the front end, SAS Enterprise Guide was used to analyse data.
Features of the data warehouse technology are described below.
5.1.1 SAS/WAREHOUSE ADMINISTRATOR 4.3
SAS/Warehouse Administrator is a tool developed to design
data warehouse/data mart processes. It is a “customizable solution that offers a
single point of control, making it easier to respond to the ever-changing needs of the
business community”.
Some benefits of the SAS/Warehouse Administrator software include:
• Integrates extraction, transformation and loading tools for designing data
warehouses/data marts.
• Provides a better framework for effective warehouse management.
• Facilitates business subject definition, consolidation of business rules,
scheduling of processes for warehouse maintenance and integration with decision-
support tools for effective warehouse exploitation.
• Delivers data warehouses more quickly, so that their benefits can be gained sooner.
5.1.2 SAS DATA INTEGRATION STUDIO
SAS Data Integration Studio 4.2 is a powerful tool that helps data warehouse
developers and data integration specialists to carry out data integration more
efficiently and effectively (SAS Institute Inc, 2006). SAS Data Integration Studio
provides user-friendly interfaces, extensive built-in transformations and management
of complex enterprise data integration processes. Also, this software tool is easy to
use, collaborative and helps to integrate data faster and more effectively (SAS
Institute Inc, 2006).
Some of the benefits of SAS Data Integration Studio include:
• Always access the data needed
SAS Data Integration Studio enables accessing and processing data from legacy
systems or the latest ERP applications. New source systems can also be included easily.
All of this helps to save time and assists decision makers to collect the information
they require.
• Improve productivity
SAS Data Integration Studio provides a user-friendly interface for developing
and documenting the work. Manual coding is also available when required. New
team members can adapt quickly to others’ work when needed.
• Manage security and administration at all levels
SAS Data Integration Studio allows security and administration levels to be
established quickly and easily. The reusable templates help to provide role-based
authorization and administrative privileges at all levels efficiently.
• Deliver consistent, trusted and verifiable information.
This tool delivers accurate information as needed. Data quality
tools help to examine the quality of data in the source systems. Furthermore, SAS
Data Integration Studio helps users to identify where the data is derived from and
how it was transformed (SAS Institute Inc, 2006).
5.2 DATA ANALYSIS TOOLS
Data analysis tools are used to identify patterns in enterprise data. This
provides useful insight into business trends. Some of the commonly used
data analysis tools are R, SPSS, SAS, Excel, Stata and Matlab. For this research I
have used the SAS Enterprise Guide data analysis tool. A brief introduction to this
software is given below.
5.2.2 SAS ENTERPRISE GUIDE
SAS is considered one of the appropriate and efficient analysis tools available
for presenting data in report form. SAS Enterprise Guide provides a
graphical interface to SAS for publishing dynamic results in a Microsoft Windows client
application (SAS Institute Inc, 2010). This application provides better information
for business analysts, programmers and statisticians (SAS Institute Inc, 2010). Some
of the benefits of this application include:
• Provide a self-service environment
• Provide efficient access to data sources
• Make reporting and analytics available to everyone.
(SAS Institute Inc, 2010)
5.3 CARDIAC SURGERY DATA WAREHOUSE PROTOTYPE SELECTION AND DEVELOPMENT
5.3.1 MODEL SELECTION RATIONALE
The selection of a specific data warehouse model is very challenging in the
healthcare sector. The selection of a data warehouse model for the cardiac surgery
unit at TPCH was based on an integration of the literature review and the analysis of
user requirements from the stakeholder survey. According to the literature, the data
warehouse models that are most often implemented or favoured are the federated data
warehouse model, the centralised data warehouse model, the enterprise data warehouse
model and the hub and spoke data warehouse model. Some examples include:
• The Centers for Medicare & Medicaid Services (CMS) is a federal agency
that manages the Medicare and Medicaid programs in the USA. Over the past
years, it has developed a number of data marts; more recently, it has been
trying to implement an enterprise-wide data warehouse model to integrate
data from different sources (Winter, 2007).
• The Veterans Health Administration (VHA) in the USA is another example of data
warehouse implementation (Winter, 2007). As described by Winter (2007),
it has implemented a corporate data warehouse (enterprise data
warehouse) to provide intelligence support for many clinical concerns such
as obesity, diabetes and depression. More recently, extensions have been
suggested for the enterprise data warehouse by introducing an operational data
store (ODS), a web-based safety net interface and hybrid communication
functionalities (Bala, Venkatesh, Venkatraman, Bates, & Brown, 2009). The
main reason identified for introducing this type of extension is to be able to
respond quickly in large-scale disasters (Figure 22).
Figure 22: VHA corporate data warehouse visual architecture (Bala et al., 2009, p.138)
• A paper published by Stolba, Banek and Tjoa (2006) discusses the
implementation of the federated data warehouse model supporting
evidence-based medicine. In this paper the authors are primarily concerned
with the security and privacy issues of healthcare data.
• Another paper published by Stolba and Schanner (2007) suggests a
federated data warehouse model to integrate clinical data (Figure 23).
Figure 23: Medical federated data warehouse model (Stolba & Schanner, 2007, p. 5)
As mentioned by Stolba and Schanner (2007), in this model domains such as
medical treatment, social insurance and pharmaceutical participate in one
federation while some others communicate through web services and some
may transfer data directly to the federation.
• Zhou et al. (2010) describe the implementation of a data warehouse for traditional
Chinese medicine for clinical and research purposes. This data warehouse model
is similar to the centralised warehouse architecture.
Figure 24: CDW architecture for traditional Chinese medicine (Zhou et al., 2010, p. 141)
Examination of the data warehousing implementation examples in the
healthcare sector shows that there is no single data warehouse model applicable to all
healthcare settings. The selection of a specific data warehouse model may depend on
many selection factors, such as those discussed by Ariyachandra and Watson (2010).
Also, the examples show that organisations do not
necessarily retain a single data warehouse model, and the data warehouse model
may change to provide maximum benefit. This can be seen from the first and the
second examples. For instance, the CMS federal agency in the USA developed data
marts and has recently been planning to develop an enterprise-wide data
warehouse model to integrate data from different sources. Also, the VHA in the USA
has an enterprise data warehouse model, and recently some authors have suggested an
extension to this warehouse by introducing an operational data store (ODS), a
web-based safety net interface and hybrid communication functionalities to improve
efficiency in the event of a large-scale disaster.
As mentioned before, user requirements for the TPCH cardiac surgery data
warehouse development were collected from the end users through the questionnaire
and interviews. After analysis, the data collected from the questionnaire were
summarised to provide a sample of important decisions that the end users would like to
address (as shown in Table 6). As the next step, the required tables and data
fields from the source databases such as ICU, CARPIA, e-DS and Finance were
identified. In consideration of the user requirements and the data warehouse
implementation literature, the recommended data warehouse is a centralised
data warehouse (Figure 24). This is because, in the context of this study,
the only requirement is the integration of four institutional data repositories that are
used to help make the selected sample decisions in the cardiac surgery unit. However,
this model may not be suitable if it were required to integrate many external sources and
progress to a global solution.
The centralised data warehouse model maintains data in a central store, and it
improves access to data integrated from the different units of the hospital when
compared with the architecture of independent data marts. Ariyachandra and Watson
(2005) surveyed 454 participants involved in data warehouse implementation (data
warehouse managers, data warehouse staff members, information system managers
and independent consultants); the majority indicated that the hub and spoke data
warehouse model and the federated data warehouse model require more development
time. Another important factor is the development and maintenance cost of the data
warehouse. According to the same survey, the hub and spoke data warehouse model
has the highest average cost for development (around US$2,000,000 to
US$2,500,000) and for maintenance (around US$1,000,000 to US$1,125,000)
(Ariyachandra & Watson, 2005). Development costs for independent data marts, the
data mart bus and the centralised data warehouse model were in the range of
US$1,500,000 to US$2,000,000, and average maintenance costs for the data mart bus
and centralised models were in the range of US$750,000 to US$1,000,000.
From this, it can be seen that a centralised data warehouse is more cost
effective and needs less development time than an enterprise-wide
architecture or a federated architecture. However, a centralised model could later
be extended to an enterprise-wide (hub and spoke) model or a federated model if
required. Figure 25 shows the proposed centralised data warehouse model for the
cardiac surgery unit.
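The cost comparison above can be made concrete with a short worked calculation. The following Python sketch takes the survey ranges quoted in the text and compares their midpoints. Using the midpoint as a single representative figure is an assumption made here for illustration only; it is not part of the survey by Ariyachandra and Watson (2005).

```python
# Cost ranges in US$ as quoted in the text from Ariyachandra and Watson (2005).
# Taking the midpoint of each range is an illustrative assumption.
COST_RANGES = {
    "hub_and_spoke": {
        "development": (2_000_000, 2_500_000),
        "maintenance": (1_000_000, 1_125_000),
    },
    "centralised": {
        "development": (1_500_000, 2_000_000),
        "maintenance": (750_000, 1_000_000),
    },
}


def midpoint(cost_range):
    """Return the midpoint of a (low, high) cost range."""
    low, high = cost_range
    return (low + high) / 2


def midpoint_gap(cost_type):
    """Hub and spoke midpoint minus centralised midpoint for one cost type."""
    hub = midpoint(COST_RANGES["hub_and_spoke"][cost_type])
    central = midpoint(COST_RANGES["centralised"][cost_type])
    return hub - central


for cost_type in ("development", "maintenance"):
    print(f"{cost_type}: US${midpoint_gap(cost_type):,.0f} lower for centralised")
```

On these midpoints the centralised model is lower on both cost types, which is consistent with the conclusion drawn in the text.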
Figure 25: Proposed data warehouse model for the TPCH Cardiac surgery unit
5.3.2 DEVELOPMENT PROCESS
To develop the data warehouse as desired, data had to be integrated
from four different sources: the cardiac surgery unit, the clinical costing unit, the
ICU, and the quality and safety unit. However, these databases have been developed
and operated independently by different units. For example, the clinical costing unit
already uses an enterprise data warehouse developed for handling finance data.
However, users cannot directly access the database servers of the clinical costing
unit, as the system is an enterprise development housed at State level
with restricted access. Access to the ICU database is also restricted, and there are
issues with direct connections to the CARPIA and e-DS servers. Therefore, for the
purpose of this study, data excerpts from the CARPIA, ICU and Transition II
databases were saved into three separate spreadsheet files (.csv format).
Step 1 - As the first step, two star schemas, named the risk score star schema and
the cost star schema, were designed for analysis (Figures 26 and 27). Then a SAS
library was defined to store the source data for the data warehouse prototype
development. The source data were stored in the Sample TPCH source data library.
Step 2 - As the second step, metadata were registered for the SAS source tables. All
the data extracted from the ICU, Finance and CARPIA sources were loaded into the
library. SAS data integration studio can register metadata from different sources such
as content servers (HTTP server, FTP server etc.), database servers (Oracle server,
SQL server, ODBC servers, Sybase server etc.) and enterprise application servers
(SAP server).
Step 3 - The third step involved the design of the dimension and fact tables for the
data warehouse prototype. The new fact tables and dimension tables were designed as
shown in the star schema diagrams (Figures 26 and 27), and the metadata for the
tables were registered.
Fact tables
Two fact tables, named “cost” and “risk scores”, were created
(Appendix B).
• Risk score fact table
The risk score fact table contains data from the finance database, the CARPIA
database and the ICU database. The table consists of FLDURNUMBER (patient
hospital admission number), FLDTHECNO (theatre encounter number), SCORE (risk
score from the ICU), PREDMORT (risk score measurement from CARPIA), OPDATE
(operation date from the Cardiac surgery unit), CAREUNITADMDATE (patient's
admission date to the ICU from the Cardiac surgery unit) and RISKOFDEATH (from
the ICU database).
• Cost fact table
The cost fact table includes data from the Transition II database and CARPIA.
This table contains data fields such as FLDURNUMBER, FLDTHECNO, DRG
(Diagnostic Related Groups), HOSPAADATE (hospital admission date from
CARPIA), HOSDISDATE (hospital discharge date from the CARPIA database) and
TOTALCOST (from the Transition II database).
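How a fact table of this kind combines two sources can be sketched as a keyed join. The Python sketch below assumes that both extracts carry FLDURNUMBER, that it is a sufficient linking key, and that DRG arrives with the costing extract; none of this is stated in the text. The helper name build_cost_fact and the sample rows are hypothetical, and the prototype itself was built in SAS data integration studio rather than in Python.

```python
def build_cost_fact(carpia_rows, transition_rows):
    """Join CARPIA and Transition II extract rows on FLDURNUMBER.

    Assumption: FLDURNUMBER is present in both extracts and links them.
    CARPIA rows with no matching costing row are left out of the result.
    """
    costs_by_admission = {row["FLDURNUMBER"]: row for row in transition_rows}
    fact_rows = []
    for case in carpia_rows:
        cost = costs_by_admission.get(case["FLDURNUMBER"])
        if cost is None:
            continue  # no costing record for this admission
        fact_rows.append({
            "FLDURNUMBER": case["FLDURNUMBER"],
            "FLDTHECNO": case["FLDTHECNO"],
            "DRG": cost["DRG"],  # assumed to come from the costing extract
            "HOSPAADATE": case["HOSPAADATE"],
            "HOSDISDATE": case["HOSDISDATE"],
            "TOTALCOST": cost["TOTALCOST"],
        })
    return fact_rows


# Hypothetical sample rows, not data from the study.
carpia = [
    {"FLDURNUMBER": "U001", "FLDTHECNO": "T100",
     "HOSPAADATE": "2009-03-01", "HOSDISDATE": "2009-03-09"},
    {"FLDURNUMBER": "U002", "FLDTHECNO": "T101",
     "HOSPAADATE": "2009-04-02", "HOSDISDATE": "2009-04-12"},
]
transition = [{"FLDURNUMBER": "U001", "DRG": "D1", "TOTALCOST": 25000.0}]

cost_fact = build_cost_fact(carpia, transition)
print(cost_fact)
```

Dropping unmatched rows is only one possible design choice; routing them to an error table for review would be closer to the validation approach described in Step 4.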
Dimension tables
The dimension tables are shown in Table 7 below.
Table name                 Description
Patient Dimension          This table contains patient information such as
                           FLDURNUMBER, FLDFIRSTNAME, FLDSURNAME,
                           FLDDOB, FLDAGE etc.
Patient cases Dimension    This table stores data related to the complications and
                           morbidities of each patient's case.
DRG Dimension              This table stores the DRGs and the DRG descriptions.
ICU diagnosis Dimension    This table includes the patient's diagnosis information
                           from the ICU.
Table 7: Dimension Tables
Also, two more dimension tables, Doctor Dimension and Date Dimension, were
introduced for the risk score fact table. This will help to analyse data at
different levels relevant to the information needs of different stakeholders. Moreover,
more dimension tables or fact tables can be designed and added according to
evolving decisions which need to be made by the end users.
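As an illustration of how a Date Dimension supports analysis at different levels, the Python sketch below derives one dimension row per distinct calendar day. The column names DATEID, YEAR, MONTH and DATE follow Figure 26; encoding DATEID as a YYYYMMDD integer is an assumption, as are the helper names and the sample dates.

```python
from datetime import date


def date_dimension_row(day):
    """Build one Date Dimension row for a calendar day.

    Assumption: DATEID is encoded as the integer YYYYMMDD.
    """
    return {
        "DATEID": day.year * 10000 + day.month * 100 + day.day,
        "YEAR": day.year,
        "MONTH": day.month,
        "DATE": day.day,
    }


def build_date_dimension(days):
    """Return one row per distinct day, ordered by DATEID."""
    rows = {}
    for day in days:
        row = date_dimension_row(day)
        rows[row["DATEID"]] = row  # duplicates collapse onto one key
    return [rows[key] for key in sorted(rows)]


# Hypothetical operation dates (OPDATE values), not data from the study.
op_dates = [date(2009, 3, 2), date(2009, 1, 15), date(2009, 3, 2)]
date_rows = build_date_dimension(op_dates)
for row in date_rows:
    print(row)
```

With YEAR and MONTH held as separate columns, fact rows can be rolled up by month or by year through the DATEID foreign key.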
Step 4 - After the new tables were designed and their metadata registered, the data
was transferred to the target tables (dimension and fact tables). Before the data was
transferred from the source tables, key fields in the CARPIA source data, ICU data,
finance data and patient data tables were validated. For example, in the
CARPIA source data table, validations were performed for missing data in the
FLDURNUMBER, OPDATE and PREDMORT data fields, and a custom validation was
applied for the bleeding complication. All the invalid data was sent to an error table
for rectification or discard, and only the valid data was loaded into the CARPIA valid
data table (see
target tables. Figure 28 shows the data warehouse model developed using SAS data
integration studio. Appendix B shows screenshots of populating the fact and
dimension tables, and some source table (CARPIA and finance) data validation steps.
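The missing-data validation in Step 4 can be sketched as follows. This Python sketch only covers the missing-value checks on FLDURNUMBER, OPDATE and PREDMORT; the custom validation rule for the bleeding complication is not described in the text and is therefore left out. The function names, the ERROR column and the sample rows are hypothetical, and the study performed these steps in SAS data integration studio.

```python
REQUIRED_FIELDS = ("FLDURNUMBER", "OPDATE", "PREDMORT")


def missing_fields(row):
    """Return the required fields that are absent or blank in a row."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = row.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def split_valid_invalid(rows):
    """Route each row to the valid list or to the error list.

    Error rows keep their original values and gain an ERROR column
    (a hypothetical name) describing what was missing.
    """
    valid, errors = [], []
    for row in rows:
        missing = missing_fields(row)
        if missing:
            errors.append({**row, "ERROR": "missing " + ", ".join(missing)})
        else:
            valid.append(row)
    return valid, errors


# Hypothetical CARPIA extract rows, not data from the study.
carpia_rows = [
    {"FLDURNUMBER": "U001", "OPDATE": "2009-03-02", "PREDMORT": "1.8"},
    {"FLDURNUMBER": "U002", "OPDATE": "", "PREDMORT": "3.1"},
    {"FLDURNUMBER": "", "OPDATE": "2009-05-06", "PREDMORT": None},
]
valid_rows, error_rows = split_valid_invalid(carpia_rows)
print(len(valid_rows), "valid;", len(error_rows), "sent to the error table")
```

Keeping the rejected rows together with a reason makes later rectification possible, which matches the "rectification or discard" handling described above.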
Figure 26: Risk score star schema
ICU DIAGNOSIS DIMENSION: CAREUNITADMID (PK) CAREUNITID IMMUNEDISEASE DIAGNOSTICSYSTEM DIAGNOSTICCODE DIAGNOSTICTEXT PRINCIPLE PROCEDURE DIAGNOSIS SEQUENCE ICD_LONG_DESC PRINCIPAL_SECONDARY POST OP COMPLICATION
PATIENT DIMENSION: FLDURNUMBER (PK) FLDSURNAME FLDFIRSTNAME FLDMIDDLENAME FLDDOB FLDAGE FLDGENDER FLDMEDICARE FLDMEDICARENUM FLDDECEASED ...
PATIENT CASES DIMENSION: FLDTHENCNO (PK) FLDSXCATCABG FLDSXCATAV FLDSXCATMV FLDSXCATTV FLDSXCATPV FLDSXCATAW FLDSXCATMISC FLDDATE_REVIEW FLDMORB_STATUS FLDBLEEDING FLDDYSFUNCTION ...
RISK SCORE FACT TABLE: FLDURNUMBER (FK) FLDTHENCNO (FK) CAREUNITADMID (FK) DOCTORID (FK) DATEID (FK) CAREUNITADMDATE OPDATE PREDMORT SCORE RISKOFDEATH
DOCTOR DIMENSION: DOCTORID (PK) DOCTORFIRSTNAME DOCTORSURNAME GENDER ADDRESS SPECIALTY ...
DATE DIMENSION: DATEID (PK) YEAR MONTH DATE
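The risk score star schema in Figure 26 can be expressed as table definitions. The Python sketch below uses the standard sqlite3 module with an in-memory database purely for illustration; the prototype was built with SAS data integration studio, not SQLite. Table names, column data types and the choice of a small subset of dimension columns are assumptions, while the key columns and fact columns follow the figure.

```python
import sqlite3

# Column names follow Figure 26; data types are assumptions, and only a
# few of the dimension columns are included to keep the sketch short.
RISK_STAR_DDL = """
CREATE TABLE patient_dimension (
    FLDURNUMBER TEXT PRIMARY KEY,
    FLDSURNAME TEXT,
    FLDFIRSTNAME TEXT,
    FLDDOB TEXT
);
CREATE TABLE patient_cases_dimension (
    FLDTHENCNO TEXT PRIMARY KEY,
    FLDBLEEDING TEXT
);
CREATE TABLE icu_diagnosis_dimension (
    CAREUNITADMID TEXT PRIMARY KEY,
    DIAGNOSTICTEXT TEXT
);
CREATE TABLE doctor_dimension (
    DOCTORID TEXT PRIMARY KEY,
    DOCTORSURNAME TEXT,
    SPECIALTY TEXT
);
CREATE TABLE date_dimension (
    DATEID INTEGER PRIMARY KEY,
    "YEAR" INTEGER,
    "MONTH" INTEGER,
    "DATE" INTEGER
);
CREATE TABLE risk_score_fact (
    FLDURNUMBER TEXT REFERENCES patient_dimension (FLDURNUMBER),
    FLDTHENCNO TEXT REFERENCES patient_cases_dimension (FLDTHENCNO),
    CAREUNITADMID TEXT REFERENCES icu_diagnosis_dimension (CAREUNITADMID),
    DOCTORID TEXT REFERENCES doctor_dimension (DOCTORID),
    DATEID INTEGER REFERENCES date_dimension (DATEID),
    CAREUNITADMDATE TEXT,
    OPDATE TEXT,
    PREDMORT REAL,
    SCORE REAL,
    RISKOFDEATH REAL
);
"""

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
connection.executescript(RISK_STAR_DDL)

tables = [name for (name,) in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The fact table holds only keys and measures, and each foreign key points to exactly one dimension table, which is what gives the schema its star shape.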
Figure 27: Cost star schema
COST FACT TABLE: FLDURNUMBER (FK) FLDTHENCNO (FK) DRG (FK) HOSPAADATE HOSDISDATE TOTALCOST
PATIENT DIMENSION: FLDURNUMBER (PK) FLDSURNAME FLDFIRSTNAME FLDMIDDLENAME FLDDOB FLDAGE FLDGENDER FLDMEDICARE FLDMEDICARENUM FLDDECEASED ...
PATIENT CASES DIMENSION: FLDTHENCNO (PK) FLDOPDATE FLDOPTIME FLDSXCATCABG FLDSXCATAV FLDSXCATMV FLDSXCATTV FLDSXCATPV FLDSXCATAW FLDSXCATMISC FLDDATE_REVIEW FLDMORB_STATUS FLDBLEEDING FLDDYSFUNCTION ...
DRG DIMENSION: DRG (PK) DRGANDDESCRIPTION
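A typical use of the cost star schema in Figure 27 is to join the fact table to a dimension and aggregate a measure. The Python sketch below does this with the standard sqlite3 module for the DRG dimension only. The table names, data types, DRG codes, descriptions and cost values are all hypothetical and serve only to show the shape of the query.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE drg_dimension (
    DRG TEXT PRIMARY KEY,
    DRGANDDESCRIPTION TEXT
);
CREATE TABLE cost_fact (
    FLDURNUMBER TEXT,
    FLDTHENCNO TEXT,
    DRG TEXT REFERENCES drg_dimension (DRG),
    HOSPAADATE TEXT,
    HOSDISDATE TEXT,
    TOTALCOST REAL
);
""")

# Hypothetical rows for illustration only; not data from the study.
connection.executemany(
    "INSERT INTO drg_dimension VALUES (?, ?)",
    [("D1", "D1 example group one"), ("D2", "D2 example group two")],
)
connection.executemany(
    "INSERT INTO cost_fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("U001", "T100", "D1", "2009-03-01", "2009-03-09", 20000.0),
        ("U002", "T101", "D1", "2009-04-02", "2009-04-12", 30000.0),
        ("U003", "T102", "D2", "2009-05-05", "2009-05-11", 18000.0),
    ],
)

# Star join: the fact table supplies the measure, the dimension the label.
query = """
SELECT d.DRGANDDESCRIPTION, COUNT(*) AS episodes, AVG(f.TOTALCOST) AS mean_cost
FROM cost_fact AS f
JOIN drg_dimension AS d ON d.DRG = f.DRG
GROUP BY d.DRG
ORDER BY d.DRG
"""
cost_by_drg = connection.execute(query).fetchall()
for description, episodes, mean_cost in cost_by_drg:
    print(description, episodes, mean_cost)
```

The same pattern extends to the Patient and Patient cases dimensions by adding further joins on FLDURNUMBER and FLDTHENCNO.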
Figure 28: Cardiac Surgery unit data warehouse model
Step 5 - Finally, the SAS enterprise guide analysis tool was used to configure the
data in the reporting format.
5.4 DATA ANALYSIS USING THE DATA WAREHOUSE PROTOTYPE
The SAS enterprise guide 4.2 was used to analyse the data to answer
questions 1, 2, 4 and 5. The first question addressed was question 1 from Table
6: “Comparison of risk scores – group by PREDMORT” (the ICU risk score is
named Score and the Cardiac surgery unit risk score is named PREDMORT). The
following figure (Figure 29) shows an information product for the analysis results:
the comparison of risk scores from the cardiac surgery unit and the ICU, grouped
by cardiac surgery risk score (PREDMORT).
Figure 29: Comparison of risk scores –group by PREDMORT
This will provide clinicians from the cardiac surgery unit and ICU with a better
understanding of the relationship between the preoperative cardiac surgery risk score
for death and significant morbidities and the risk of death recorded in the ICU. It
relates to questions such as how the average risk for different clinical groups
varies after surgery, what factors are involved, and how the risk scores can be used to
improve performance outcomes in the cardiac surgical unit and ICU for cardiac
surgical patients. Figure 30 shows a graphical display of the interaction of risk scores.
Figure 30: Interaction of risk scores
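The grouping behind the risk score comparison can be sketched in a few lines. The Python sketch below averages the ICU SCORE within bands of PREDMORT. The thesis does not state how PREDMORT values were grouped, so the three bands and their cut points are hypothetical, as are the function names and the sample rows; the actual analysis was run in SAS enterprise guide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical PREDMORT bands; the cut points are assumptions for illustration.
BANDS = ((0.0, 2.0, "low"), (2.0, 5.0, "medium"), (5.0, float("inf"), "high"))


def predmort_band(predmort):
    """Return the label of the band that contains a PREDMORT value."""
    for lower, upper, label in BANDS:
        if lower <= predmort < upper:
            return label
    raise ValueError(f"PREDMORT out of range: {predmort}")


def mean_score_by_band(fact_rows):
    """Average the ICU SCORE within each PREDMORT band."""
    scores = defaultdict(list)
    for row in fact_rows:
        scores[predmort_band(row["PREDMORT"])].append(row["SCORE"])
    return {label: mean(values) for label, values in scores.items()}


# Hypothetical risk score fact rows, not data from the study.
fact_rows = [
    {"PREDMORT": 1.0, "SCORE": 10.0},
    {"PREDMORT": 1.5, "SCORE": 14.0},
    {"PREDMORT": 3.0, "SCORE": 20.0},
    {"PREDMORT": 8.0, "SCORE": 31.0},
]
band_means = mean_score_by_band(fact_rows)
print(band_means)
```

Both scores sit in the same fact row, so the comparison needs no further joins once the fact table has been populated.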
Figure 31 shows an example of an information product from the prototype data
warehouse to support decision-making processes based on Q2 from Table 6: “The
actual expenditure (AU$) per episode of care according to certain clinical groups: by
procedural groups”. This shows the SAS analysis report for the actual expenditure
per episode of care according to the major cardiac surgical clinical procedural groups,
with the results grouped by patient age. This result gives clinicians the ability to
understand how the total cost of an episode of care in the finance database relates to
patient groups according to the surgeons’ frame of reference, that is, the clinical
procedure groups used by surgeons in their clinical audit and monitoring processes.
This can then be further combined with Transition II data to compare actual costs for
these clinical groups with the State funds provided to the hospital according to the
DRG groups. The information can be further broken down according to other clinical
criteria such as age groups (as shown), hospital post-operative morbidities
captured in the CARPIA database such as deep sternal infections, or physiological
parameters captured in the ICU database such as core body temperature variations at
admission to ICU following surgery.
Figure 31: The actual expenditure per episode of care according to the certain clinical group
The analysis results for Q4, “Cost of various post operative complications
(AU$) – by bleeding morbidity group”, are shown in Figure 32 as summary statistics
for the cost of post-operative complications, grouped in this example by
post-operative bleeding morbidity group. This will help clinicians to identify the cost
implications of clinical issues and prioritise the quality improvement process. It may
also help to evaluate the cost savings from quality improvement processes that
reduce high cost morbidities, thereby valuing and appropriately resourcing such
activities.
Figure 32: Cost of reoperation for bleeding as an example of post operational complications (AU$)
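Summary statistics of this kind can be sketched as follows. The Python sketch groups episode costs by the FLDBLEEDING field from the Patient cases dimension and reports a few common statistics per group. Which statistics Figure 32 actually reports is not stated in the text, so the set chosen here is an assumption, as are the "Y"/"N" coding of FLDBLEEDING, the function name and the sample rows.

```python
from collections import defaultdict
from statistics import mean, median


def cost_summary_by_group(rows, group_field="FLDBLEEDING"):
    """Return count, mean, median, minimum and maximum TOTALCOST per group."""
    costs = defaultdict(list)
    for row in rows:
        costs[row[group_field]].append(row["TOTALCOST"])
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in costs.items()
    }


# Hypothetical rows joining the cost fact table to the bleeding flag;
# the values are not data from the study.
episode_rows = [
    {"FLDBLEEDING": "N", "TOTALCOST": 20000.0},
    {"FLDBLEEDING": "N", "TOTALCOST": 24000.0},
    {"FLDBLEEDING": "N", "TOTALCOST": 22000.0},
    {"FLDBLEEDING": "Y", "TOTALCOST": 40000.0},
    {"FLDBLEEDING": "Y", "TOTALCOST": 50000.0},
]
summary = cost_summary_by_group(episode_rows)
for group, stats in summary.items():
    print(group, stats)
```

Passing a different group_field would give the same summary for another morbidity flag held in the Patient cases dimension.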
The analysis results for question 5, “Audit data sources to verify costings data includes high cost procedures appropriately”, are shown in Figure 33. This output shows the costs associated with the DRGs allocated according to the cardiac surgery unit admission status. Further analysis of this can contribute to the evaluation of appropriate funding structures for institutions according to the status of the surgery performed.
Figure 33: Costs associated with the DRG’s- according to cardiac
surgery unit admission status (AU$)
Limitations and constraints in the data extraction and data analysis process
must be considered in the interpretation of these information products. They include:
1. Data extracted from CARPIA, ICU and Transition II are limited to a sample
from the year 2009.
2. Question 3 (“What is the rate of e-discharge summaries
send to GP’s according to clinical guidelines for the cardiac surgical patients
according to operative data, surgical consultant?”) was not addressed because of
the technical difficulties experienced in directly connecting to the e-DS database
and the time limitation of the research project.
3. All the extracted data are restricted to Cardiac surgery unit patients.
4. When comparing the risk scores from CARPIA and ICU, data for a small number
of patients who returned to the ICU on the same day
were excluded.
5. Data analysis is limited to 1000 patient records due to the study’s time
constraints.
In general, the benefits of data warehousing may take some time to be realised. For
example, changes resulting from a mismatch between State funding and actual costs
for certain procedure groups may require further analysis and reporting to further
stakeholders to facilitate change to the costing structure. On the other hand, some
benefits can have a more rapid local effect, such as recognition by clinicians of the
differential costs for various morbidities and the implications for the selection of
quality improvement activities. In this research project, the data warehouse prototype
was evaluated by collecting feedback from the end users after reporting and
explaining the analysis results.
5.5 DATA WAREHOUSE PROTOTYPE EVALUATION
According to Welbrock (1998, p. 1), the “majority of data warehouse
implementations are never monitored for their success”. Welbrock (1998) states that
“the measurement of the success of the data warehouse is outside the experience of
information technology specialists”. As stated by Welbrock (1998), this is because
the data warehousing process is largely a business exercise rather than a technological
one. Also, according to a white paper published by Threshold Consulting
Services, it is difficult to choose success metrics for data warehouses;
however, return on investment (ROI) is the measure most often used to assess data
warehouse success (Threshold Consulting Services, 2005). Other indicators that can
be used to measure data warehouse success are usage measurement, customer
satisfaction, availability, performance and response time
(Threshold Consulting Services, 2005).
In this research project, feedback was collected from end users to evaluate the
potential importance of the data warehouse information products for the Cardiac
surgery unit. The feedback was collected from data managers, unit managers and
clinicians via short structured interviews based on the question “How does the
information product support the decision-making process at the clinical service
level?”. From the data managers’ point of view, “implementation of a data
warehouse will reduce the time required in producing the reports compared to the
current process”. Also, “it provides a better way of getting the complete picture of
patient groups over a variety of important service aspects for service planning”.
Moreover, they highlighted some issues anticipated with the actual
implementation of such a data warehouse. The issues of concern are data
access rights, security and data quality. The importance of introducing policies and
data stewardship was also mentioned.
According to one of the unit managers (the Cardiac surgery Registry Coordinator),
“information products generated from the data warehouse prototype are
valuable and useful in data analysis used for specific issues or problems defined by
the clinicians”. As an example given by the unit manager, the output showing the
costs associated with re-operation for bleeding (Figure 32) is a very useful
information product. The unit manager also mentioned that “this output could be used
as part of a report on
post operative bleeding to build a complete picture for the clinicians, of the clinical
factors contributing to representation for bleeding and the full consequences of this
issue for service management”. It was also mentioned that, in comparison to the
previous process of acquiring this costing data, the data warehouse made the data
more readily available to filter according to the clinical dimensions held in CARPIA,
so that a full analysis of the implications of the clinical guidelines and patient
management regarding this issue was more easily facilitated.
The clinician from the Cardio-Thoracic surgery unit agreed
that “developing a data warehouse is very valuable for the clinicians”. Furthermore,
he mentioned that “the output shows (Figure 32) the costs involved relating to clinical
variances impact on patient management which informs the selection of a variety of
forms of management available for patients of the clinical service”.
Chapter 6: Discussion
Data warehousing technology predominantly aims to structure the data in a
summarised way which supports improved access to and use of the data in an
efficient and effective manner. Integrating healthcare services with such new
technology paves the way to obtain a number of benefits including improved access
to data, support of evidence-based decision-making and ultimately support of quality
healthcare services. Therefore, data warehouses can be considered a useful tool for
the support of strategic and tactical decision-making in healthcare.
To determine how data warehousing might practically contribute to improved
decision-making, this study first examined the current data driven decision-making
process in the clinical environment. The research question that addressed this was
“What decision-making issues exist or are faced by healthcare professionals with the
current information systems?”. This considers what issues currently exist in
information driven decision-making and whether a data warehouse may contribute to
overcoming these in the study environment. Analysis of the survey responses showed
that data manipulation in the current decision-making process at TPCH is mostly
a repetitive manual process. For complex clinical or management questions requiring
data beyond that available from the Cardiac Surgical Registry (CARPIA), the
clinician or unit managers collect data separately by contacting data custodians, then
integrate and assemble the data for analysis individually through laborious and
time consuming manual linking processes. Also, questionnaire responses indicated
that the capability of the present information systems to fully support
current decision-making needs is very low. This can be seen in Figure 18,
current support from the information systems for decision-making. The issues
specifically identified as being amenable to improvement by data warehousing were
the integration of data from other sources and the availability of access to other
relevant clinical or administrative repositories. This was shown in the responses to the
question on current decision-making issues (Figure 19). Other major issues
highlighted by respondents in the questionnaire were medical record data quality and
availability, lack of efficient reporting tools and lack of time and resources to
undertake analysis.
The data quality issue most often indicated was a lack of data completeness,
followed by a lack of data accuracy and a lack of compatibility. For the
decision-making process it is important to have complete records of clinical data. As
stated by Chapman (2005), incomplete data does not support comprehensive analysis
and may lead to poor or incorrect conclusions. According to Botsis, Hartvigsen, Chen
and Weng (2010), data inconsistencies are caused by uncoordinated or redundant
data entries. Data accuracy is also an important factor to consider,
because false or incorrect data can potentially lead to medical errors (Connecting for
Health Common Framework, 2006). Such data also has the potential to cause errors in
management decision-making and result in avoidable financial and quality costs to
hospitals. However, a data warehouse cannot address all data quality issues. As
stated by Singh and Singh (2010), data quality problems may also occur in the phases
of data warehouse development. For example, during the ETL phase, where data
cleansing takes place, data quality issues can arise from the programs written
for the extraction, transformation and load functions (Singh & Singh, 2010). However,
data quality tools such as Data Flux, Trillium Software and WizSoft can be used to
improve data quality.
The next main research question addressed in this research was “How might
decision-making be improved within healthcare services by implementing a more
aligned data warehousing model or models?”. According to the literature, many
factors lead to the selection of a specific data warehouse model. A review of the
literature shows that there is no universal data warehouse model
suitable for all healthcare services. This can be seen from the variety of models
demonstrated in the data warehouse implementation examples. Nor were data
warehouse implementations fixed to one model.
Sometimes organisations changed or added extensions to the original data warehouse
model to gain maximum benefit. This can be seen from the CMS and VHA
examples (Bala et al., 2009; Winter, 2007). For instance, CMS in the USA initially
implemented several data marts; however, it later required an enterprise-wide data
warehouse model to integrate data from different sources. Also, the VHA in the USA
had already developed a corporate data warehouse to provide support for many
clinical concerns such as obesity, diabetes and depression. More recently, this data
warehouse was extended with an operational data
store (ODS), a web-based safety net interface and hybrid communication
functionalities. The literature also shows that organisations select or
suggest enterprise data warehouse models or federated data warehouse models when
there is a need for enterprise level data integration.
The user requirements analysis is one of the main phases of the data warehouse
development process. As stated by List, Schiefer, & Tjoa (2000), the user
requirements analysis phase helps to identify the user needs for data warehouse
development. This phase also plays an important role in defining data staging
designs, the data warehouse system architecture, training course plans, and data
warehouse system maintenance and upgrades (Golfarelli & Rizzi, 2009). However,
there are several reasons why this phase can deliver ambiguous, incomplete and short
lived requirements: some projects are long-term projects in which it is difficult to
collect every requirement, some decisions are poorly shared across the organisation,
and decision processes may vary over time (Golfarelli & Rizzi, 2009). By
integrating literature and case studies together with user requirements gleaned from
the stakeholder survey responses (Table 6), it was determined that the centralised
data warehouse model would be the most suitable for the cardiac surgery unit at this
stage. The centralised data warehouse model improves access to data integrated
from the different units of the hospital when compared with the architecture of
independent data marts. Also, according to the literature, the development and
maintenance costs of the centralised warehouse model are lower than those of the
federated data warehouse model and the hub and spoke data warehouse model.
In this research project, to develop the cardiac surgery data warehouse
prototype, the types of decisions and analyses made by the end users were first
summarised from the questionnaire and interview responses, and five specific
decisions/problems were selected to formulate the data warehouse specifications.
However, a focus on a few decision points is not sufficient to determine the data
warehouse model. Therefore, the literature and case studies related to healthcare data
warehouse development were reviewed. Then the tables and data fields related to the
five decisions/problems were identified. Two star schemas were designed to
analyse risk scores and costs (Figure 26 and Figure 27).
The SAS data integration studio 4.2 was used to develop the data warehouse
prototype. This software tool is easy to use, supports collaboration and helps to
integrate data faster and more effectively. However, a high level of expertise and
knowledge is recommended when the software is used for an actual data
warehouse implementation. This can be seen from the technical issues that had to be
faced, such as software configuration, direct data integration from CARPIA and e-DS,
and data transformation issues. The researcher therefore had to contact SAS technical
support to solve some of the problems.
Although development of a data warehouse is a time consuming process
because of the complexity of clinical information, it provides an effective way to
handle and use data from several disparate units. It integrates data from different
sources and improves access to financial and clinical information. This can be
seen from how the end users currently make decisions with information from the
other units. Also, the output reports created from the data warehouse prototype show
how the outputs will help end users’ decision-making in clinical services. For instance,
the “actual expenditure for episode of care” output provides an opportunity for
clinicians to understand how the total cost of an episode of care in the finance
database relates to patient groups according to the surgeons’ frame of reference, such
as clinical procedure group. Also, when developing a data warehouse, an ETL (extract,
transform and load) process helps to identify data quality issues and support a data
improvement strategy. For example, Albert et al. (2004)
implemented a project oriented data warehouse to supply data for online computing
of the Variable Life Adjusted Displays (VLADs). The purpose of this data
warehouse is to avoid incomplete or inaccurate data for VLADs (Albert et al., 2004).
The development of data warehousing will improve quality and safety monitoring
and help with better clinical care. For instance, the comparison of risk scores output
(Figure 29) from ICU and cardiac surgery will assist clinicians to improve
performance outcomes in the cardiac surgery unit and ICU for cardiac surgical
patients. Another example is the output generated for the cost of various
post-operative complications (according to bleeding morbidity), which helps
clinicians to identify the cost implications of clinical issues and prioritise the quality
improvement process (Figure 32). Furthermore, integrating data repositories provides
data for clinical effectiveness and evaluation research. For instance, summary
statistics for the
actual expenditure for episode of care output help clinicians to understand how the
total cost of an episode of care in the finance database relates to patient groups
according to the surgeons’ frame of reference, such as procedure groups. This
information is further broken down by age group to give a clearer picture.
Thereby, developing a data warehouse will maximise the usefulness of data with
greater efficiency and help to answer more complex questions about patient
management and efficient health service management.
In addition, all the end users pointed out the importance of the data
warehouse for the decision-making process, describing it as a valid and practical
proposition when compared to the current process. It is worth noting, however, that
the data managers raised issues related to data quality, access rights and security in
the information product evaluation interview; the significance of these issues has also
been found by other researchers such as Winter and deMul. In contrast, the majority
of survey respondents indicated that no data security or information privacy concerns
needed to be incorporated into the data warehouse development. This may be because
end users lack an understanding of data warehouse implementation or did
not understand the question properly. However, increased data exchange brings the
issue of confidentiality and access to the fore, and this is well recognised. According
to Clifton (2004), “a comprehensive framework that handles the fundamental
problems underlying privacy preserving data integration and sharing is necessary”.
Data quality is again identified as a critical factor in the use of integrated
information; while a data warehouse can provide some support for data quality
improvement, further organisational change management is needed to address it
effectively. Many limitations and difficulties were experienced in
the development of this project, as presented in sub section 6.1.
6.1 LIMITATIONS OF THE STUDY
There are a number of limitations to this study of the development of a service
level clinical data warehouse prototype. Firstly, because of the time limitation and the
complexity of clinical decision-making processes, the scope of the project is focused
on the few selected decision points identified to inform the user requirements
analysis for this data warehouse prototype. The researcher selected only five
important decision points made by the end users (clinicians, unit managers) when
analysing the questionnaire, and ultimately only four questions were fully addressed in
the results due to time constraints and technical issues. Secondly, the design of the
questionnaire was limited to identifying the current decision-making process,
current decision-making issues in the CIS and data warehouse prototype design
requirements. Thirdly, the researcher focused only on decisions that can be made by
integrating data from the limited set of data repositories identified, namely CARPIA,
e-DS, Transition II and ICU. This was a result of the short time frame available to
study this topic. The selected data repositories represent the major external data
repositories identified by key staff members as contributing to more complex
decision-making. Therefore, this data warehouse prototype is a sample: it does not
represent all the data required for comprehensive decision-making and may not help to
answer all the questions that may be asked. Several other important data
repositories, including those of the Main Operating Theatre, Anaesthetics and the
Cardiology Medical unit, would provide valuable data for integration but were
not considered in the scope of this project. Fourthly, all the data from the different
sources were loaded as external files into SAS Data Integration Studio, because
no direct access to the Transition II and ICU databases is available from the
cardiac surgery unit system.
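The external-file approach described above can be illustrated with a minimal sketch. It is written in Python rather than SAS Data Integration Studio, and the file contents, the `ICUHours` column and the `load_extract` and `integrate` helpers are hypothetical; they are not the actual TPCH extracts or the jobs built in this study. The sketch only shows the general idea of reading files exported from two source systems and joining them on a shared patient identifier.

```python
import csv
import io

# Hypothetical stand-ins for files exported from two source systems.
# The columns and values are invented for illustration only.
CARPIA_EXTRACT = """URNumber,OpDate
1001,2009-03-14
1002,2009-04-02
"""

ICU_EXTRACT = """URNumber,ICUHours
1001,26
1003,48
"""


def load_extract(text):
    """Read one exported file into a dict keyed by patient UR number."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["URNumber"]: row for row in reader}


def integrate(primary, secondary):
    """Left-join secondary records onto primary records by UR number.

    Patients absent from the secondary source keep their primary fields
    and simply have no secondary fields.
    """
    merged = []
    for ur_number, record in primary.items():
        combined = dict(record)
        combined.update(secondary.get(ur_number, {}))
        merged.append(combined)
    return merged


integrated = integrate(load_extract(CARPIA_EXTRACT), load_extract(ICU_EXTRACT))
for row in integrated:
    print(row)
```

A left join is used here so that every cardiac surgery record is retained even when the second source holds no matching patient. Whether unmatched records should be kept or rejected is a design decision that this sketch does not settle.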
SAS Data Integration Studio 4.2 was used to develop the data
warehouse prototype, while SAS Enterprise Guide was used to analyse data. This
software has some constraints, such as the inability to view the database diagram
from within SAS Data Integration Studio: doing so requires the SAS
Information Map Studio application, whose proprietary licensing is
constrained. Therefore, as the fifth limitation, the actual database
relationship diagram could not be presented from the SAS application; however, this
did not limit the actual development process, as the diagrams are presented in
Microsoft Word format. Sixthly, a rudimentary approach was selected to evaluate the data
warehouse prototype, because the available methodologies used to evaluate the
benefits of a data warehouse, such as RCTs, qualitative methods and ROI analyses, may
be time-consuming, costly and complex.
A number of technical issues related to server configuration were also
encountered when attempting to connect directly to the CARPIA and
e-DS databases through SAS Data Integration Studio, as the software application is a
non-supported application for Queensland Health. Therefore, remote access outside
of the Queensland Health Standard Operating Environment had to be established,
which created some burden for the information technology support services. These
difficulties might be policy-level issues worth considering in the data warehouse
development process; however, this is beyond the scope of this study. Some of these
technical limitations are the result of this study being an implementation at service
level rather than at enterprise level, which is more common and would have access
to greater IT support resources. These issues will need to be considered both by those
wishing to implement a service-level data warehouse and by those supporting hospital IT
infrastructure.
Finally, because of Queensland Health information-related policies and the
confidentiality requirements of TPCH patient data, the researcher cannot provide
screenshots of actual fact or dimension data tables, and the analysis results shown are
approximations of what actual data product results would be. Also, analysis of the
data using SAS Enterprise Guide 4.2 used only a sample of 1000 records due to
processing time limitations.
Chapter 7: Conclusion
The literature on data warehousing provides detail on contemporary data
warehousing theory and practice. A data warehouse helps to integrate data from
disparate systems. Data warehouses are distinct from operational systems in many
ways. Many differences from OLTP systems have been described elsewhere, such as in
the use of data, users, database sizes, transaction types and data entry. Many
different architectural types can be identified, such as centralised data
warehouse architecture, independent data marts (IDM), federated architecture (FED),
hub and spoke, and data mart bus architecture. With regard
to selection of the data warehouse architecture, there are many
contributing factors, and it is important to consider these factors when implementing a
data warehouse.
This research identified that the current decision-making process between the cardiac
surgery unit and the other units is a manual one. There are also
several issues in the decision-making process at The Prince Charles Hospital.
Difficulty in integrating data from other data repositories was identified as a major
issue. Other issues highlighted by respondents were medical record
data quality and availability, a lack of efficient reporting tools, and a lack of time and
resources to undertake analysis. Moreover, the main data quality issues
identified were a lack of data completeness, a lack of data accuracy and a lack of
compatibility. The research suggests that implementing a centralised data warehouse will
minimise the issues faced in the current decision-making process and also provide
many benefits, such as improved access to data, improved quality and safety
monitoring, data for clinical effectiveness and evaluation research, and
improved decision-making.
A number of limitations existed during this project. Because of time
limitations, the scope of the project considered integrating only four data
repositories: CARPIA, e-DS, Transition II and ICU. Also, because of time constraints,
only five decisions were selected from the questionnaire responses as user requirements.
Furthermore, data from the databases were loaded as external files because of the difficulty of
integrating data directly from the databases and some technical issues encountered
when trying to connect to the e-DS and CARPIA databases. Data analysis was limited to
1000 records due to processing time. Also, actual fact tables and dimension
tables are not provided due to Queensland Health information-related policies and the
confidentiality requirements of TPCH patient data.
7.1 RECOMMENDATIONS AND FUTURE DIRECTIONS
The survey conducted in this research identified a number of current decision-making
issues for which a data warehouse might provide some solutions. Based on this, it is
recommended that a data warehouse be considered as a means of resolving some of the
issues raised in the survey. For instance, the development of a data warehouse
provides better accessibility, integrates disparate data sources and improves
decision-making. However, it is important to investigate the selection of software for
data warehousing (which was not performed in this research). Furthermore, it is
important to address barriers to warehouse implementation at service level, for
example policy and technical support for applications in the SOE (Standard Operating
Environment). There is also a need to address data quality and data access/privacy
issues together with warehouse implementation. If the above recommendations are
addressed, the future development of a centralised data warehouse will
provide many benefits for the cardiac surgery unit.
87
Bibliography 87
Bibliography
Agosta, L. (2005). Hub-and-Spoke Architecture Favored. Information Management Magazine. Retrieved from http://www.information-management.com/issues/20050301/1021501-1.html
Albert, A. A., Walter, J. A., Arnrich, B., Hassanein, W., Rosendahl, U. P., Bauer, S.,
Ennker, J. (2004). On-line variable live-adjusted displays with internal and external risk-adjusted mortalities. A valuable method for benchmarking and early detection of unfavourable trends in cardiac surgery. European Journal of Cardio-Thoracic Surgery, 25(3), 312-319.
Arigon, A.M., Miquel, M., & Tchounikine, A. (2007). Multimedia data warehouses:
a multiversion model and a medical application. Multimedia Tools and Applications, 35(1), 91-108.
Ariyachandra, T., & Watson, H. (2005). Data warehouse architectures: factors in the
selection, decision and success of the architectures. Retrieved May 24, 2010, from http://www.terry.uga.edu/~hwatson/DW_Architecture_Report.pdf
Ariyachandra, T., & Watson, H. (2010). Key organizational factors in data warehouse
architecture selection. Decision Support Systems. Retrieved April 24, 2010, from Scopus database.
Bala, H., Venkatesh, V., Venkatraman, S., Bates, J., & Brown, S. H. (2009). Disaster
response in healthcare: A design extension for enterprise data warehouse. Communication of the ACM. 52(1), 136-140. Retrieved April 21, 2010 from ACM Portal database.
Berndt, D. J., Fisher, J. W., Hevner, A. R., & Studnicki, J. (2001). Healthcare data
warehousing and quality assurance. Computer, 34(12), 56-65. Retrieved May 3, 2010 from IEEE Xplore digital library database.
Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing
data marts for data warehouses. ACM Transaction Software Enineering Methodology., 10(4), 452-483. Retrieved May 3, 2010 from ACM Portal database.
Borysowich, C. (2007, 2010). Better Data Warehouse Modelling. Retrieved May 24,
2009, from http://it.toolbox.com/blogs/enterprise-solutions/better-data-warehouse-modelling-20835
Chapman, A. D. (2005). Principles of Data Quality (Vol. Version 1.0): Global
Biodiversity Information Facility, Copenhagen. Retrieved May 24, 2009, from http://www2.gbif.org/DataQuality.pdf.
88
88 Bibliography
Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record., 26(1), 65-74. Retrieved April 21, 2010 from ACM Portal database.
Clifton, C., Doan, A., Elmagarmid, A., Kantarcioglu, M., Schadow, G., Suciu, D., et
al. (Producer). (2004) Privacy preserving data integration and sharing. retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.8127
Connecting for Health Common Framework. (2006). Background issues on data
quality. Retrieved June 16, 2010, from http://www.policyarchive.org/handle/10207/bitstreams/15515.pdf
de Mul, M., Alons, P., van der Velde, P., Konings, I., Bakker, J., & Hazelzet, J.
(2010). Development of a clinical data warehouse from an intensive care clinical information system. Computer Methods and Programs in Biomedicine, In Press, Corrected Proof. Retrieved August 2, 2010 from ScienceDirect database.
del Hoyo-Barbolla, E., & Lees, D. (2002). The use of data warehouses in the
healthcare sector. Health Informatics Journal, 8(1), 43-46. Retrieved August 2, 2010 from http://jhi.sagepub.com/cgi/content/abstract/8/1/43
Delgado, M. (2011). The Evolution of Health Care IT: Are Current U.S. Privacy
Policies Ready for the Clouds? Paper presented at the IEEE World Congress on Services (SERVICES), 2011. Retrieved August 20, 2011 from IEEE computer society database.
Denton, T. A., Chaux, A., & Matloff, J. M. (1995). A Cardiothoracic Surgery
information system for the next century: Implications for managed care. The Annals of Thoracic Surgery, 59(2), 486-493. Retrieved August 2, 2010 from ScienceDirect database.
Dias, M. M., Tait, T. C., Menolli, A. L. A., & Pacheco, R. C. S. (2008). Data
warehouse architecture through viewpoint of information system architecture. Retrieved August 2, 2010 from IEEE computer society database.
Embarcadero Technologies. (2010). Healthcare Data Management Survey Report.
San Francisco: Embarcadero Technologies. Retrieved February 3, 2011, from http://www.embarcadero.com/images/dm/healthcare-it-survey-report-2010.pdf.
ExecutionMih. (2010). Dimentional model schemas -Star, Snow-flake, Constellation
Retrieved May 3, 2010, from http://www.executionmih.com/data-warehouse/star-snowflake-schema.php
Federal Student Aid. (2007). Enterprise Data Management-Data Governance Plan.
Retrieved from
89
Bibliography 89
http://federalstudentaid.ed.gov/static/gw/docs/ciolibrary/ECONOPS_Docs/DataGovernancePlan.pdf
Golfarelli, M., & Rizzi, S. (2009). Data warehouse design: Modern principles and
Methodologies. New York: McGraw-Hill Companies. Grimson, J., Grimson, W., & Hasselbring, W. (2000). The SI challenge in health
care. Communication of the ACM, 43(6), 48-55. Retrieved April 21, 2010 from ACM Portal database.
Holland, M. (2009). The future of business and clinical intelligence in the U.S.
provider market: Health Industry Insights. Retrieved from http://www-935.ibm.com/services/au/gbs/bus/html/healthcare/presentations/downloads/the_future_of_business_clinical.pdf
Inmon, B. (1999). Data mart does not equal data warehouse. Retrieved from
http://www.dmreview.com/dmdirect/19991120/1675-1.html Inmon, W. H. (2005). Building the data warehouse: Wiley Publishing
Inc.,Indianapolis. Isken, M. W., Littig, S. J., & West, M. (2001). A data mart for operations analysis.
Journal of healthcare information management, 15(2). Retrived from Google Scholar http://www.himss.org/content/files/ambulatorydocs/DataMartForOperationsAnalysis.pdf.
Jani, A. B., Davis, L. W., & Fox, T. H. (2007). Integration of databases for
radiotherapy outcomes analyses. Journal of the American College of Radiology, 4(11), 825-831. Retrieved August 2, 2010 from Science Direct database.
Johns, M. L. (2002). Information Management for health professions (Second edition
ed.): Delmar Thomson Learning Inc. Kadlec, J. (2005). SQL Server OLTP vs. data warehouse performance tuning.
Retrieved from http://searchsqlserver.techtarget.com/tip/SQL-Server-OLTP-vs-data-warehouse-performance-tuning
Kerkri, E. M., Quantin, C., Allaert, F. A., Cottin, Y., Charve, P., Jouanot, F.,
Yétongnon, K., (2001). An approach for integrating heterogeneous information sources in a medical data warehouse. Journal of Medical Systems, 25(3), 167-176. Retrieved August 2, 2010 from Springer database.
Kerr, K., Norris, T., & Stockdale, R. (2007). Data quality information and decision-
making: A healthcare case study. Paper presented at the 18th Australasian conference on Information systems.
90
90 Bibliography
Kimball, R., & Ross, M. (2002). The Data Warehouse Toolkit (2nd Edition ed.). Toronto: John Wiley and Sons, Inc.
Landrum, W. H., Peachey, T., Huscroft, J. R., & Hall, D. (2008). Research in
healthcare DSS: Where do we go from here? Paper presented at the Americas Conference on Information Systems (AMCIS). Retrieved May 22, 2010 from http://aisel.aisnet.org/amcis2008/358
Leitheiser, R. L. (2001). Data quality in health care data warehouse environments.
Paper presented at the 34th International conference in system science, Hawaii. Retrieved August 2, 2010 from IEEE computer society database.
Lenz, R., & Reichert, M. (2007). IT support for healthcare processes - premises,
challenges, perspectives. Data & Knowledge Engineering, 61(1), 39-58. Retrieved August 2, 2010 from ScienceDirect database.
Lindsey, K., & Frolick, M. N. (2003). Critical factors for data warehouse failure.
Business Intelligence Journal, 8(1). List, B., Bruckner, R., Machaczek, K., & Schiefer, J. (2002). A comparison of data
warehouse development methodologies case study of the process warehouse. In A. Hameurlain, R. Cicchetti & R. Traunmüller (Eds.), Database and Expert Systems Applications (Vol. 2453, pp. 203-215): Springer Berlin / Heidelberg.
List, B., Schiefer, J., & Tjoa, A. (2000). Process-oriented requirement analysis
supporting the data warehouse design process a use case driven approach (pp. 593-603).
Louie, B., Mork, P., Martin-Sanchez, F., Halevy, A., & Tarczy-Hornoch, P. (2007).
Data integration and genomic medicine. Journal of Biomedical Informatics, 40(1), 5-16. Retrieved August 2, 2010 from ScienceDirect database.
March, S. T., & Hevner, A. R. (2007). Integrated decision support systems: A data
warehousing perspective. Decision Support Systems, 43(3), 1031-1043. Retrieved August 2, 2010 from ScienceDirect database.
Marco, D. (2000). Independent Data Marts - Part 1. The Data Administration
Newsletter. Retrieved from http://www.tdan.com/view-articles/4881 Mathew, A. (2008). Asset management data warehouse data modelling. Queensland
University of Technology, Birsbane. Mohania, M., Samtani, S., Roddick, J., & Kambayashi, Y. (2007). Advances and
research directions in data-warehousing technology. Retrieved May 22, 2010 from http://dl.acs.org.au/index.php/ajis/article/view/287
Ponniah, P. (2010). Data Warehousing fundermentals for IT Professionals. Retrieved
from
91
Bibliography 91
http://books.google.com.au/books?id=3PJTgyUIGk4C&printsec=frontcover&source=gbs_atb#v=onepage&q&f=false
Sahama, T. R., & Croll, P. R. (2007). A data warehouse architecture for clinical data
warehousing. Paper presented at the Proceedings of the fifth Australasian symposium on ACSW frontiers - Volume 68. Retreived May 22, 2010 from ACM digital library database.
Sanders, D., & Protti, D. (2008). Data Warehouses in Healthcare: Fundamental
Principles. ElectronicHealthcare, 6(3). Retrieved from June 2, 2010 from http://www.longwoods.com/content/19510
SAS Institute Inc. (2006). SAS data integration studio. Retrieved from
http://www.sas.com/technologies/dw/etl/distudio/factsheet.pdf SAS Institute Inc. (2010). SAS Enterprise Guide. Retrieved 15 September 2010,
from http://www.sas.com/technologies/bi/query_reporting/guide/index.html Scheese, R. (1998). Data warehousing as a healthcare business solution. Healthcare
Financial Management, 52(2), 56. Retreived March 22, 2010 from ProQuest database.
Sen, A., & Sinha, A. P. (2005). A comparison of data warehousing methodologies.
Commun. ACM, 48(3), 79-84. Retrieved 21 April, 2010 from ACM digital library database.
Shams, K., & Farishta, M. (2001). Data wareohusing: Toward knowledge
management. Topics in Health Information Management, 21(3), 24-32. Shcherbatykh, I., Holbrook, A., Thabane, L., & Dolovich, L. (2008). Methodologic
issues in halth informatics trials: The complexities of complex interventions. Journal of the Americal Medical Informatics Association, 15(5).
Singh, R., & Singh, K. (2010). A descriptive classification of causes of data quality
problems in data warehousing. International Journal of Computer Science, 7(3).
Stolba, N., Banek, M., & Tjoa, A. M. (2006, 20-22 April 2006). The security issue of
federated data warehouses in the area of evidence-based medicine. Paper presented at the The First International Conference on. Availability, Reliability and Security, 2006. (ARES 2006). Retrieved August 2, 2010 from IEEE computer society database.
Stolba, N., & Schanner, A. (2007). eHealth Integrator -Clinical Data Integration in
Lower Austria. Paper presented at the Third International Conference on Computational Intelligence in Medicine and Healthcare (CIMED 2007).
Sybase. (2010). New South Wales Health. Retrieved June 30th 2010, from
http://www.sybase.com.au/detail?id=1050806
92
92 Bibliography
Tan, R. B. N. (2006). Online analytical processing systems. Retrieved August 20,
2011 from http://www.irma-international.org/viewtitle/10720/ Threshold Consulting Services. (2005). Measuring the success of a data wareohuse.
Retrieved from http://www.thresholdcs.com/Knowledge-Base/White-Papers/Measuring-the-Success-of-a-Data-Warehouse.pdf.
Vesset, D. (2010). Worldwide Business Intelligence Tools 2009 Vendor Shares: IDC.
Retrived http://www.sas.com/news/analysts/IDC- ITools09VendorShares.pdf Wah, T. Y., & Sim, O. S. (2009). development of a data warehouse for Lymphoma
cancer diagnosis and treatment decision support. WSEAS Transactions on Information Science and Applications, 6(3). Retrieved April 28, 2010 from http://www.wseas.us/e-library/transactions/information/2009/28-906.pdf
Welbrock, P. R. (1998). Is your datawarehouse successful? developinga data
warehouse process that responds to the needs of the enterprise. Paper presented at the Annual 11th Conference NESUG' 98. Retrieved from http://www.nesug.org/proceedings/nesug98/atut/p068.pdf
Winter, R. (2007). Health care data warehousing in the government. Massachuettes:
Winter Corporation. Retrieved June 29, 2010 from http://www.wintercorp.com/WhitePapers/Health%20Care%20Data%20Warehousing%20in%20the%20Government%20v3.pdf
Yan, Z., & Jianli, G. (2005, 13-15 June 2005). A kind of data warehouse in
community healthcare service system. Paper presented at the Services Systems and Services Management, 2005. Proceedings of ICSSSM '05. 2005 International Conference on Service Systems and Service Management. Retrieved August 2, 2010 from IEEE xplore digital library database.
Zhou, X., Chen, S., Liu, B., Zhang, R., Wang, Y., Li, P., et al. (2010). Development
of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artificial Intelligence in Medicine, 48(2-3), 139-152. Retrieved August 2, 2010 from ScienceDirect database.
93
Appendices
APPENDIX A: QUESTIONNAIRE
Questionnaire
1. Which unit are you associated with? (Please tick check box)
[ ] Cardiac surgery
[ ] Quality & Safety Unit
[ ] ICU
[ ] Clinical Costings Unit
[ ] Other………………………………….(Please specify)
2. What is your designation at The Prince Charles Hospital? (Please tick the check box)
[ ] Clinician
[ ] Unit manager/director
[ ] Data Manager/ Information analyst/ Informatician
[ ] Other....................... (Please specify)
Current data repositories:
3. Do you use data from repositories outside of your own service units (Cardiac surgery, ICU, Quality & Safety, Clinical Costings) to assist decision-making in service management needs? (Please tick one box)
[ ] Yes (If yes please go to question 3.1)
[ ] No (If No please go to question 3.2)
3.1 Select which data repositories you use to assist decision-making (You can select more than one answer)
[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)
3.2 Select which data repositories would you like to use to assist decision-making (You can select more than one answer)
[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)
Decision-making process:
4. How do you collect or access data from the listed data repositories such as ICU unit/Cardiac surgical/e-discharge summary unit, Transition II for service management needs?

ICU unit data repository
1. Direct data access or integration from ICU data repository
2. Contact data custodian to collect specific data from ICU unit
3. Contact IT department to collect specific data
4. Other
   Please specify.............................................................
5. Don’t use this data repository for decision-making

Cardiac surgery (CARPIA) unit data repository
1. Direct data access or integration from Cardiac Surgery data repository
2. Contact data custodian to collect specific data from ICU unit
3. Contact IT department to collect specific data
4. Other
   Please specify.............................................................
5. Don’t use this data repository for decision-making

Quality & Safety eDS Summary data repository
1. Direct data access or integration from eDS data repository
2. Contact data custodian to collect specific data from Quality & Safety unit
3. Contact IT department to collect specific data
4. Other
   Please specify.............................................................
5. Don’t use this data repository for decision-making

Clinical Costings Transition II (Finance) data repository
1. Direct data access or integration from Clinical Costings data repository
2. Contact data custodian to collect specific data from Clinical Costings unit
3. Contact IT department to collect specific data
4. Other
   Please specify.............................................................
5. Don’t use this data repository for decision-making
5. Identify example management problems/decisions you address or would like to address by using the other data repositories listed? (ICU unit/ Cardiac Surgical/e-discharge summary unit, Transition II)

The response table has three blank rows, each with the following columns:
Column 1: Data repositories (Tick the data repositories)
[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)
Column 2: Problems/Decisions/Analysis I would like to address
Column 3: Which routine analysis do you conduct or would like to conduct
[ ] Daily
[ ] Monthly
[ ] Quarterly
[ ] Yearly
[ ] Other …………
[ ] I don’t know
Current Issues:
6. Are you satisfied with the support provided for decision-making processes by the current Information Systems? (Please tick check box)
[ ] Yes
[ ] No
Comment:
7. What are the main information related problems you have identified in the decision-making process supporting clinical service management in your area? (Please tick check box – You can select more than one answer)
[ ] Lack of quality data
[ ] Limited accessibility and availability of data from other repositories
[ ] Integration of data from other repositories
[ ] Difficulty of getting historical data
[ ] Lack in efficient reporting tools
[ ] Lack of time or resources to undertake analysis
[ ] Other (Please specify)…………………………………………………
[ ] I don’t know
8. What are the main data quality issues impacting the trust in clinical data used for the decision-making processes in your area? (Please tick check box – You can select more than one answer)
[ ] Lack in data completeness (data not missing by record or by field values)
[ ] Lack in accurate accuracy (correct data)
[ ] Lack in accurate consistency/compatibility (reasonableness with other or previous data, e.g. by definitions, format)
[ ] Lack in granularity/precision (correct detail)
[ ] Lack in validity and reliability (data performs intended function within required/defined specifications)
[ ] Lack in relevance (data applicable/ helpful to task at hand)
[ ] Lack in data consistency (data compatibility or reasonableness with other or previous data, e.g. relates to definitions, formats, standards)
[ ] Lack in data timeliness (currency of data)
[ ] Other (Please specify)……………………………………..
[ ] I don’t know
Data Storage/ data analysis:
9. Do these data repositories (ICU unit/ Cardiac surgical/e-discharge summary unit/Transition II) store sufficient data fields for your decision-making processes?
………………………………………………………………
10. According to your knowledge, how long is data kept in the data repositories?
..................................................................................................
11. According to your knowledge, what analysis tools do you use to analyse the clinical data?
…. ….………………………………………………………
12. Do you have any concerns regarding data security and information privacy that should be incorporated in the application development?
………………………………………………………………….
__________________________________________________________________________
If you would agree to participate in a face-to-face interview for further clarification of information requirements for data warehouse prototype development, please tick the following box and provide your contact details.
[ ] Yes, I would like to participate
Name: ……………………………………………..
Position at TPCH: …………………........................
Contact number: ………………………………....
Email: …………………………………………….
Signature …………………………
Date…………………………………….
APPENDIX B: DESIGN OF DATA WAREHOUSE FACT AND DIMENSION TABLES
TPCH Cardiac Surgery DW Architecture
Sample TPCH jobs: process of extracting, transforming and loading data from the source tables
Sample TPCH source data: source data tables loaded as external files from CARPIA, ICU and Transition II
Sample TPCH target data: includes fact and dimension tables
Extract, transform and load valid data to the dimension and fact tables
Populate Cost fact table
Populate Risk score fact table
Populate DRG Dimension table
ICU Diagnosis Dimension
Populate Patient cases Dimension
Populate Patient Dimension
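The populate steps listed above follow the usual star schema pattern of loading dimension tables first and then loading fact tables that refer to them. The following Python sketch illustrates that pattern under stated assumptions: it is not the SAS Data Integration Studio job used in this study, the `DRG` column name, the `build_dimension` and `build_fact` helpers and all sample values are hypothetical, and the dimensions are reduced to a surrogate key per natural key for brevity.

```python
from itertools import count


def build_dimension(rows, natural_key):
    """Assign a surrogate key to each distinct natural-key value."""
    next_key = count(1)
    dimension = {}
    for row in rows:
        value = row[natural_key]
        if value not in dimension:
            dimension[value] = next(next_key)
    return dimension


def build_fact(rows, patient_dim, drg_dim):
    """Replace natural keys with surrogate keys and keep the cost measure."""
    return [
        {
            "patient_key": patient_dim[row["URNUM"]],
            "drg_key": drg_dim[row["DRG"]],
            "total_cost": float(row["CostTOTALACT"]),
        }
        for row in rows
    ]


# Invented sample rows; these are placeholders, not TPCH data.
source_rows = [
    {"URNUM": "1001", "DRG": "DRG-A", "CostTOTALACT": "41250.00"},
    {"URNUM": "1002", "DRG": "DRG-A", "CostTOTALACT": "38900.50"},
    {"URNUM": "1001", "DRG": "DRG-B", "CostTOTALACT": "27100.00"},
]

patient_dim = build_dimension(source_rows, "URNUM")
drg_dim = build_dimension(source_rows, "DRG")
cost_fact = build_fact(source_rows, patient_dim, drg_dim)

for fact_row in cost_fact:
    print(fact_row)
```

Loading the dimensions before the fact table means every fact row can be resolved to an existing dimension key, which is why the populate steps are ordered this way in the sketch.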
Invalid data handling
CARPIA data validation: missing values (URNumber, PREDMORT, OpDate) and custom validation (Bleeding complication).
Finance data validation: missing values (URNUM, CostTOTALACT)
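The missing-value checks named above can be sketched as a simple filter that separates valid records from invalid ones before loading. This Python example is an illustration only and not the validation transformation configured in SAS Data Integration Studio. The field names are those listed above, while the `split_valid_invalid` helper and the sample rows are hypothetical. The custom validation on bleeding complication is not sketched because its rule is not described here.

```python
# Required fields as named in the validation steps above.
REQUIRED_CARPIA_FIELDS = ("URNumber", "PREDMORT", "OpDate")
REQUIRED_FINANCE_FIELDS = ("URNUM", "CostTOTALACT")


def is_missing(value):
    """Treat None and blank strings as missing; a numeric 0 is a real value."""
    return value is None or str(value).strip() == ""


def split_valid_invalid(rows, required_fields):
    """Return (valid rows, invalid rows annotated with their missing fields)."""
    valid, invalid = [], []
    for row in rows:
        missing = [f for f in required_fields if is_missing(row.get(f))]
        if missing:
            invalid.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, invalid


# Invented sample rows; not TPCH data.
carpia_rows = [
    {"URNumber": "1001", "PREDMORT": 0.021, "OpDate": "2009-03-14"},
    {"URNumber": "", "PREDMORT": 0.054, "OpDate": "2009-04-02"},
    {"URNumber": "1003", "PREDMORT": None, "OpDate": ""},
]

valid_rows, invalid_rows = split_valid_invalid(carpia_rows, REQUIRED_CARPIA_FIELDS)
print(len(valid_rows), "valid;", len(invalid_rows), "invalid")
for entry in invalid_rows:
    print(entry["missing"])
```

The same helper can be applied to the finance extract by passing `REQUIRED_FINANCE_FIELDS`. Keeping the rejected rows together with the names of their missing fields makes it possible to report data quality problems back to the source system custodians.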