IMPACT OF A DATA WAREHOUSE MODEL FOR IMPROVED DECISION-MAKING

PROCESS IN HEALTHCARE

Pubudika Kumari Mawilmada BBus (IT Management), MIT

Submitted in fulfilment of the requirements for the degree of

Master of Information Technology (Research)

Computer Science Discipline

Faculty of Science and Technology

Queensland University of Technology

October 2011

i

impact of a data warehouse model for improved decision-making process in healthcare i

Keywords

Cardiology, Clinical Decision Support Systems, Data marts, Data warehouse, Decision-making, Information systems, Healthcare, Star schema, Snowflake schema.

Abstract

The health system is one sector dealing with a deluge of complex data. Many healthcare organisations struggle to use these volumes of health data effectively and efficiently. Many also still rely on stand-alone systems that are not integrated for information management and decision-making. This shows that there is a need for an effective system to capture, collate and distribute health data. Implementing the data warehouse concept in healthcare is therefore one potential solution for integrating health data. Data warehousing has been used to support business intelligence and decision-making in many other sectors, such as engineering, defence and retail.

The research problem addressed is: “how can data warehousing assist the decision-making process in healthcare”. To address this problem, the researcher narrowed the investigation to a cardiac surgery unit, using the cardiac surgery unit at The Prince Charles Hospital (TPCH) as the case study. The cardiac surgery unit at TPCH uses a stand-alone database of patient clinical data, which supports clinical audit, service management and research functions. However, much of the time, the interaction between the cardiac surgery unit information system and other units is minimal. There is a limited and basic two-way interaction with other clinical and administrative databases at TPCH which support decision-making processes. The aims of this research are to investigate what decision-making issues healthcare professionals face with the current information systems, and how decision-making might be improved within this healthcare setting by implementing an aligned data warehouse model or models. As part of the research, the researcher proposes and develops a suitable data warehouse prototype based on the needs of the cardiac surgery unit, integrating the Intensive Care Unit database, the Clinical Costing unit database (Transition II) and the Quality and Safety unit database [electronic discharge summary (e-DS)]. The goal is to improve the current decision-making processes. The main objectives of this research are to improve access to integrated clinical and financial data, providing potentially better information for decision-making for both improved management and patient care, and to provide greater efficiency in supporting similar current processes.

The methodology used to conduct this research consisted of five stages. The first stage reviewed the literature to establish background knowledge about data warehousing and to identify different data warehouse models, the factors leading to model selection, and applications of the data warehouse concept in the healthcare environment. In the second stage, a survey was conducted to gather information on the current data repositories, the current decision-making process, current decision-making issues and the requirements for data warehouse prototype development. The main survey methods used were a questionnaire and unstructured interviews. A total of ten questionnaires were distributed to stakeholders in the cardiac surgical decision-making processes. The questionnaire consisted of twelve questions producing data for four categories of inquiry: current data repositories, the decision-making process, current issues, and data storage and analysis needs. An 80% response rate was achieved (8 out of 10). Although 30% (3 of 10) did not wish to participate further, 70% (7 of 10) contributed to subsequent unstructured interviews used to clarify and extend the survey results. These were analysed thematically and a number of decision-making knowledge gaps were ascertained. The survey and literature review data were then integrated to select a model. In the third stage, the model prototype was developed, and in the fourth, the integrated data was analysed and information products were created. Finally, the information products were reviewed by hospital staff and feedback was obtained to evaluate the utility of the warehouse prototype.

The survey conducted in this research shows that end users (clinicians, unit managers and data managers from the cardiac surgery, ICU, quality and safety, and clinical costing units) have limited access to data repositories other than their own database. For instance, most of the time clinicians or unit managers have to contact data custodians to extract and collate information from other data repositories. They then have to integrate the data manually prior to analysis and reporting. This limits the interaction between the ICU, cardiac surgery (CARPIA), quality and safety (e-DS) and clinical costing unit databases.

All these issues create inefficiencies in the decision-making process. After further analysis of the questionnaire data, the user requirements were summarised for the data warehouse prototype development. The analysed questionnaire results, together with the literature, indicate that a centralised data warehouse model suits the cardiac surgery unit at this stage. A centralised data warehouse model addresses current needs and can also be upgraded to an enterprise-wide warehouse model or a federated data warehouse model, as discussed in many of the publications consulted. The data warehouse prototype was developed using SAS enterprise data integration studio 4.2, and the data was analysed using SAS enterprise edition 4.3. In the final stage, the data warehouse prototype was evaluated by collecting feedback from the end users. This was achieved by using output created from the data warehouse prototype as examples of the data desired and possible in a data warehouse environment. According to this feedback, implementation of a data warehouse was seen as a useful tool to inform management options, provide a more complete representation of the factors related to a decision scenario, and potentially reduce information product development time.

However, this research was subject to many constraints. These include technical issues such as data incompatibilities and the integration of the cardiac surgery database and e-DS database servers, Queensland Health information restrictions (Queensland Health information-related policies, patient data confidentiality and ethics requirements), the limited availability of support from IT technical staff, and time restrictions. These factors influenced the process of warehouse model development, necessitating an incremental approach. This highlights the presence of many practical barriers to data warehousing and integration at the clinical service level. Limitations included the use of a small convenience sample of survey respondents and a single-site case report study design.

As mentioned previously, the proposed data warehouse is a prototype and was developed using only four database repositories. Despite this constraint, the research demonstrates that implementing a data warehouse at the service level supports decision-making and can reduce data quality issues related to access and availability, providing many benefits. Output reports produced from the data warehouse prototype demonstrated usefulness for improving decision-making in the management of clinical services, and in quality and safety monitoring for better clinical care. In the future, the centralised model selected can be upgraded to an enterprise-wide architecture by integrating additional hospital units’ databases.

Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgments
Dedication

CHAPTER 1: INTRODUCTION
1.1 Research background
1.2 Problem
1.3 Research questions
1.4 Significance, Scope and Definitions
1.5 Thesis outline

CHAPTER 2: LITERATURE REVIEW
2.1 Review methodology
    2.1.1 Literature search sources
    2.1.2 Information search strategies
2.2 Background theory
    2.2.1 The data warehouse concept
    2.2.2 Main components of the data warehouse
    2.2.3 Data warehouse modelling
    2.2.4 Data warehouse methodologies
    2.2.5 Data warehouse lifecycle
    2.2.6 Operational systems vs data warehouses
    2.2.7 Data marts
2.3 Different types of data warehouse models
    2.3.1 Centralised data warehouse
    2.3.2 Independent data marts
    2.3.3 Federated architecture
    2.3.4 Hub and spoke architecture
    2.3.5 Data mart bus architecture
2.4 Data warehouse architecture/model selection factors
2.5 Health information management
    2.5.1 Healthcare decision-making
    2.5.2 Healthcare information systems and decision-making
2.6 Data warehousing and healthcare
    2.6.1 Data warehouse implementation examples
    2.6.2 Data warehouse implementation challenges
2.7 Summary and implications

CHAPTER 3: RESEARCH DESIGN
3.1 Methodology and Research Design
    3.1.1 Methodology
    3.1.2 Research Design
3.2 Participants
3.3 Instruments
3.4 Procedure and Timeline
3.5 Analysis
3.6 Ethics and Limitations
3.7 Intellectual Property Rights
3.8 Health and safety

CHAPTER 4: RESULTS ANALYSIS
4.1 Current decision-making process
4.2 Decision-making issues
4.3 Application development requirements analysis

CHAPTER 5: DATA WAREHOUSE PROTOTYPE DEVELOPMENT
5.1 Business intelligence tools
    5.1.1 SAS/Warehouse Administrator 4.3
    5.1.2 SAS data integration studio
5.2 Data analysis tools
    5.2.2 SAS enterprise guide
5.3 Cardiac surgery data warehouse prototype selection and development
    5.3.1 Model selection Rationale
    5.3.2 Development process
5.4 Data analysis using the data warehouse prototype
5.5 Data warehouse prototype evaluation

CHAPTER 6: DISCUSSION
6.1 Limitations of the study

CHAPTER 7: CONCLUSION
7.1 Recommendations and future directions

BIBLIOGRAPHY

APPENDICES
Appendix A: Questionnaire
Appendix B: Design of data warehouse fact and dimension tables

List of Figures

Figure 1: Components of the data warehouse
Figure 2: Multidimensional data
Figure 3: Star schema data model
Figure 4: Snowflake schema data model
Figure 5: Data warehouse system life cycle
Figure 6: Data warehouse architectural types
Figure 7: Different types of data warehouse architectures
Figure 8: Results of the survey
Figure 9: The distribution of the architectures
Figure 10: Research model for data warehouse architecture selection
Figure 11: An integrated model for data warehouse architecture selection
Figure 12: Decision-making levels within an organisation
Figure 13: Timelining Health Information Systems Evaluation
Figure 14: Advantages and disadvantages of data integration architectures
Figure 15: Current use of BI/CI by healthcare organisations
Figure 16: Top 3 barriers to the use of business/clinical intelligence applications
Figure 17: Top 3 IT challenges to implementing/deploying business intelligence applications
Figure 18: Current support from the ISs for decision making
Figure 19: Decision-making issues with current ISs
Figure 20: Data quality issues in current decision-making process
Figure 21: Security and privacy concerns for DW prototype development
Figure 22: VHA corporate data warehouse visual architecture
Figure 23: Medical federated data warehouse model
Figure 24: CDW architecture for traditional Chinese medicine
Figure 25: Proposed data warehouse model for the TPCH Cardiac surgery unit
Figure 26: Risk score star schema
Figure 27: Cost star schema
Figure 28: Cardiac Surgery unit data warehouse model
Figure 29: Comparison of risk scores – group by PREDMORT
Figure 30: Interaction of risk scores
Figure 31: The actual expenditure per episode of care according to the certain clinical group
Figure 32: Cost of reoperation for bleeding as an example of post operational complications
Figure 33: Costs associated with the DRGs according to cardiac surgery unit admission status

List of Tables

Table 1: Literature search sources
Table 2: Comparison of data warehouse with OLTP systems
Table 3: Differences between data mart and data warehouse
Table 4: Combined reasons for data warehouse failure
Table 5: Methodology stages
Table 6: Decisions/problems that end users would like to address
Table 7: Dimension Tables

List of Abbreviations

BI Business Intelligence

CDSS Clinical Decision Support Systems

CI Clinical Intelligence

CIO Chief Information Officer

CMS Centers for Medicare and Medicaid Services

DM Data Marts

DW Data Warehouse

e-DS Electronic Discharge Summary

FED Federated Data warehouse

HBCIS Hospital Based Corporate Information System

ICU Intensive Care Unit

IDM Independent Data Marts

IS Information Systems

IT Information Technology

ITI Information Technology Infrastructure

OIPT Organizational Information Processing Theories

OLAP Online Analytical Processing

OLTP Online Transaction Processing

RCT Randomised Controlled Trials

TPCH The Prince Charles Hospital

VHA Veterans Health Administration

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet

requirements for an award at this or any other higher education institution. To the

best of my knowledge and belief, the thesis contains no material previously published

or written by another person except where due reference is made.

Signature: _________________________

Date: _________________________

Acknowledgments

I would like to thank my principal supervisor, Dr. Tony Sahama, for the guidance and support given to me in conducting this research. I would also like to thank my associate supervisor, Craig Huxley, for his guidance and advice. I would like to acknowledge my associate supervisor, Susan Smith, for the advice and support given throughout this project; I appreciated her assistance and guidance, and her patience in answering my questions. To The Prince Charles Hospital cardiac surgery unit data managers Gai Harris, Lesley Drake and Kay Watson, Clinical costing manager Allan Rowe, Senior costing officer Diana Lal, Applications infrastructure manager Brad Day and ICU health information manager Lynette Munck: thank you for supporting my research project in many ways.

Dedication

This thesis is dedicated to my parents

U.B. Mawilmada and N.K. Mawilmada

who have supported me all the way since the beginning of my studies.

Chapter 1: Introduction

This chapter outlines the background (section 1.1), the research problem addressed by the project (section 1.2) and the research questions (section 1.3). Section 1.4 describes the significance and scope of this research. Finally, section 1.5 outlines the remaining chapters of the thesis.

1.1 RESEARCH BACKGROUND

“Healthcare is an information intensive business generating huge volumes of data from hospitals, primary care surgeries, clinics and laboratories” (Grimson, Grimson, & Hasselbring, 2000, p. 49). According to Sahama and Croll (2007), the acquisition of data and the distribution of information create a challenging situation for people engaged in the medical sector. Information Technology (IT) today plays a major role in healthcare through the introduction of systems such as electronic health records and telemedicine. Integration of stand-alone systems would benefit health organisations. However, many healthcare organisations still have stand-alone Information Systems (IS) (de Mul, Alons, van der Velde, Konings, Bakker, & Hazelzet, 2010). Integrating stand-alone systems will become a more complex task as stored data is increasingly used for decision-making in clinical care, quality assurance, research and management (de Mul et al., 2010). Jani, Davis and Fox (2007) stated that, although there have been recent advances in database development, their impact is limited because there are few opportunities to link these databases. Although many clinical ISs have been designed or are available, most benefit hands-on care for individual patients through transactional systems rather than supporting the analysis of data (de Mul et al., 2010; Sanders & Protti, 2008). As stated by Albert, Walter, Arnrich, Hassanein, Rosendahl, Bauer and Ennker (2004, p. 312), “Clinicians are encouraged to improve their methods of investigation and analysis of outcomes, which still tend to be underdeveloped in comparison to methods available in industry”.

1.2 PROBLEM

The problem addressed is: “how does data warehousing assist the decision-making process in healthcare”. To address this problem, the scope of the research was narrowed to an investigation focusing on a cardiac surgical unit. Arigon et al. (2007) describe the data used in cardiac surgery as consisting of alphanumeric data, images and signals, which may come from a number of data repositories. The analysis environment for such data must include processing methods in order to compute or extract the knowledge embedded in the raw data (Arigon et al., 2007). A data warehouse is a potential solution which may provide a better environment for analysing these data.

All clinical care units are accountable for providing quality of care. Many models of quality measures have been developed (de Mul et al., 2010). However, these sometimes require complex queries to analyse the data, which is a time-consuming process. Moreover, as stated by Albert et al. (2004), predictive models often cannot consider all patient characteristics and do not include non-patient-related factors. Therefore, there is a need for a system to analyse cardiac data from different perspectives. However, most of the time cardiac information systems, such as those for cardiac surgery, have minimal interaction with other units. By combining the cardiac surgery unit data repository with those of other units, such as the Intensive Care Unit (ICU), anaesthesia and financial units, clinicians could gain more benefits. Implementing the data warehouse concept is one potential solution for facilitating efficient and easy analysis of data (de Mul et al., 2010).

Finally, although most clinicians believe that using the data warehouse concept in a cardiac surgery unit can lead to efficient decision-making, high-quality patient care and safer processes, the technology has so far been adopted only to a small extent (de Mul et al., 2010).

This research used the cardiac surgery unit at The Prince Charles Hospital (TPCH) as a case study. The cardiac surgery program at TPCH uses a stand-alone database of patient clinical data, which supports clinical audit, service management and research functions. There is limited two-way interaction with other clinical and administrative databases at TPCH to support these decision-making processes. This research aims to propose a suitable data warehouse model for the cardiac surgery unit at TPCH in order to improve the decision-making process.

The main databases employed to develop a data warehouse prototype are the cardiac

surgery register database (CARPIA), the ICU database, a quality and safety unit

database and the enterprise clinical costing unit database. The cardiac surgery register

database stores cardiac surgical patients’ demographics data, patients history,

preoperative data, procedural (surgical) data, post-operative outcomes data, test

results, diagnosis, risk scores and so on. The data for this database is derived from

several sources; however, most data are collected and entered manually into the

system by trained clinical data managers. Some basic patient information is derived

from the Hospital Based Corporate Information System (HBCIS) which is the enterprise

hospital patient administration system. Also, the pathology system and main theatre

information system provide information for the CARPIA database. The Quality and

Safety unit database of interest is known as the electronic Discharge Summary

database (e-DS). This database contains hospital-wide discharge summaries of all

patients. It is a small transactional database deriving information from HBCIS and

clinician entry.

The Clinical Costing unit already employs the State level enterprise data

warehouse known as Transition II. The main data sources for this database are

HBCIS, and the other management feeder systems such as Emergency Department

Information System (EDIS), Operating Room Management Information System

(ORMIS), Enterprise pathology results information system (Auslab) and Trendcare

system (patient-nurse dependency). The Transition II database manages data at three

levels: the financial level, departmental level and the patient level, although little

actual clinical data are captured. The ICU database contains data of patients admitted

to the ICU. Manually entered data are the main source of information for this

database and include patient clinical data such as morbidity scores, risk scores,

procedural data and physiological measurements.

1.3 RESEARCH QUESTIONS

One of the main aims of this research is to develop background knowledge of

data warehousing and its application to healthcare. Data warehousing plays a major

role in businesses today in contributing to improved decision-making. As in other

businesses, the data warehouse concept is also becoming popular in the healthcare

industry, as making appropriate, well-informed decisions is the basis of effective


healthcare, which will lead to improvements in the quality of service and reduce the

costs in healthcare. However, there are still many healthcare organisations which

have disparate information systems that are not integrated and do not support

improved decision-making processes. Therefore, it is important to identify the

issues with the current information systems that impede better decision-making, and

the potential for improvement. Hence, the first question asked would be:

“What decision-making issues exist or are faced by healthcare professionals with the

current information systems?”

Several alternative data warehouse architectures are available, which

support various decision-making structures and purposes. Therefore, it is important

to consider selection of a suitable data warehouse model, which will facilitate quality

decisions in the Cardiac Surgical context. This will be the key to the next question:

“How might decision-making be improved within healthcare services by

implementing a more aligned data warehousing model or models?”

This research will develop a suitable data warehouse model for the Cardiac

surgery unit at The Prince Charles Hospital, in order to improve decision-making

processes.

1.4 SIGNIFICANCE, SCOPE AND DEFINITIONS

This research presents four different outcomes. As discussed above, a data

warehouse prototype will be developed for the Cardiac surgery unit at the Prince

Charles Hospital. This will:

• improve access to administrative, financial and clinical information.

• potentially improve decision-making for the management of the clinical

services.

• potentially improve quality and safety monitoring to assist healthcare accountability

and better clinical care.

• provide data for clinical effectiveness and evaluation research.



1.5 THESIS OUTLINE

Chapter 1 provides details about the research background, research problem, its

purpose and outcomes of the research. Four outcomes are highlighted as part of the

completion of this project.

Chapter 2 presents the review of literature on data warehousing, including

different data warehouse architectural types, how a data warehouse differs from

operational systems and data marts, data warehouse modelling, and data warehouse

model selection factors. Furthermore, this chapter provides details on healthcare

information management, healthcare decision-making issues and application of the data

warehouse concept in healthcare with some examples.

Chapter 3 describes the research design of this research project. It covers research

methodology, research design, participants and instruments used in the research. The

research methodology consists of five stages and each stage is explained in detail.

Chapter 4 presents the analysis of the survey findings. It covers the current

decision-making process, issues related to the current decision-making process and also

identifies the user requirements for data warehouse prototype development.

Chapter 5 presents the cardiac surgery data warehouse prototype development.

Firstly, it briefly describes the business intelligence tools used to develop the data

warehouse prototype and the benefits of those tools. The next section explains the data

warehouse development steps using the SAS Data Integration Studio 4.2 software.

Chapter 6 provides a discussion of survey results analysis. This chapter contains a

full discussion and evaluation of the results with reference to the literature and the

limitations.

Chapter 7 concludes the thesis by providing information on the research process,

its benefits to TPCH cardiac surgical unit, constraints and limitations faced during the

project and recommendations and future directions.


Chapter 2: Literature Review

This chapter reviews the literature on the following topics. The first section

gives a brief introduction to the review methodology (2.1). This covers the literature

search sources (2.1.1) and information search strategies (2.1.2). The second section

(2.2) discusses the background theory of data warehousing in general and gives

detailed information about data warehouse components, data warehouse modelling,

data marts and how data warehouses differ from operational systems. The third

section (2.3) discusses different types of data warehouse models and the factors

and issues related to data warehouse selection. The next section (2.4) identifies

the data warehouse model selection factors. The fifth section (2.5) reviews the literature

on health information management. This covers decision-making and issues

related to healthcare and healthcare information systems. The following section (2.6)

discusses the data warehouse concept in healthcare and some real examples of

data warehouse implementation and its benefits. Section 2.7 studies the

implications from the literature and develops a framework for the research.

2.1 REVIEW METHODOLOGY

2.1.1 LITERATURE SEARCH SOURCES

Many information sources were used to search the literature widely. The

primary literature search sources used were publisher databases. The publisher

databases provide information from many formal sources such as journal articles,

research papers, and conference papers. They also provide a major source of

traditional academic information. Most of the information sources from the

Queensland University of Technology (QUT) library are stored as books, magazines

and e-books. Moreover, the general web search engines such as Google Scholar,

Google, Scirus and Infopeople provide important e-books, peer-reviewed articles as

well as non-peer reviewed industry and ‘grey’ literature that is related to the research

field. The following table shows the information sources that were used.


Search material | Source type | Main information source

Journal articles and conference papers | Databases | ScienceDirect; Web of Science; ACM portal; SpringerLink; ProQuest; IEEE Xplore; CiteSeerX; EBSCO host; Elsevier; JAMA

Books | Libraries | Queensland University of Technology

Books | Online providers | Google books

Web sites, case studies, Australian Digital Thesis | Web search engines | www.google.com; http://scholar.google.com; www.scirus.com; www.infopeople.org; http://au.search.yahoo.com; http://www.bing.com; http://au.altavista.com; http://www.webwombat.com; http://www.dwinfocenter.org/getstart.html

Web groups | Web search engines | http://www.technologyreview.com/blog/; http://blog.kalido.com/; http://tdwi.org/; http://www.information-management.com/; http://www.sas.com/; http://www.bi-dw.info/; http://www.dwaa.org.au/layout-8.html

Table 1: Literature search sources



2.1.2 INFORMATION SEARCH STRATEGIES

Many strategies were used to search widely for information related to the

research topic and research questions. The search terms “data warehouse”, “data

warehousing”, “data integration” were used to find the basic articles about the data

warehouses. These searches returned a number of articles. The next step was to

combine the initial terms with other terms such as “healthcare”, “decision-making”,

“models” etc. to narrow down the search. Search strategies included the use of

Boolean operators and of proximity operators such as W/nn (ScienceDirect,

ProQuest; see footnote 1) and the Near operator (N) in EBSCO Host, which helped to narrow down

the search results. Abstracts were reviewed and if certain criteria (e.g. related to the

research questions) were identified in the abstract then the full paper was included in

the literature review. Citation indexes were used to search for related publications.

Also, citation indexes helped to identify the latest research trends and helped to

obtain the broadest approach to addressing the research topic. Moreover, the citation

indexes were useful in gathering information about authors, journal articles and

specialised areas of publications.

2.2 BACKGROUND THEORY

2.2.1 THE DATA WAREHOUSE CONCEPT

Data warehousing technology aims to structure the data in an appropriate way

to access the data, and use it in an efficient and effective manner (Dias, Tait,

Menolli, & Pacheco, 2008). As stated by Kerkri, Quantin, Allaert, Cottin, Charve,

Jouanot and Yétongnon (2001), the data warehouse is responsible for the

consistency of information. The integration of tools such as query tools, reporting

tools and analysis tools provides the opportunity to handle the coherence of information.

The aim of data warehousing is to organise the gathering of a wide range of data and

store it in a single repository (Kerkri et al., 2001). Currently, data warehousing plays

a major role in the business community at large. It is also relevant to healthcare as

mentioned in del Hoyo-Barbolla and Lees (2002, p. 43), “in a competitive climate, if

healthcare organisations are to keep their customers, knowing and managing

information about them is essential and organisations realized that it is crucial to

access viable and timely data.” Furthermore, integrating data from the different

sources and converting them into valuable information is a way to obtain

competitive advantage (del Hoyo-Barbolla & Lees, 2002).

Footnote 1: W/nn - W represents "within", and nn represents the maximum number of words between the terms.

Data warehousing is “a collection of decision support technologies aimed at

enabling the knowledge worker (executive, manager, analyst) to make better and

faster decisions” (Chaudhuri & Dayal, 1997, p. 1). According to Inmon (2005, p.

29), a data warehouse is a “subject-oriented, integrated, time-variant and non-volatile

collection of data in support of management decisions”. March and Hevner (2007)

argued that the three components of intelligence, namely understanding, adaptability

and profiting from experience, are important considerations when designing the data

warehouse. These authors also mentioned that the data warehouse should allow

managers to gather information such as identifying and understanding different

situations and the reasons for their occurrence. Further, they argued that the

data warehouse should “enable a manager to locate and apply the relevant

organizational knowledge and to predict and measure the impact of decision over

time” (March & Hevner, 2007, p. 1035). However, as mentioned by March and

Hevner (2007), these arguments form the challenges that need to be considered

when implementing a data warehouse.

2.2.2 MAIN COMPONENTS OF THE DATA WAREHOUSE

According to Kimball and Ross (2002), a few components can be identified to

form the data warehouse environment (Figure 1). Each component of the data

warehouse provides a specific function. The main components are:

• Operational Source Systems

• Data Staging Area

• Data Presentation Area

• Data Access Tools

Operational Source Systems

The operational source system is mainly concerned with processing

performance and availability. Generally, the source system maintains a small amount

of historical data. The queries designed against source systems are narrow,

one-record-at-a-time queries, which operate as part of the normal

transaction flow and act according to the demands on the operational system

(Kimball & Ross, 2002).


Figure 1: Components of the data warehouse (Kimball & Ross, 2002, p. 7)

Data Staging Area

The data staging area is the place that keeps the data as temporary storage

(Kimball & Ross, 2002). This area is also known as the Extract, Transform, Load

(ETL) area because it carries out data extraction, transformation and loading. In

other words, the data staging area can be referred to as everything between the

operational source systems and the data presentation area (Kimball & Ross, 2002).

The first process of transferring data to the data warehouse is extraction. During this

process it is important to read and understand the source data and copy them to the

staging area of the data warehouse for further management. After extracting the data

to the staging area many alterations such as cleansing the data (correcting

misspellings, resolving domain conflicts, dealing with missing elements, or parsing

into standard formats), combining data from multiple sources, deduplicating data,

and assigning warehouse keys take place (Kimball & Ross, 2002). The data are then

loaded into the presentation area of the data warehouse (Kimball & Ross, 2002).
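The staging steps described above (extraction, cleansing, deduplication and assignment of warehouse keys) can be sketched in a few lines of Python. This is a minimal illustration only: the two source lists, the field names (patient_id, sex, city) and the functions cleanse and stage are hypothetical and are not taken from Kimball and Ross (2002) or from any TPCH system.

```python
# Hypothetical raw rows as they might arrive from two operational sources.
SOURCE_A = [
    {"patient_id": "p001", "sex": "M", "city": "brisbane "},
    {"patient_id": "P002", "sex": "female", "city": "Brisbane"},
]
SOURCE_B = [
    {"patient_id": "P001", "sex": "Male", "city": "Brisbane"},  # duplicate of p001
    {"patient_id": "P003", "sex": None, "city": "Ipswich"},     # missing element
]

# Assumed standard codes used to resolve domain conflicts in the "sex" field.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def cleanse(row):
    """Parse one raw row into a standard format."""
    sex = (row["sex"] or "").strip().lower()
    return {
        "patient_id": row["patient_id"].strip().upper(),
        "sex": SEX_CODES.get(sex, "UNKNOWN"),  # missing values become UNKNOWN
        "city": row["city"].strip().title(),
    }


def stage(*sources):
    """Extract from all sources, cleanse, deduplicate and assign warehouse keys."""
    unique = {}
    for source in sources:
        for raw in source:
            row = cleanse(raw)
            # Deduplicate on the cleansed identifier; the first row seen is kept.
            unique.setdefault(row["patient_id"], row)
    # Assign surrogate warehouse keys in arrival order.
    return [{"patient_key": key, **row}
            for key, row in enumerate(unique.values(), start=1)]


# Rows ready to be loaded into the presentation area.
presentation_area = stage(SOURCE_A, SOURCE_B)
```

In a real staging area the rule for choosing between duplicate rows would be a design decision; keeping the first row seen is simply the shortest rule to show here.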

Data Presentation Area

The data presentation area is the place where data is organized, stored, and

made available to the users. In addition, the data presentation area is the place where

business communities see data and gain access using data access tools. As stated by

Kimball and Ross (2002), this area can be referred to as a series of integrated data marts.

Each of these data marts presents the data from a single business process (Kimball &

Ross, 2002).



Data Access Tools

The data access tools element is the final element of the data warehouse. This

element provides many capabilities for the business users to control the presentation

area for analytic decision-making. Generally, a data access tool can be as simple as a

query tool or as complex as a data mining application (Kimball & Ross, 2002).

2.2.3 DATA WAREHOUSE MODELLING

Generally, data warehouse modelling is used to:

• Identify the data warehouse, data mart, and decision support system data and

information requirements

• Represent the data warehouse view

• Design the data warehouse schema according to the information

requirements. (Borysowich, 2007)

In the data warehouse, after the business queries and subject area have been

identified, the information stored in the data warehouse/data mart is designed

(Borysowich, 2007). Designing the data warehouse/data mart structure is different

from designing operational systems. According to Mohania, Samtani, Roddick

and Kambayashi (1999), operational systems consist of simple pre-defined queries.

On the other hand, in data warehousing environments queries join more tables,

need more computation time and are less formal (Mohania et al., 1999). This has led to the

emergence of a new view of data modelling design. As a result, the

multidimensional model, or data cube, has become the suitable data model for the data

warehousing environment. As stated by Chaudhuri and Dayal (1997), a

multidimensional view of the data is important when designing front end tools,

database design and query engines for online analytical processing (OLAP).

Figure 2: Multidimensional data (Chaudhuri & Dayal, 1997, p. 4)


As stated by Ramakrishnan and Gehrke (as cited in Tan, 2006, p. 876), “Online

analytical processing (OLAP) is a term that describes a technology that uses a

multidimensional view of aggregate data to provide quick access to strategic

information for the purposes of advanced analysis”. Generally, OLAP supports

queries and data analysis by collecting, managing and processing multidimensional

data (Tan, 2006). In multidimensional data modeling, data is stored as facts and

dimensions. Facts can be numerical or factual data and can represent the activity

which is specific to the business. On the other hand, “a dimension represents a single

perspective of the data” (Mohania et al., 1999, p. 44), and each dimension is

characterised by its attributes. For instance, a customer dimension can consist of the

name of the customer, address, and the city. Figure 2 shows the multidimensional

data view. Two modelling techniques, the star schema and the snowflakes schema, are

used to represent multidimensional data.
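As a rough illustration of facts and dimensions, the sketch below stores a handful of facts, each located by its dimension values and carrying one numeric measure, and then aggregates the measure along whichever dimensions are requested. The data, the dimension names (city, product, year) and the function roll_up are invented for this example and are not taken from the cited authors.

```python
from collections import defaultdict

# Each fact is located by its dimension values and carries a numeric measure.
# All values are invented for illustration.
FACTS = [
    {"city": "Brisbane", "product": "A", "year": 2010, "units_sold": 10},
    {"city": "Brisbane", "product": "B", "year": 2010, "units_sold": 5},
    {"city": "Sydney", "product": "A", "year": 2010, "units_sold": 7},
    {"city": "Sydney", "product": "A", "year": 2011, "units_sold": 3},
]


def roll_up(facts, dimensions, measure):
    """Sum a measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


# The same facts viewed from two different perspectives.
by_city = roll_up(FACTS, ["city"], "units_sold")
by_product_year = roll_up(FACTS, ["product", "year"], "units_sold")
```

Viewing the same facts by city or by product and year is the kind of change of perspective that a multidimensional view is meant to make easy.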

Star schema

The star schema modelling consists of a central table (fact table) and other

tables which directly link to it. These tables are known as dimension tables.

According to Chaudhuri and Dayal (1997), star schema is used in most data

warehouses to represent the multidimensional data model.

Figure 3: Star schema data model (ExecutionMih, 2010, p. 2)

In general, the fact table contains the keys and measurements. For example,

the sales fact table in Figure 3 contains keys such as

time_key, item_key, branch_key and location_key and measures such as units_sold,

dollars_sold and avg_sales. In addition, the dimension tables are related to the sales

fact table by the time, branch, item and location fields. Each of these dimension tables

contains the attributes related to each dimension (ExecutionMih, 2010).
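The star schema described above can be sketched with Python's built-in sqlite3 module. The key and measure names (time_key, item_key, branch_key, location_key, units_sold, dollars_sold) follow the text; the table names, the dimension attributes (year, item_name, branch_name, city) and all rows are assumptions made for illustration, and avg_sales is left out because it can be derived from the other measures.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimension tables: one row per member; the attributes are assumed.
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: one key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO time_dim VALUES (1, 2010), (2, 2011);
INSERT INTO item_dim VALUES (1, 'Widget');
INSERT INTO branch_dim VALUES (1, 'Main');
INSERT INTO location_dim VALUES (1, 'Brisbane'), (2, 'Sydney');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 10, 100.0),
    (1, 1, 1, 2, 4, 40.0),
    (2, 1, 1, 1, 6, 60.0);
""")

# Every dimension is one join away from the fact table.
rows = star.execute("""
    SELECT t.year, l.city, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY t.year, l.city
    ORDER BY t.year, l.city
""").fetchall()
```

Because each dimension table links directly to the fact table, a query needs only one join per dimension it uses.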

Snowflakes schema

The snowflakes schema is a more complex data warehouse model than the star

schema. Like the star schema, the snowflakes schema also consists of fact tables and

dimension tables. However, the snowflakes schema dimension tables are normalised,

so a dimension table can be linked to another dimension table (Chaudhuri & Dayal, 1997).
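To show what a normalised dimension can look like, the sketch below splits a location dimension so that its city attributes live in a separate table. It again uses Python's sqlite3 module; the choice of which dimension to normalise, the table and column names (city_dim, street, state) and the rows are all assumptions made for this example.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
-- The location dimension is normalised: city attributes are held once
-- in city_dim and referenced from location_dim.
CREATE TABLE city_dim (
    city_key INTEGER PRIMARY KEY,
    city     TEXT,
    state    TEXT
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER
);

INSERT INTO city_dim VALUES (1, 'Brisbane', 'QLD'), (2, 'Sydney', 'NSW');
INSERT INTO location_dim VALUES
    (1, '1 Example St', 1), (2, '2 Example Rd', 1), (3, '3 Example Ave', 2);
INSERT INTO sales_fact VALUES (1, 10), (2, 5), (3, 7);
""")

# Reaching the state attribute now takes two joins instead of one.
units_by_state = dict(snow.execute("""
    SELECT c.state, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN city_dim AS c ON c.city_key = l.city_key
    GROUP BY c.state
""").fetchall())
```

The trade-off visible here is that normalisation stores each city once, at the cost of an extra join whenever a city-level attribute is needed.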

Figure 4: Snowflakes schema data model (ExecutionMih, 2010, p. 2)

2.2.4 DATA WAREHOUSE METHODOLOGIES

There are two basic methodological approaches for data warehouse

design. These are the top-down approach and the bottom-up approach (Golfarelli &

Rizzi, 2009). In the top-down approach, the user requirements are analysed and the

data warehouse is planned, designed and implemented as a whole. However, this approach has many problems such as

high costs, the difficulty of analysing and collecting all sources, the difficulty of

collecting the specific needs of all the organisational departments and a longer

development time. In the bottom-up approach the data warehouse is built incrementally, as

several data marts are created one after another. This method takes a partial picture of the whole

application; therefore, there is a risk involved with this method (Golfarelli & Rizzi,

2009). The bottom-up approach is the accepted method of most users. Moreover, List

et al. (2002) identified three types of data warehouse methodology: data-driven

methodologies, goal-driven methodologies and user-driven methodologies.

Data driven methodologies

As stated by List, Bruckner, Machaczek and Schiefer (2002), “Bill Inmon, the

founder of data warehousing argues that data warehouse environments are data

driven, in comparison to classical systems, which have a requirement driven

development life cycle”. Also, as mentioned by Inmon (as cited in List et al., 2002),

user requirements are the last thing to be considered in the decision support system life

cycle.

Goal driven methodologies

List et al. (2002) discussed the Semantic Object Model (SOM) process

modelling technique presented by Böhnlein and Ulbrich-vom Ende. The first

stage of the technique identifies the company goals and services. Then the SOM

schema is applied to analyse the business processes. This helps to track the

company’s customers and their business transactions, and then at the next stage these

transactions are transformed into existing dependencies that refer to information

systems. The final step identifies the measures and dimensions (according to

transactions and dependencies) (List et al., 2002).

User driven methodologies

According to Westerman (as cited in List et al., 2002), the user driven

methodology is a Wal-Mart approach. This approach mainly focuses on

implementing a business strategy. “The methodology assumes that the company goal

is the same for everyone and the entire company will therefore be pursuing the same

direction” (List et al., 2002, p. 205). The first prototype is developed according to the

business needs. Firstly, business people set goals and then identify and prioritise the

business questions that support the business goals. Then the most important

questions are classified with the data elements.

Moreover, many development methodologies have been introduced by

different authors and organisations. As stated by Kimball et al. (1998, as cited

in Golfarelli & Rizzi, 2009), the business dimensional life cycle is used to design, develop and

implement data warehouse systems. The rapid warehousing methodology is another

approach to managing data warehousing projects. This approach was introduced

by the SAS Institute, a leader in the statistical analysis industry. The rapid

warehousing methodology consists of seven phases: assessment, requirements,

design, construction and final test, deployment, maintenance and administration, and

review (Golfarelli & Rizzi, 2009).

2.2.5 DATA WAREHOUSE LIFECYCLE

The data warehouse life cycle plays a major role when developing a data

warehouse. The following figure shows the basic phases of the data warehouse life

cycle. This life cycle takes the bottom up approach (Figure 5). The main phases of

this life cycle are setting goals and planning, designing infrastructures and designing

and developing data marts (Golfarelli & Rizzi, 2009). The first phase involves a

feasibility study. In this phase many activities take place such as setting system goals

and estimating the costs for building the data warehouse. The next phase analyses

and compares the architecture solutions for the data warehouse design (Golfarelli &

Rizzi, 2009). Moreover, the designer must consider the available tools and

technologies when designing the plan. The final step involves designing and developing

the data marts. In this phase, new data marts are created and added to the data

warehouse system (Golfarelli & Rizzi, 2009).

Figure 5: Data warehouse system life cycle (Golfarelli & Rizzi, 2009, p. 46). Phases shown: setting goals and planning; designing infrastructures; designing and developing data marts.


2.2.6 OPERATIONAL SYSTEMS VS DATA WAREHOUSES

There are many differences between operational systems and the data

warehouse. The primary difference between operational systems and data

warehousing systems is that operational systems are designed to support transaction

processing (OLTP) and data warehousing systems are designed to support online

analytical processing (OLAP). The users of the operational systems deal with one

record at a time. Also, they perform the same operational task repetitively. On the

other hand, a data warehouse is capable of handling large volumes of data at a time

and helps to make decisions in a timely and consistent manner with accurate and up

to date information (Kimball, 2002).
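The contrast between the two kinds of workload can be sketched as follows, using Python's sqlite3 module. The admission table, its columns and its rows are hypothetical; in practice the two workloads would run against separate systems, and a single in-memory table is used here only to keep the example short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE admission (
        admission_id   INTEGER PRIMARY KEY,
        unit           TEXT,
        year           INTEGER,
        length_of_stay INTEGER
    )
""")
# Invented rows for illustration only.
db.executemany(
    "INSERT INTO admission VALUES (?, ?, ?, ?)",
    [(1, "Cardiac", 2010, 5), (2, "Cardiac", 2010, 7),
     (3, "ICU", 2010, 2), (4, "Cardiac", 2011, 6)],
)


def oltp_lookup(admission_id):
    """Operational style: fetch one record at a time by its key."""
    return db.execute(
        "SELECT unit, year, length_of_stay FROM admission WHERE admission_id = ?",
        (admission_id,),
    ).fetchone()


def olap_summary():
    """Analytical style: scan many records and summarise them."""
    return db.execute(
        "SELECT unit, year, COUNT(*), AVG(length_of_stay) "
        "FROM admission GROUP BY unit, year ORDER BY unit, year"
    ).fetchall()
```

The first function touches a single row through its key, while the second reads every row to produce counts and averages per unit and year.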

The following table shows the differences between the online transaction

processing system (OLTP) and a data warehouse.

Table 2: Comparison of data warehouse with OLTP systems (Kadlec, 2005)



According to Inmon (2005), there are many challenges that exist in the use of

current information systems. These include a lack of data credibility, issues with

productivity and inability to transform data into information. The lack of credibility

occurs due to many reasons such as time discrepancy, algorithmic differences, level

of data extraction, problems with external data and no common source of data from

the beginning (Inmon, 2005). This leads to many incompatibilities in the reports

generated by the different departments of an organisation. On the other hand,

productivity becomes a major issue when an organisation needs to analyse the same

data across all its departments (Inmon, 2005). This is because many programs must

be written and there are many technological barriers to overcome (Inmon, 2005).

2.2.7 DATA MARTS

A data mart and a data warehouse have different architectural structures. On

some occasions there is a need to perform a standardized data analysis and to organise

data to identify simple usage patterns. As a result of this, the data warehouse is

arranged into small units called data marts (Bonifati, Cattaneo, Ceri, Fuggetta, &

Paraboschi, 2001). As mentioned by Inmon (1999), “a data mart is a collection of

subject areas organised for decision support based on the needs of a given

department”. Therefore, each department has its own way of understanding how the

data mart should look. Each data mart is designed according to the department’s

needs (Inmon, 1999). The following table shows the structure and the differences

between the data marts and the data warehouse.

Data Mart | Data Warehouse

Departmental | Corporate

High level of granularity | Low level of granularity

Star join structure | Star join/Snowflake structure

Modest amount of historical data | Robust amount of historical data

Technology optimal for access and analysis | Technology optimal for holding and managing massive volumes of data

Each department has a different structure | Structure suits corporate understanding of data

Table 3: Differences between data mart and data warehouse (Inmon, 1999, p. 2)


2.3 DIFFERENT TYPES OF DATA WAREHOUSE MODELS

Different types of data warehouse models can be identified. Ponniah (2010)

describes the basic data warehouse architectural types available (Figure 6) and

introduces five different data warehouse architectural designs. These are the centralised

data warehouse architecture, independent data marts (IDM), federated architecture

(FED), hub and spoke architecture and data mart bus architecture. Also, as mentioned by

Ariyachandra and Watson (2010), these are reference architectural types which

provide guidance when creating a new design.

2.3.1 CENTRALISED DATA WAREHOUSE

The centralised data warehouse models consider enterprise level information

requirements. The warehouse contains atomic level data which is maintained in the

third normal form and sometimes, summarised data will be stored. There are no

separate data marts developed in this architecture (Ponniah, 2010).

2.3.2 INDEPENDENT DATA MARTS

The independent data marts are developed to meet the needs of the individual

organisational units (Ariyachandra & Watson, 2005). However, these data marts do

not provide a ‘single version of the truth’. As stated by Marco (2000), several

features can be identified in the independent data marts architecture. These features

include:

- Each data mart is sourced directly from the operational systems.

- In general, data marts are built independently from one another by autonomous

teams (Independent teams will typically deploy tools, software, hardware, and

processes).

Also, inconsistent data definitions and the use of different dimensions and measures in

IDM prevent the analysis of data across the data marts (Ariyachandra & Watson,

2005). Moreover, Marco (2000) identified problems such as redundant data,

redundant processing, scalability and non-integration with this architecture.

2.3.3 FEDERATED ARCHITECTURE (FED)

As stated by Ariyachandra and Watson (2010, p. 13), “this architecture leaves

existing decision support structures (e.g., operational systems) in place”. The data

are integrated logically or physically using methods such as shared

keys, global metadata and distributed queries. According to Jindal and Acharya (as

cited in Ariyachandra & Watson, 2010), this architecture is more suitable for the

firms that have pre-existing, complex decision support systems.

2.3.4 HUB AND SPOKE ARCHITECTURE

This architecture is similar to the centralised architecture. It contains atomic

(detail) level data which are normalised into third normal form. There are

dependent data marts attached to this centralised data warehouse. These

data marts acquire their data from the centralised data warehouse. The centralised data

warehouse acts as a hub and the data marts act as spokes. The

data marts are developed for different purposes of the organisation (Ponniah,

2010).
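The relationship between the hub and its spokes can be sketched in plain Python: the hub holds detail-level rows, and each data mart is derived from the hub rather than from the source systems. The row layout, the measure names (cost, icu_hours) and the function build_mart are hypothetical and are used only to illustrate the idea.

```python
from collections import defaultdict

# Hub: atomic (detail-level) rows held in the central warehouse.
# All values are invented for illustration.
WAREHOUSE = [
    {"patient_key": 1, "unit": "Cardiac surgery", "cost": 1200.0, "icu_hours": 20},
    {"patient_key": 2, "unit": "Cardiac surgery", "cost": 800.0, "icu_hours": 0},
    {"patient_key": 3, "unit": "ICU", "cost": 500.0, "icu_hours": 48},
]


def build_mart(warehouse, group_by, measure):
    """Spoke: a data mart holding one measure summarised from the hub."""
    mart = defaultdict(float)
    for row in warehouse:
        mart[row[group_by]] += row[measure]
    return dict(mart)


# Two spokes built for different purposes from the same hub data.
costing_mart = build_mart(WAREHOUSE, "unit", "cost")
icu_mart = build_mart(WAREHOUSE, "unit", "icu_hours")
```

Because both marts are computed from the same hub rows, they share the same definition of a unit, which is one way to read the contrast with marts built separately from the operational systems.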

2.3.5 DATA MART BUS ARCHITECTURE

The data mart bus architecture is designed according to the business

requirements of the organisation (Ponniah, 2010). At the beginning, data mart

architecture is designed with dimensions and measurements and later on,

measurement data marts are added to it. The data marts consist of atomic and

summarised data and are organised in star schemas (Ponniah, 2010).

Figure 6: Data warehouse architectural types (Ponniah, 2010, p. 33)


Figure 7: Different types of data warehouse architectures (Sen & Sinha, 2005, p. 80)

Moreover, Sen and Sinha (2005) discussed some other types of

data warehouse architecture (Figure 7). Some of these data warehouse architectural

types are extended versions of the above mentioned architectural types, for example

the enterprise warehouse with operational data store and the hub and spoke data mart

architecture.

2.4 DATA WAREHOUSE ARCHITECTURE/MODEL SELECTION FACTORS

According to a survey by Forrester (as cited in Agosta, 2005) of 213

practitioners at the Data Warehousing Institute conference in San Diego in

August 2004, most respondents selected the “Hub and Spoke” data warehouse

architecture as the most suitable architecture (see Figure 8).

Agosta (2005) stated that, “the survey did not ask about data modelling

philosophy, and this survey is perfectly consistent with practitioners implementing

dimensional models in different architectures - centralised, hub-and-spoke, as well as

"conformed" designs”. However, Agosta (2005) argued that there is no right or

wrong data warehousing architecture itself, because most of the architectures

(models) are successful with alternative architectures.



Figure 8: Results of the survey (Agosta, 2005)

Another survey conducted by Ariyachandra and Watson (2005) among 454

participants, on data warehouse architecture selection among companies, showed that

39% selected the hub and spoke architecture and only a small percentage selected the

federated architecture (Figure 9).

Figure 9: The distribution of the architectures (Ariyachandra & Watson, 2005, p. 24)

According to Ariyachandra and Watson (2010, p. 1), “data warehouse selection

decision is a subset of IT infrastructure (ITI) design”. However, little research has

been conducted on ITI design, and most findings are derived from case studies or

recommendations developed from observation or indications. As stated by

Ariyachandra and Watson (2010), most of the research does not address the factors

that influence data warehouse design. Ariyachandra and Watson (2010) have

introduced a research model for data warehouse architecture selection, which is

shown in Figure 10.

Figure 10: Research model for data warehouse architecture

selection (Ariyachandra & Watson, 2010, p. 4)

Their research on this model and further analysis shows that there is a

combination of several factors affecting the selection of data warehouse architecture.

They have introduced an overall model for data warehouse selection. The model has

been created according to the selection factors that were chosen as most important.

As stated by Ariyachandra and Watson (2010), based on organizational information

processing theory (OIPT), information processing needs arise from a combination

of interdependence and task routineness. Also, both sponsorship level and

information processing needs influence the creation of the strategic view of the

warehouse selection (Ariyachandra & Watson, 2010). Moreover, resource

constraints, the perceived ability of IT staff and urgency (facilitating conditions) also

influence the warehouse architecture selection (Ariyachandra & Watson, 2010)

(Figure 11).

Figure 11: An integrated model for DW architecture selection (Ariyachandra & Watson, 2010, p. 11)


2.5 HEALTH INFORMATION MANAGEMENT

As mentioned by Johns (2002), information management is defined in several

ways by different authors. Synott and Gruber state (as cited in Johns, 2002), the

information management function provides control and management over

information resources. Also, Scheyman states (as cited in Johns, 2002, p.4)

information management “refers to information characteristics such as information

ownership, content, quality and appropriateness”. The information management tasks

that are performed traditionally in healthcare organisations are highly quantitative

and departmentally focused (Johns, 2002). The role of the health information

manager includes responsibility for managing health information in the given

context. The traditional activities of the health information manager include

planning, developing and implementing systems designed to carry out tasks such as

controlling, monitoring, storing and retrieving data on a departmental basis (Johns, 2002). Today,

the tasks of the information manager are changing alongside the increasing

complexity of information in healthcare, and information managers act as brokers of

information services such as information engineering, retrieval and analysis.

2.5.1 HEALTHCARE DECISION-MAKING

The following figure (Figure 12) shows the decision-making levels of an

organisation. The top level of decision-making involves strategic decision-making

(Johns, 2002). At this level managers make decisions about the overall goals of the

organisation. For instance, types of decisions made on this level include which

services need to be provided (such as acute, ambulatory or long term care) and at

which geographical location to operate (such as local, state, national) (Johns, 2002).

The second level concerns tactical decision-making. The decisions made on this level

relate to the tactical units of the organisation such as patient care services and

marketing (Johns, 2002). The third level concerns the day to day decisions of the

organisation such as hiring employees, ordering supplies and medications, processing

bills (Johns, 2002).


Figure 12: Decision-making levels within an organisation (Johns, 2002, p. 36)

2.5.2 HEALTHCARE INFORMATION SYSTEMS AND DECISION-MAKING

The importance of information technology to healthcare services can be seen

differently from the perspectives of patients, professionals and government and

funding agencies. A patient expects easy access to personal information, knowledge

to provide self care, timely access to their healthcare professionals, privacy and up to

date care. On the other hand, professionals’ expectations of Information Technology

(IT) are different from those of the patients. As professionals they expect focused

information, support for effective use of IT, decision support tools and new education

and training. From government’s or funders’ perspectives accountability, efficiency,

sustainability and scalability are expected through the implementation of IT to

healthcare services (B. Barraclough, personal communication, March 31, 2009).

Therefore, the important issue to consider is to try to achieve these needs through

integrating IT with healthcare services. As stated by Lenz and Reichert (2007), to

offer IT support effectively it is vital to understand healthcare processes

characteristics.

According to Johns (2002), healthcare information systems were paper based

for more than a century. The first use of computers in healthcare was reported to be

between the 1960s and early 1970s. The evolution of healthcare information systems is

shown in Figure 13. There are many information system applications used in

healthcare today. As stated by Johns (2002), most of these applications are clinically

oriented systems such as patient monitoring systems, nursing information systems,

laboratory information systems and so on. Also, there are applications which are

supportive for the operational activities or managerial activities of a healthcare

institution such as accounting information systems, human resource management

information systems and materials management. On the other hand, some of the

information systems are external to the organisation. As stated by Johns (2002), the

information manager of an institution must understand the components of information

systems and how the systems affect the organisation and others outside the organisation.

In the late 1980s, hospitals had started to implement many systems to support

strategic decision-making, managerial decision-making and quality improvement

(Johns, 2002). According to Grimson et al. (2000), previously, healthcare

organisations consisted of individual units which were operated independently from

one another and the need for information sharing was seen as less of a priority than it

is today. However, the inability to share information across systems and

organisations creates major barriers to progress on shared care as well as cost

containment (Grimson et al., 2000). Moreover, as mentioned by Johns (2002),

although the transactional databases contain a wealth of information it is impossible

to extract information for high level decision-making. Also, absence of integrated

healthcare leads to risks of medical treatment errors, lack of coordination, multiple

examinations and increased therapy costs (Stolba & Schanner, 2007). Furthermore,

according to Kerr, Norris and Stockdale (2007, p. 1017), “in the healthcare sector

lack of data quality has far-reaching effects. Planning and delivery of services rely on

data from different sources such as clinical, administrative and management

sources”. Therefore, if the quality of the data is higher it helps to retrieve better

information (Kerr et al., 2007).


Figure 13: Timelining Health Information Systems Evaluation (Johns, 2002, p. 61)

As mentioned by Landrum, Peachey, Huscroft and Hall (2008), there are many

technological advances in use or under development to improve decision-making in

the healthcare industry, such as Decision Support Systems (DSS). These systems help the

hospital operate efficiently by reducing medical or prescription errors, organizing

staff and patients, reducing patients’ waiting time and facilitating effective

diagnosis of patients’ symptoms (Landrum et al., 2008). Some of the common

DSS in healthcare are marketing systems, cost accounting systems and case-mix

systems (Johns, 2002). These systems consist of tools that help the manipulation of

data and “what if” analysis scenarios for strategic decision-making (Johns, 2002).

According to Arigon et al. (2007), Clinical Decision Support Systems (CDSS) were

introduced to assist decision-making in healthcare. However, the scope of this is

limited when compared to clinical data warehousing (Arigon et al., 2007).


Also, as stated by Rajan and Ramaswamy (2010), because health data are

derived from different environments there is a significant probability of errors and

uncertainty. Moreover, many factors such as poor data quality, inconsistent

representation and complicated domain knowledge cause clinical decision-making

to be a labour intensive and error prone task (Zhou, Chen, Liu, Zhang, Wang,

Li, Guo, Zhang, Gao, & Yan, 2010). Therefore, effectively integrating health data

from different sources is becoming recognised as a crucial factor (Shams & Farishta,

2001).

There are a number of technologies available to integrate data. These include

data warehouses, database federations, database federation with mediated schemas

and peer data management systems (Louie, Mork, Martin-Sanchez, Halevy, &

Tarczy-Hornoch, 2007). As mentioned earlier, data warehouses integrate data from

different sources to a single repository. In a database federation, integration of

disparate sources is effected by using software programs that interface with the

source (Louie et al., 2007). Database federations with mediated schemas

address problems faced by database federations when integrating data from

different sources. They use mediated schemas, which act as middleware in a database

federation. In a peer data management system (PDMS), “each data sources provides

semantic mapping to either one or a small set of other data sources or peers” (Louie et

al., 2007, p. 8). Each of these data integration technologies has advantages and

disadvantages as shown in Figure 14. By using data and knowledge formalisms such

as relational schemas, semi-structured data and ontologies, data are integrated in the

above mentioned data integration architectures (Louie et al., 2007).
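
The difference between the first two approaches can be sketched in a few lines. In the sketch below the two source "systems" are plain Python lists, and every name and value is hypothetical. It only contrasts copying data into a single repository ahead of time with querying the sources through wrapper code at request time; it does not reproduce the architectures analysed by Louie et al. (2007).

```python
# Two hypothetical source systems with different field names.
ADMIN_SYSTEM = [{"ur_number": "A1", "ward": "Cardiac"}]
LAB_SYSTEM = [{"patient": "A1", "test": "troponin", "value": 0.02}]

def load_warehouse():
    """Warehouse style: extract, transform and store in one repository up front."""
    repository = {}
    for row in ADMIN_SYSTEM:
        repository.setdefault(row["ur_number"], {})["ward"] = row["ward"]
    for row in LAB_SYSTEM:
        tests = repository.setdefault(row["patient"], {}).setdefault("tests", [])
        tests.append((row["test"], row["value"]))
    return repository

def federated_lookup(patient_id):
    """Federation style: wrappers query each source when the question is asked."""
    ward = [r["ward"] for r in ADMIN_SYSTEM if r["ur_number"] == patient_id]
    tests = [(r["test"], r["value"]) for r in LAB_SYSTEM if r["patient"] == patient_id]
    return {"ward": ward[0] if ward else None, "tests": tests}
```

In the warehouse style the field-name differences are resolved once at load time, whereas in the federated style they are resolved on every lookup, which is one reason the two approaches trade off data freshness against query cost differently.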

Figure 14: Advantages and disadvantages of data integration architectures (Louie et al., 2007, p. 6)


However, data governance is also an important factor to consider when implementing

a data integration project. “Data governance refers to the overall management of the

availability, usability, integrity, and security of the data employed in an

enterprise” (Federal Student Aid, 2007). Effective data governance improves data consistency in

decision-making, improves data security, decreases regulatory fines and assigns

accountability for data quality (Delgado, 2011). Although there are standards and

security and compliance frameworks available for the healthcare industry, healthcare

organisations should incorporate privacy programs into their data governance programs

(Delgado, 2011). To implement an effective privacy program, basic elements such as

a formal policy governance structure, written policies, funding and procedures to

handle complaints need to be addressed (Delgado, 2011).

2.6 DATA WAREHOUSING AND HEALTHCARE

“In recent years, medical professionals are witnessing an explosive growth in

data collected by various organisations and institutions” (Kerkri et al., 2001). Hence,

there should be effective systems to manage healthcare data. As mentioned before,

OLTP systems are not designed to provide support for ad-hoc queries. The reason

is that, although transaction systems are rich in information, it is very difficult to obtain

the appropriately linked and analysed information for higher decision-making levels

such as managers and executives. One solution that many organisations turn to is

implementing a data warehouse concept (Scheese, 1998). According to Wah and Sim

(2009, p. 530), “data warehousing is becoming an indispensible component in data

mining process and business intelligence”. The increasing quantity of healthcare data is

not the only problem; healthcare expenditure is another. Healthcare

expenditure is increasing and is a burden for both individuals and governments

(Yan & Jianli, 2005). For instance, the U.S. allocates a trillion dollars annually for

healthcare expenditure (Berndt, Fisher, Hevner, & Studnicki, 2001). Therefore, there

is a need for a strategy to reduce healthcare expenditure and to improve quality of

care.

In the context of the hospital systems, healthcare data comes from disparate

sources such as hospital administration systems, clinical databases and financial

systems and appears in many forms such as spread sheets, published books and other

data formats (Berndt et al., 2001). The data warehouse provides an opportunity to

integrate these separated systems and provide help for efficient decision-making.

According to a survey conducted by Health Industry Insights in the U.S.A.

(Figure 15) among 36 healthcare provider chief information

officers (CIOs), roughly 40% indicated that their current use of

business and clinical intelligence is limited to deployment of data marts or data cubes

(Holland, 2009). Also, 35% indicated limited use of business intelligence

(BI)/clinical intelligence (CI) tools that are incorporated into their packaged software

applications (e.g. electronic medical records (EMR), financial applications).

Figure 15: Current use of BI/CI by healthcare organisations (Holland, 2009, p.9)

2.6.1 DATA WAREHOUSE IMPLEMENTATION EXAMPLES

According to Winter (2007), data warehousing concepts in the healthcare

environment have been implemented successfully in the private sector as well as in

some government agencies in the USA. He has mentioned real examples of success

stories of implementing data warehousing in the healthcare sector such as hospitals,

and among commercial healthcare providers. As stated by Winter (2007), most of

these healthcare organisations gain more benefits by implementing data warehousing.

Some of these examples are outlined below.

The Midwestern Health Insurance Company in the USA uses its data warehouse

to identify and encourage optimal practices. The company found that the mortality

rate in cardiac surgery was lower for some healthcare providers. Subsequently, the

mortality rate for bypass surgery among this insurer’s

members declined by 75% in relative terms, from 4% to 1%. Another example involves commercial

pharmacy savings of forty million dollars achieved in one six-month period with

their data warehouse based program (Winter, 2007).

The Veterans Health Administration (VHA) in the USA is another institution that

gains benefits from its data warehouse. The aims of its data warehouse use are to

improve the quality, efficiency and safety of its medical care; measure the

effectiveness of the care it offers; and facilitate medical research. The VHA has

saved millions of dollars on an annual basis through better decision-making (Winter,

2007). The New South Wales Department of Health (NSW Health) in Australia

is another data warehousing success story. NSW Health is responsible

for many services such as a State-wide ambulance service, mental health services,

drug and alcohol services and a network of community health centres (Sybase,

2010). The recent improvement of its data warehouse with Sybase provides

benefits in several ways. Some of these benefits are:

• Reducing data loads by 76 percent

• Achieving a data compression rate of over 70 percent

• Simplifying administration and reducing overhead costs

• Delivering queries 85 percent faster (Sybase, 2010, p. 1)

However, many findings show that certain factors are important for the

success of data warehousing. Winter (2007) introduces eleven critical factors that

should be addressed for successful data warehousing in healthcare services. These

factors include:

• The Enterprise approach

• Support for complex data structure

• Support for complex queries

• Large data volumes

• Concurrent and timely use

• Flexibility

• Support and education

• High availability

• Privacy and security

• Data quality and standards

• High performance

Facilitating the enterprise approach to data warehousing provides the greatest

benefits to health services. Health data flows from multiple different areas to the data

warehouse. These data can flow from both internal as well as external sources.

Therefore, the integration of all these data for relevant decision-making is essential

(Winter, 2007). Concurrently, end users of the data warehouse need different views

of the data. For example, a doctor needs a complete picture of a patient’s history of

tests, physical examinations, symptoms etc. for making a clinical decision.

Alternatively, an insurer requires a complete picture of a hospital when providing

their services or its price structure. Likewise every user (physicians, payers,

regulators, and researchers) needs the same data filtered in different views.
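
One common way to present the same stored data in different views is a database view. The sketch below is illustrative only: it uses Python's built-in sqlite3 module, and the encounter table, its columns, its rows and the two view names are hypothetical assumptions rather than anything described by Winter (2007).

```python
import sqlite3

def build_views():
    """Create one hypothetical table and two role-specific views over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE encounter (
            patient_id TEXT, test_name TEXT, result TEXT, charge REAL);
        INSERT INTO encounter VALUES
            ('P1', 'ECG', 'normal', 120.0),
            ('P1', 'Echo', 'mild stenosis', 480.0),
            ('P2', 'ECG', 'abnormal', 120.0);
        -- Clinician view: full test history, no charges.
        CREATE VIEW clinician_view AS
            SELECT patient_id, test_name, result FROM encounter;
        -- Payer view: charges per test type, no patient identifiers.
        CREATE VIEW payer_view AS
            SELECT test_name, COUNT(*) AS n_tests, SUM(charge) AS total_charge
            FROM encounter GROUP BY test_name;
    """)
    return conn
```

Both views read the same encounter rows, so a correction to the underlying data is reflected for every user group without maintaining separate copies.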

Healthcare systems are dealing with large volumes of data and this is growing

rapidly day by day (Winter, 2007). Therefore, the increasing volume of data is a

challenge for data warehousing in healthcare services. It is important to

manage these high volumes of data in an efficient and effective

manner. When implementing a data warehouse, quality of data plays a major role.

According to Leitheiser (2001, p. 1), “healthcare organisations data is central to both

effective healthcare and to financial survival”. Therefore, data quality must be high to

provide reliable and dependable information for decision support. According to

Winter (2007), flexibility is another critical factor when implementing a data

warehouse. In other words, the data warehouse should be able to adapt to changes

which can occur due to variation in regulations, technology advances and fluctuations

in consumer expectations. The changes which occur may be simple or complex. For

example, new data types continue to grow with increasing use of images, text and

audio and must be accommodated.

The privacy and security of health related information also plays a major role

when implementing a data warehouse (Winter, 2007). As a data warehouse consists


of data derived from multiple sources, it is important to provide security for this data.

Especially in healthcare, patients require their health information to be kept more

secure. Providing privacy for health records means only authorised persons have

access to the data, subject to the patients’ permissions (Winter, 2007). The

requirement of securing data in the data warehouse is becoming more complex with

the extent of data that has to be dealt with. This will be a major challenge for the

health sector in the future.

According to Winter (2007), the healthcare data model should be able to

provide support for the complex relationships among the tables. For example,

the outcome of a medical test may range from a single value to voluminous and

complicated output structures. In the future, use of more information may lead to

further increased size and complexity of medical data. Therefore, a data warehouse

must be able to support this complexity of the data model (Winter, 2007). The

support for complex queries is another related issue to be considered. Healthcare

data warehousing involves joining complex data from many different tables.

Therefore, sometimes complex queries must be written to get the required data from

the system. Hence, the writing of queries may involve handling large non-collocated

joins on multiple large tables (Winter, 2007).
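
To give a sense of what such a query looks like, the following sketch joins a fact table to two dimension tables and aggregates the result. It is a minimal example using Python's built-in sqlite3 module on a single machine, so it does not show the non-collocated joins that arise when tables are spread across nodes. All table names, column names and rows are hypothetical.

```python
import sqlite3

def mortality_by_procedure_and_year():
    """Join a hypothetical fact table to two dimensions and aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_procedure (procedure_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_operation (
            patient_key INTEGER, procedure_key INTEGER,
            date_key INTEGER, died INTEGER);
        INSERT INTO dim_procedure VALUES (1, 'bypass'), (2, 'valve');
        INSERT INTO dim_date VALUES (10, 2009), (11, 2010);
        INSERT INTO fact_operation VALUES
            (100, 1, 10, 0), (101, 1, 10, 1), (102, 1, 11, 0), (103, 2, 11, 0);
    """)
    query = """
        SELECT pr.name, d.year, COUNT(*) AS operations,
               AVG(f.died) AS mortality_rate
        FROM fact_operation AS f
        JOIN dim_procedure AS pr ON pr.procedure_key = f.procedure_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY pr.name, d.year
        ORDER BY pr.name, d.year
    """
    return conn.execute(query).fetchall()
```

Each additional dimension adds another join, which is why queries against a realistic healthcare model with many tables quickly become complex.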

Concurrent and timely use is an important issue because the data

warehouses being implemented today and in the coming era align with and support

many activities and strategic goals of healthcare enterprises. Therefore, providing data

concurrently for many clients in a timely manner provides an effective and efficient

healthcare system. Similarly, high availability is another important issue to

consider (Winter, 2007). Availability of up to date information at the time required by

decision makers supports an effective and efficient healthcare system. However,

there is a great challenge in accomplishing the task of providing a data warehouse on

a large scale that is continuously updated, complex and heavily used. Nevertheless,

with the support of new technology many commercial organizations are already using

these solutions. As with the other issues, support and education are also vital when

implementing healthcare data warehousing. Users are encouraged by providing better

support and education and the importance of this in change management is well

recognised.


Finally, according to Winter (2007), high performance has three

basic meanings for a data warehouse: complete simple queries quickly; complete

large, complex queries efficiently and scalably; and load new data into the data

warehouse in a timely manner.

2.6.2 DATA WAREHOUSE IMPLEMENTATION CHALLENGES

However, as indicated by Winter (2007), there is a challenge for achieving all

these factors in a single platform. This is because meeting all these requirements at

once requires suitable architecture, organisation, readily usable applications and

executive support (Winter, 2007). As stated by Lindsey and Frolick (2003), data

warehouse failures may have multiple causes. Table 4 shows

some of the reasons for data warehouse failure.

Table 4: Combined reasons for data warehouse failure (Lindsey & Frolick, 2003)


In general, the most common factors are weak management support and

inadequate user involvement. As mentioned by Lindsey and Frolick (2003), data

warehousing success may be obtained by avoiding a small number of critical factors

for failure rather than attempting to achieve all critical factors for success. Also,

according to the table, the main reasons for data warehouse project failure in the

healthcare industry are insufficient funding, organisational politics and weak

sponsorship and management support. According to a survey conducted by Health

Industry Insights in the USA with a sample of 33 healthcare provider CIOs, the

three top barriers to the use of business or clinical

intelligence applications at their organisations (Figure 16) are lack of funding, lack of staff

resources, and data quality and inconsistent data standards (Holland, 2009).

Figure 16: Top 3 barriers to the use of business/clinical intelligence applications (Holland, 2009,p.11)

According to Holland (2009), another survey, conducted by IDC and InfoWorld

in 2008 among 516 end users and system integrators, showed that 45% of respondents

selected data quality as the main challenge (Figure 17). The next highest results, at an

equal percentage (29%), were real-time data integration and integrating BI software with

the existing IT portfolio (Holland, 2009).


Figure 17: Top 3 IT challenges to implementing/deploying BI applications (Holland, 2009,p.12)

2.7 SUMMARY AND IMPLICATIONS

The aim of data warehousing is to organise the gathering of a wide range of

data and store it in a single repository. The main components of the data warehouse

are the Operational Source System, Data Staging Area, Data Presentation Area and Data

Access Tools. Many authors have introduced different data warehouse modelling

methods such as the ER modelling approach, the dimensional modelling approach and the object

oriented approach. Among the data warehousing approaches mentioned, the most

frequently used is the multi-dimensional or data cube approach. Two

modelling techniques, the star schema and the snowflake schema, are used to

represent multidimensional data. There are two main methodological

approaches used to develop a data warehouse design: the top-down approach

and the bottom-up approach.

The data warehouse is different from operational systems in many ways. The

primary difference between operational systems and data warehousing systems is that

operational systems are designed to support online transaction processing (OLTP) and data

warehousing systems are designed to support online analytical processing (OLAP).

Moreover, differences can be seen in use of data, users, database sizes, transactions,

and data entry when compared with OLTP systems. Many different

architectural types can be identified: centralised data warehouse

architecture, independent data marts (IDM), federated architecture (FED), hub and

spoke, and data mart bus architecture. Many factors contribute to the

selection of a data warehouse architecture, and it is important to consider

these factors when implementing the data warehouse.

As mentioned before, data warehousing technology predominantly aims to

structure the data in a summarised way which supports improved access to the data

and use of it in an efficient and effective manner. Currently, data warehousing plays a

major role in commercial businesses. The healthcare system is one sector dealing

with large amounts of data derived from many different sources. Therefore, there is a

need for a very effective system to capture, collate and distribute health data. From

the literature, it can be seen that decision-making with current information systems is

a very difficult task. This is because many issues such as lack of resources to

integrate data, lack of data quality, and health data privacy and confidentiality

standards hinder effective decision-making. Integrating healthcare IS with new

technology paves the way to obtain a number of benefits such as improved access to

data, evidence based decision-making and provision of quality services. There

are many data integration mechanisms available. There are many advantages as well

as disadvantages associated with these architecture types. However, from the

literature it can be seen that implementing the data warehouse concept is one of the

best potential solutions available that can be used for strategic and tactical decision-

making in healthcare.

Chapter 3: Research Design

This chapter describes the design adopted by this research to achieve the

aims and objectives stated in section 1.4 of Chapter 1. Section 3.1 discusses the

methodology and research design used in the study and the stages by which the

methodology was implemented. Section 3.2 details the

participants in the study and section 3.3 lists all the instruments used in the

study such as the questionnaire and the face to face interviews. Section 3.4

outlines the timeline for the research project and section 3.5 discusses how the

data was analysed. Finally, sections 3.6, 3.7 and 3.8 discuss the ethical

considerations, intellectual property rights and health and safety issues of the

research project.

3.1 METHODOLOGY AND RESEARCH DESIGN

3.1.1 METHODOLOGY

The methodology used in this study consists of five stages. The

following table shows the five stages and the research methods.

Stage | Description | Research Methods

1 | Review data warehouse models | Case studies; Literature review

2 | Study the cardiac surgery, ICU, quality and safety and clinical costing units data repositories, decision-making processes and identify the issues | Questionnaire; Unstructured interviews; Documentation review

3 | Select or propose a suitable architecture | Case studies; Literature review; Interview/Collaboration

4 | Develop and analyse the data product outputs of the model | Data analysing tool; Interview

5 | Analysis of the benefits of the model | Feedback collected from the end users

Table 5: Methodology stages

Stage 1 – Review the data warehouse models

The first stage involves reviewing the data warehouse models available.

There are a number of data warehouse models which have been introduced in

fields such as healthcare, telecommunication and marketing. Therefore, literature

review and case studies provide valuable information as a first step for the

current research.

The literature review and case studies are important to:

• Develop background knowledge about data warehousing and its models and how

it differs from operational systems

• Identify how the data warehouse models are applied in different fields

• Study how the data warehouse concept is applied in the healthcare field

• Analyse what factors lead to selection of the optimal model

Literature search sources and information search strategies are covered in the

literature review methodology part (section 2.1).

Stage 2 – Study the data repositories, decision-making process and issues

As the second step, it is important to study the data repositories available in the

cardiac surgery, ICU, quality and safety and clinical costing units at The Prince

Charles Hospital. This will help to identify the databases and operational data stores

available and currently used for decision-making. To investigate the data repositories,

it is necessary to obtain assistance from the staff of the cardiac surgery, ICU, quality and

safety and clinical costing units and also the hospital IT department (Business

Solution Unit).

After identifying potential data repositories that are significant, it is necessary

to study the decision-making process itself. A questionnaire was developed as an

instrument to determine this. The questionnaire is further described in the

Instruments section below.

The sample of stakeholders identified was essentially a convenience sample of

a cross-section of roles related to the selected database sources. The sample included

clinical data managers, unit managers and clinicians either directly or indirectly

involved in decision-making processes based on the selected data sources. This is

further described in the Participants section below. The questionnaire was then given

to the identified end users and stakeholders involved in facilitating or making

decisions with the current information systems to identify the current issues in the

decision-making process and requirements for the data warehouse prototype design.

Furthermore, unstructured interviews have been conducted to gather more detailed

information and identify barriers to the development of the optimal prototype design.

The responses to the survey will be thematically analysed to develop sample

clinical questions that are to be addressed by a warehouse model and to provide

information products for assessment; to identify issues in current data management,

integration and analysis; and to identify any potential issues for selection or

development of a warehouse model.

Stage 3 – Propose a suitable data warehouse model and develop the data

warehouse prototype

As a result of the previous two steps, stage 3 involves recommending a suitable

data warehouse model for the cardiac surgery and associated clinical units. The data

warehouse models that are potentially going to be used are described in section 2.3 of

the literature review. The data warehouse model will be selected following

integration of the literature review and analysis of the information gathered from the

questionnaire and observation of the data and decision-making processes at the

cardiac surgery unit with appropriate feedback and consultation with end-users. Five

sample decision intelligence problems have been selected to guide the table structure

development of the data warehouse prototype. The data warehouse prototype will be

developed using SAS data integration studio, which is a standard tool provided for

student use through QUT.

Stage 4 – Develop and analyse the information product outputs and benefits of

the model

Stage four involves analysis of the information product benefits of the data

warehouse prototype. The SAS enterprise guide data analysis tool is used to analyse

the integrated data. To conduct this analysis, the integrated data for five clinical

intelligence decision-making problems are analysed and the results will be displayed

in report format for evaluation by the stakeholders (clinicians, unit managers and data

managers from the ICU, cardiac surgery, clinical costing and quality and safety).

Stage 5 – Evaluation of DW model

The information products (outputs generated from the data warehouse prototype)

will be presented to the relevant stakeholders for clinical interpretation and

evaluation of the usefulness of the data warehouse model prototype. Generally,

return on investment (ROI) is used to measure the success of a data warehouse

(Threshold Consulting Services, 2005). Other methods that can be used to

evaluate a data warehouse model are usage measurement, surveys, response time and

availability (Threshold Consulting Services, 2005). As stated by Shcherbatykh,

Holbrook, Thabane, & Dolovich (2008), a randomised controlled trial (RCT) is another

methodology that can be used to assess the benefits, harms and costs of health informatics

interventions and to evaluate validity. However, implementing RCTs is challenging

in health informatics. This is because health informatics trials involve

‘complex interventions’ (multifaceted) and multiple targets such as

clinicians and patients. Another challenge is that some of the features of RCTs are not

always feasible in electronic health technologies (Shcherbatykh et al., 2008). The lack of

methodological guidelines for conducting health informatics trials is another challenge

(Shcherbatykh et al., 2008).

In real-world situations it will take some time (one to three years or more)

to measure the actual benefits of the data warehouse, where changes to clinical care

models through quality improvement initiatives guided by warehouse information

products and management decisions may ultimately improve the health service.

According to de Mul et al. (2010), the testing phase of the ICU data warehouse

development at the Erasmus Medical Center, Rotterdam took almost two years to

complete. Because the evaluation of a data warehouse is a complex and time-consuming task, it

is outside the scope of this project. This research project will therefore use feedback

collected from relevant end users to evaluate the potential usefulness of a data

warehouse based on the proposed warehouse prototype model.

3.1.2 RESEARCH DESIGN

The research design is essentially a case study incorporating qualitative

research methods to collect data from information managers and end users in the cardiac surgery, ICU, quality and safety and

clinical costing units. Two methods were used for

data gathering. Firstly, a questionnaire was provided to end users to collect

information on the current decision-making process, on the issues and gaps in the

current decision-making process and to identify the data warehouse prototype

development requirements. Secondly, interviews were held after analysis of the data

collected from the questionnaire. The reason for conducting the subsequent

interviews was to clarify the information provided in the questionnaire and gather

more detail as required. The interviews were held only with people who agreed to

participate further as indicated in the appropriate response on the questionnaire.

Questionnaires and unstructured interviews were analysed thematically to identify the

decision-making issues, current decision-making process and user requirements.

Finally, qualitative analysis will be carried out to analyse the benefits after

developing the prototype and distributing the integrated analytical information

product.

3.2 PARTICIPANTS

The participants of this study are end users in the cardiac surgery, ICU, clinical

costing and quality and safety units. The end users are mainly clinicians, data

managers and unit managers of the above mentioned units directly involved in

decision-making or in supporting the decision-making process in healthcare practice.

Clinicians and unit managers are key decision makers for the units. Also, data

managers at the different units support the clinicians’ information needs as required

and are involved in facilitating the process of transforming data into useful

information. The questionnaire was given to a sample of ten participants

from the cardiac surgery, ICU, clinical costing and quality and safety units. The

small sample size of the survey was a result of availability of end users and the time

limitation of the project. As there are only a few staff members involved in informatics

in clinical services, the participants enrolled represent a good cross section of

relevant staff related to cardiac surgical decision-making processes.

3.3 INSTRUMENTS

The instruments that are used in the survey are the questionnaire (Appendix A)

and the unstructured interviews. The questionnaire consisted of twelve questions

producing data for four categories of inquiry, namely: current data repositories,

decision-making process, current issues, and data storage and analysis needs. The

questionnaire was designed by referring to related questionnaire design theories and a

sample questionnaire designed by another researcher (Mathew, 2008), with

advice from my supervisory team, and by drawing on data warehousing literature that provides

examples of similar surveys, such as those discussed in the book by Golfarelli and Rizzi

(2009). The questionnaire was given to the end users to identify the current

decision-making process at the cardiac surgery, ICU, clinical costing and quality and

safety units. This will help to gather the information in a structured way to identify

the issues in the decision-making process and to identify the main technical

requirements for the data warehouse prototype development. After analysing the data

collected from the questionnaire, unstructured face-to-face interviews were

conducted to collect further information and clarify and define the desired warehouse

information output. The subsequent interview questions were developed

based on the answers given in the questionnaire.

3.4 PROCEDURE AND TIMELINE

At the first stage of the research, the researcher identified and observed

database diagrams and held discussions with end users of the ICU, cardiac surgery,

clinical costing and quality and safety units. This process took three weeks. At the

second stage, the questionnaire was designed after referring to the literature and with

the help of my supervisory team. It was reviewed by my supervisory team and the

whole process took about two weeks. At the next stage, the hard copy of the

questionnaire was given to the end users of the ICU, cardiac surgery, clinical costing

and quality and safety units. The time arranged for answering the questionnaire was

twenty minutes. Answered responses were collected within one week. All the data

from the answered questionnaires were entered into a Microsoft Excel spreadsheet

for analysis. Survey results analysis took one week and, as the next stage, unstructured

interview questions were designed. Finally, unstructured interviews were conducted

only with end users who agreed to participate further as indicated on the

questionnaire. Interviews were conducted to collect further information required for

data warehouse prototype development. The length of the interviews was

approximately twenty minutes. Interviews were conducted with six participants

and data collected from the interviews were recorded in written form. Both

questionnaire distribution and interviews were conducted at the Prince Charles

Hospital under the supervision of the Cardiac Surgical Registry Coordinator. All the

collected data (questionnaire and interview responses) were kept in a secure place and

treated confidentially.

3.5 ANALYSIS

The results of both the questionnaire and the unstructured (face-to-face) interviews

were analysed in three sections. The first covers the current decision-making

process and the second the issues related to the current decision-making process at the

cardiac surgery unit, ICU, clinical costing unit and quality and safety unit. Thirdly,

the survey results were analysed to gather information for the technical details of the

data warehouse prototype development. Following the analysis of the questionnaire,

five sample decision problems that decision makers could potentially resolve

using warehouse information products were selected and put into

a table (as shown in Table 6).

3.6 ETHICS AND LIMITATIONS

As mentioned in the research methodology this research involved collecting

data by questionnaire and unstructured interviews. The participants were selected

from the cardiac surgery, ICU, quality and safety and clinical costing units at

TPCH. The participants are data and information end users (clinicians, data

managers, unit managers, directors) of these units. The questionnaire did not include

any individually identifiable data unless the participants indicated willingness to be

interviewed and provided their name and contact details.

Also, unstructured interviews were conducted after data analysis from the

questionnaire. Interviews were conducted with end users to clarify details and the

warehouse application requirements. Although unstructured interviews were held

with individuals, no sensitive data likely to have any negative effect on the

individuals was collected. All these data were kept in a secure place and treated

confidentially.

There is a need to access some of the identifiable clinical data in the data

repositories when testing the data warehouse prototype and developing the

information product outputs. Therefore, to ensure the confidentiality of such data and

provide accountability and responsibility, documents were signed with the Prince

Charles Hospital to make the researcher an honorary employee, therefore bound to

the Queensland Health Code of Conduct. This preventive measure ensures (a)

authorised on-site access to data but no removal or transfer of data outside Queensland Health

premises, for example by storing data on personal computers,

USB drives or CDs; and (b) prohibition of disclosure or discussion of patients’ personal

information with others. Ethics committee clearance was received from

both QUT and the Prince Charles Hospital ethics committees.

3.7 INTELLECTUAL PROPERTY RIGHTS

This research project has been commenced as a part of the Masters of Research

course at the Queensland University of Technology (QUT). As a part of the research,

at some stage work has to be carried out in collaboration with the Prince Charles

Hospital. The final report of the study may provide commercial value to both QUT

and the Prince Charles Hospital. Approval has been given by QUT for the IP rights

process.

3.8 HEALTH AND SAFETY

This research project does not involve working with any kind of biomedical,

biochemical or biological materials. However, this research involved interviews and

therefore the researcher applied for an assessment of work and obtained approval.

Chapter 4: Results Analysis

This chapter provides analysis of results collected from the survey. Data

collected from the survey instruments was analysed according to three sections.

Section 4.1 explains the current decision-making process and section 4.2 identifies

the current decision-making issues. The final section (section 4.3) summarises the

data warehouse prototype development requirements.

A total of ten questionnaires were distributed to stakeholders in the cardiac

surgical decision-making processes. An 80% response rate was achieved (8 out of

10) although 30% (3 of 10) did not wish to participate in further interviews. Only ten

questions (out of 12) were analysed due to lack of responses returned for questions 9

and 10. Questions 1-4 were analysed to identify the current decision-making process,

questions 6-8 and question 11 were analysed to identify current issues in the

decision-making process and, finally, questions 5 and 12 were analysed to identify the

user requirements for data warehouse prototype development.
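
The response figures and the grouping of questions described above follow from simple arithmetic. A minimal Python sketch, using only the counts stated in this chapter (the variable names are illustrative and not part of the study), is:

```python
# Counts reported in the text of this chapter.
distributed = 10          # questionnaires handed out
returned = 8              # questionnaires completed
total_questions = 12
unanswered = {9, 10}      # questions dropped for lack of responses

response_rate = 100 * returned / distributed

# Grouping of the analysed questions by purpose, as described above.
question_groups = {
    "current decision-making process": [1, 2, 3, 4],
    "current decision-making issues": [6, 7, 8, 11],
    "prototype development requirements": [5, 12],
}

analysed = sorted(q for group in question_groups.values() for q in group)
expected = [q for q in range(1, total_questions + 1) if q not in unanswered]

print(f"Response rate: {response_rate:.0f}%")
print(f"Questions analysed: {len(analysed)} of {total_questions}")
print(f"Grouping covers every analysed question: {analysed == expected}")
```

The last line checks that the three groups together account for exactly the ten questions that were analysed.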

4.1 CURRENT DECISION-MAKING PROCESS

According to the questionnaire responses, 87.5% of respondents use data from

outside of their data repositories for the decision-making process. Also, the cardiac

surgery unit shares or would like to share information with other hospital units,

especially ICU, the quality and safety unit and the clinical costing unit. However,

limited interaction between these databases creates inefficiencies in decision-making.

Moreover, results analysed from the questionnaire as well as unstructured interviews

indicated there is limited or no access to some of the data repositories. According to

the results of the questionnaires some aspects of the current decision-making

processes involving multiple sources of data are as follows:

• If clinicians working in the quality and safety unit require information from

many other databases (including ICU, CARPIA, e-DS), they have to contact the

data custodians in the other departments to extract and provide specific data.

• If clinicians from the ICU unit need information from the e-DS, CARPIA,

Transition II, again they need to contact data custodians from the specific unit

and possibly also from the IT department to collect the specific data.

• A unit manager from the cardiac surgery unit has direct access to CARPIA, but the

unit manager needs to contact data custodians of the other units to collect specific

data.

The end users’ decision-making process

currently involves a high degree of repetitive manual processing related to data access and

acquisition. Clinicians or unit managers collect the data separately by contacting

data custodians and individually integrate and assemble the data for analysis through

laborious and time consuming linking processes.

4.2 DECISION-MAKING ISSUES

Participants identified that there are many issues related to their current

decision-making processes. According to the questionnaire responses, the majority of

participants (75%) were not satisfied with the support provided by the current

information systems for decision-making. Figure 18 shows the distribution of

responses.

Figure 18: Current support from the ISs for decision-making

According to further findings from the questionnaire and unstructured

interviews, end users from the clinical costing unit mentioned that systems need

to be integrated and should provide easier access to data. Also, the end users from

the cardiac surgery unit pointed out the difficulty in reusing data already held in

other data repositories to combine with cardiac surgical data to inform quality

improvement studies. Another respondent from the cardiac surgery unit mentioned

that there is a need for comprehensive data availability at all stages of point of care

and current systems do not support this. Furthermore, a detailed response from a

quality and safety unit end user stated that “there is lack of support available for

the current decision-making process from the current information systems and

need a centralised data management (process) to improve decision-making”.

Analysis of the question regarding current decision-making issues revealed that

most of the end users (75%) selected integration of data from other data

repositories as the main problem for current decision-making (Figure 19). The end

users such as clinicians and unit managers frequently know which information

they require, and from where the information is available, but they do not have

effective methods to integrate the data. As mentioned before, the clinicians or unit

managers contact the data custodians to collect required data separately. This is a

time-consuming and often complex process for both parties. For instance, from the

clinicians’ point of view, they have to analyse the data collected from separate

units (for example ICU, CARPIA, e-DS); from the data managers’ point of view, it

takes some time to obtain and integrate information for complex ad hoc queries, or

they may have to contact the IT department or research staff to gather or extract

some of the data.

Figure 19: Decision-making issues with current ISs

Limited accessibility to data and lack of data availability is the next main

problem pointed out by the end users (63%). As mentioned before, there is limited

access to databases or some end users may have difficulty obtaining authority to

access data repositories. According to further unstructured interviews held with

end users, the main reasons identified are security and confidentiality issues or

Queensland Health information-related policies. Half of the respondents (50%)

also selected lack of efficient reporting tools and lack of time and resources to

undertake analysis as two other problems. According to the data collected from

the questionnaire, the analysis tools employed by the units are SPSS, Microsoft Excel

and QI Macros. However, further interviews indicated that there is a need

to implement better data analysis tools.

Figure 20: Data quality issues in current decision-making process

Figure 20 shows that the data quality issue most often indicated on the

questionnaire was lack of data completeness (more than 60%). Lack of

consistency was selected by 50% of respondents as the next highest data

quality issue. Respondents also indicated that lack of data accuracy (38%) and

lack of relevance (25%) were data quality issues faced by them in the

decision-making process.
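
The percentages reported in this chapter can be related back to numbers of respondents. The sketch below assumes that every percentage was computed over the eight returned questionnaires and rounded to a whole number where necessary. The thesis does not state the underlying counts, so the values printed are inferred under that assumption rather than reported results, and the labels are shortened paraphrases.

```python
RETURNED = 8  # assumption: every percentage uses the 8 returned questionnaires

def nearest_count(percent: float, n: int = RETURNED) -> int:
    """Return the count k (0..n) whose share 100*k/n is closest to percent."""
    return min(range(n + 1), key=lambda k: abs(100 * k / n - percent))

# Percentages quoted in sections 4.1 and 4.2.
reported = {
    "use data from outside own repository": 87.5,
    "not satisfied with current IS support": 75,
    "integration of data is the main problem": 75,
    "limited accessibility or availability of data": 63,
    "lack of efficient reporting tools": 50,
    "lack of data accuracy": 38,
    "lack of relevance": 25,
}

inferred = {label: nearest_count(pct) for label, pct in reported.items()}
for label, count in inferred.items():
    print(f"{label}: about {count} of {RETURNED}")
```

With so few respondents, each person accounts for 12.5 percentage points, which is worth keeping in mind when comparing the categories.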

4.3 APPLICATION DEVELOPMENT REQUIREMENTS ANALYSIS

To develop the cardiac surgery data warehouse prototype, a sample of

clinical decisions or analysis processes made by the end users/stakeholders was

first summarised from the questionnaire and interview responses. Five

significant clinical decision problems were selected from all the responses with

assistance from the cardiac surgical unit coordinator. Table 6 shows

the analysis of the user requirements according to the sample decision-making

problems and identifies the data sources that must be integrated

to resolve each decision-making problem. Further analysis of the data sources

together with discussion with relevant stakeholders ensured that the correct

records and data items from the sources were included for later analysis and

information product reporting.

| No | Problems/decisions/analysis | Data repositories | Users |
|----|-----------------------------|-------------------|-------|
| 1 | What are the clinical risk scores according to certain groups? (e.g. according to procedure, ventilation time) | CARPIA, ICU | Clinician, Unit Manager |
| 2 | What is the expenditure per episode of care according to procedure, ventilation time? | CARPIA, Transition II | Unit Manager, Clinician |
| 3 | What is the rate of e-discharge summaries sent to GPs according to clinical guidelines for the cardiac surgical patients according to operative data, surgical consultant? | CARPIA, e-DS | Unit Manager, Clinician |
| 4 | What is the cost of various post-operative complications according to the morbidity groups captured by the cardiac surgical registry? | CARPIA, Transition II | Unit Manager, Clinician |
| 5 | Audit data sources to verify that costings data includes high-cost procedures appropriately. | CARPIA, Transition II | Unit Manager, Clinician |

Table 6: Decisions/problems that end users would like to address
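
Table 6 can also be restated as a small data structure, which makes it easy to see which repositories the prototype has to integrate. The following Python sketch uses only the problem numbers and repository names from the table; the variable names are illustrative.

```python
from collections import Counter

# Table 6 restated as data: problem number -> repositories to integrate.
problem_sources = {
    1: {"CARPIA", "ICU"},
    2: {"CARPIA", "Transition II"},
    3: {"CARPIA", "e-DS"},
    4: {"CARPIA", "Transition II"},
    5: {"CARPIA", "Transition II"},
}

# Every repository the prototype must integrate to cover all five problems.
required = set().union(*problem_sources.values())

# How many of the sample problems depend on each repository.
usage = Counter(src for sources in problem_sources.values() for src in sources)

print(sorted(required))
for source, count in usage.most_common():
    print(f"{source}: needed by {count} of {len(problem_sources)} problems")
```

The result shows that four repositories are involved in total and that CARPIA is needed for every sample problem, so it is the natural source of the linking key between systems.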

Figure 21 shows the analysis results for the question regarding the security

and privacy concerns for data warehouse development. According to the figure,

50% of respondents indicated that there are no concerns regarding

incorporating data security and information privacy in the data warehouse

development. Also, it is important to note that more than 35% did not answer and

less than 20% indicated that data security and information privacy should be

incorporated into the data warehouse development.

Figure 21: Security and privacy concerns for DW prototype development

Chapter 5: Data warehouse prototype development

This chapter outlines the data warehouse prototype development for TPCH.

Section 5.1 briefly explains the business intelligence tools used to develop the data

warehouse. Also, this section gives some details and benefits of the SAS data

integration studio and the SAS warehouse administrator tool, which are used for this

research. Section 5.2 provides details on data analysis tools and section 5.3

explains the cardiac surgery data warehouse prototype selection and development

process step by step. Section 5.4 shows and discusses the information product output

result of the data warehouse prototype. The final section (section 5.5) shows the

feedback that was gathered from the end users to evaluate the data warehouse

prototype.

5.1 BUSINESS INTELLIGENCE TOOLS

The paper published by Sen and Sinha (2005, p. 81) compares

15 different available data warehouse methodologies. These methodologies are

grouped into three categories: core technology vendors, infrastructure vendors and

information modelling companies.

Core technology vendors

The core technology vendors are those who sell the database engines. “These

vendors use data warehousing schemes that take advantage of the nuances of their

database engines” (Sen & Sinha, 2005, p. 82). The methodologies categorised into

core technology vendors by Sen and Sinha are NCR’s Teradata-based methodology,

Oracle’s methodology, IBM’s DB2-based methodology, Sybase’s methodology, and

Microsoft’s SQL Server-based methodology.

Infrastructure vendors

As mentioned by Sen and Sinha (2005), infrastructure vendors are those who

are involved in the data warehouse infrastructure business. The infrastructure tools

have mechanisms to manage metadata repositories and to extract, transform and load

data into the data warehouse. Also, these infrastructure tools have the ability to work

with other database engines (Sen & Sinha, 2005). Some examples of this category

are SAS’s methodology, Informatica’s methodology, Computer Associates’ Platinum

methodology, Visible Technologies’ methodology, and Hyperion’s methodology.

Information modelling companies

This category includes Enterprise Resource Planning (ERP) vendors such as

SAP and PeopleSoft, business consulting companies such as Cap Gemini Ernst

Young, and IT/data-warehouse consulting companies such as Corporate Information

Designs and Creative Data (Sen & Sinha, 2005).

As mentioned before, the data warehouse prototype in this study was developed

using SAS data integration studio 4.2. A constraint of this study was the availability

of Data Warehouse technology. As SAS was able to provide this technology for

student use to QUT, it was an expedient choice. However, as SAS is the third largest

business intelligence vendor worldwide (Vesset, 2010) it is considered a reasonable

choice for demonstrating a data warehouse prototype development in this clinical

environment. For this research project SAS software was used in two stages: at the

back end, SAS data integration studio was used to develop the data warehouse prototype, and at

the front end, SAS enterprise guide was used to analyse data. Features of the data

warehouse technology are described below.

5.1.1 SAS/WAREHOUSE ADMINISTRATOR 4.3

SAS/Warehouse Administrator is a tool which has been developed to design

data warehouse/data mart processes. It is a “customizable solution that offers a

single point of control, making it easier to respond to the ever-changing needs of the

business community”.

Some benefits of the SAS/Warehouse Administrator software are that it:

• Integrates extraction, transformation and loading tools for designing data

warehouses/data marts.

• Provides a better framework for effective warehouse management.

• Facilitates business subject definition, consolidation of business rules,

scheduling of processes for warehouse maintenance and integration with decision-

support tools for effective warehouse exploitation.

• Delivers data warehouses more quickly, so that their benefits are gained sooner.

5.1.2 SAS DATA INTEGRATION STUDIO

SAS data integration studio 4.2 is a powerful tool that helps data warehouse

developers and data integration specialists to carry out data integration more

efficiently and effectively (SAS Institute Inc, 2006). SAS data integration studio

provides user friendly interfaces, extensive built in transformations and management

of complex enterprise data integration processes. Also, this software tool is easy to

use, collaborative and helps to integrate data faster and more effectively (SAS

Institute Inc, 2006).

Some of the benefits of SAS data integration studio include:

• Always access the data needed

SAS data integration studio enables accessing and processing data from legacy

systems or the latest ERP applications. Also, new source systems can be simply included.

All these help to save time and assist decision makers to collect the information they

require.

• Improve productivity

SAS data integration studio provides a user-friendly interface for developing

and documenting the work. Also, manual coding is available when required. New

team members can adapt quickly to others’ work when needed.

• Manage security and administration at all levels

SAS data integration studio makes it possible to establish security and

administration levels quickly and easily. The reusable templates help to provide role

based authorization and administrative privileges at all levels efficiently.

• Deliver consistent, trusted and verifiable information.

This tool is intended to deliver accurate information as needed. Also, data quality

tools help to examine the quality of data in the source systems. Furthermore, SAS

data integration studio assists users to identify where the data is derived from and

how it was transformed (SAS Institute Inc, 2006).
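
The extract, transform and load steps and the lineage tracking described above can be illustrated in a few lines. The sketch below is plain Python, not SAS code, and does not reproduce any SAS data integration studio interface. The record layout, the field names (`id`, `vent_minutes`), the helper functions and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WarehouseRow:
    patient_id: str
    ventilation_hours: float
    # Where the value came from and how it was changed along the way.
    lineage: list = field(default_factory=list)

def extract(source_name, records):
    """Pair each raw record with the name of the system it came from."""
    return [(source_name, rec) for rec in records]

def transform(source_name, rec):
    """Standardise identifiers and units, recording each step applied."""
    steps = [f"extracted from {source_name}"]
    patient_id = rec["id"].strip().upper()
    steps.append("patient id trimmed and upper-cased")
    hours = rec["vent_minutes"] / 60
    steps.append("ventilation time converted from minutes to hours")
    return WarehouseRow(patient_id, hours, steps)

def load(warehouse, rows):
    """Append transformed rows to the (in-memory) warehouse table."""
    warehouse.extend(rows)

warehouse = []
raw = [{"id": " p001 ", "vent_minutes": 90}]  # invented sample record
load(warehouse, [transform(src, rec) for src, rec in extract("ICU", raw)])

row = warehouse[0]
print(row.patient_id, row.ventilation_hours)
print(" -> ".join(row.lineage))
```

Keeping the lineage alongside each row is one simple way to answer the two questions raised above: where a value was derived from and how it was transformed.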

5.2 DATA ANALYSIS TOOLS

Data analysis tools are used to identify the patterns of the enterprise data. This

will provide useful insight about the trends in business. Some of the commonly used

data analysis tools are R, SPSS, SAS, Excel, Stata and Matlab. For this research I

have used the SAS enterprise guide data analysis tool. A brief introduction to this

software is given below.

5.2.2 SAS ENTERPRISE GUIDE

SAS is considered one of the appropriate and efficient analysis tools available

for producing data in the report form. SAS Enterprise Guide provides a SAS

graphical interface to publish dynamic results in a Microsoft Windows client

application (SAS Institute Inc, 2010). This application provides better information

for business analysts, programmers and statisticians (SAS Institute Inc, 2010). Some

of the benefits of this application include:

• Provide a self-service environment

• Provide efficient access to data sources

• Make reporting and analytics available to everyone.

(SAS Institute Inc, 2010)

5.3 CARDIAC SURGERY DATA WAREHOUSE PROTOTYPE SELECTION AND DEVELOPMENT

5.3.1 MODEL SELECTION RATIONALE

The selection of a specific data warehouse model is very challenging in the

healthcare sector. The selection of a data warehouse model for the Cardiac surgery

unit at TPCH was based on an integration of the literature review and the analysis of

user requirements from the stakeholder survey. According to the literature, the data

warehouse models that are mostly implemented or favoured are the federated data

warehouse model, centralised data warehouse model, enterprise data warehouse

model or hub and spoke data warehouse model. Some of the examples include:

• The Centers for Medicare & Medicaid Services (CMS) is a federal agency

that manages the Medicare and Medicaid programs in the USA. Over the past

years, they have developed a number of data marts; more recently, they are

trying to implement an enterprise wide data warehouse model to integrate

data from different sources (Winter, 2007).

• The Veterans Health Administration (VHA) in the USA is another example of data

warehouse implementation (Winter, 2007). As described by Winter (2007),

they have implemented a corporate data warehouse (enterprise data

warehouse) to provide intelligence support for many clinical concerns such

as obesity, diabetes and depression. More recently, extensions have been

suggested for the enterprise data warehouse by introducing operational data

store (ODS) and web-based safety net interface and hybrid communication

functionalities (Bala, Venkatesh, Venkatraman, Bates, & Brown, 2009). The

main reason identified for introducing this type of extension is to be able to

respond quickly in large scale disasters (Figure 22).

Figure 22: VHA corporate data warehouse visual architecture (Bala et al., 2009, p.138)

• A paper published by Stolba, Banek and Tjoa (2006) discusses the

implementation of the federated data warehouse model supporting

evidence-based medicine. In this paper the authors are primarily concerned

about the security and the privacy issues of the healthcare data.

• Another paper, published by Stolba and Schanner (2007), suggests a

federated data warehouse model to integrate clinical data (Figure 23).

Figure 23: Medical federated data warehouse model (Stolba & Schanner, 2007, p. 5)

As mentioned by Stolba and Schanner (2007), in this model domains such as

medical treatment, social insurance and pharmaceutical participate in one

federation while some others communicate through web services and some

may transfer data directly to the federation.

• Zhou et al. (2010) describe the implementation of a data warehouse for traditional

Chinese medicine for clinical and research purposes. This data warehouse model

is similar to a centralised warehouse architecture.

Figure 24: CDW architecture for traditional Chinese medicine (Zhou et al., 2010, p. 141)

Examination of the data warehousing implementation examples in the

healthcare sector shows there is no one exact data warehouse model applicable for all

healthcare. The selection of a specific data warehouse model may depend on many

selection factors, such as those discussed by Ariyachandra and Watson (2010). Also, when

considering some of the examples it can be seen that the organisations do not

necessarily perpetuate a unique data warehouse model and the data warehouse model

may change to provide maximum benefit. This can be seen from the first and the

second examples. For instance, the CMS federal agency in the USA developed data

marts and has recently been planning to develop an enterprise-wide data

warehouse model to integrate data from different sources. Also, the VHA in the USA

has an enterprise data warehouse model, and some authors have recently suggested an

extension to this warehouse by introducing an operational data store (ODS) and a

web-based safety net interface and hybrid communication functionalities to improve

efficiency in the event of large-scale disasters.

As mentioned before, user requirements for the TPCH cardiac surgery data warehouse development were collected from the end users through the questionnaire and interviews. The data collected from the questionnaire were analysed and summarised to provide a sample of important decisions that the end users would like to address (as shown in Table 6). As the next step, the required tables and data fields from the source databases such as ICU, CARPIA, e-DS and Finance were identified. In consideration of the user requirements and the data warehouse implementation literature, the recommended data warehouse would be a centralised data warehouse (Figure 24). This is because, in the context of this study, the only requirement is the integration of four institutional data repositories which are used to help make the selected sample decisions in the cardiac surgery unit. However, this model may not be suitable if it were required to integrate many external sources and progress to a global solution.

The centralised data warehouse model maintains data in a central store and improves access to data integrated from the different units of the hospital when compared with the architecture of independent data marts. According to a survey by Ariyachandra and Watson of 454 participants involved in data warehouse implementation (data warehouse managers, data warehouse staff members, information system managers and independent consultants), the majority indicated that the hub and spoke and federated data warehouse models require more development time (Ariyachandra & Watson, 2005). Another important factor is the development and maintenance cost of the data warehouse. According to the same survey, the hub and spoke data warehouse model has the highest average cost for both development (around US$ 2,000,000 – US$ 2,500,000) and maintenance (around US$ 1,000,000 – US$ 1,125,000) (Ariyachandra & Watson, 2005). Development costs for the independent data marts, data mart bus and centralised data warehouse models were in the range of US$ 1,500,000 – US$ 2,000,000, and average maintenance costs for the data mart bus and centralised models were in the range of US$ 750,000 – US$ 1,000,000.
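The cost comparison can be made concrete with a small calculation. The following is a minimal sketch in Python that uses only the ranges quoted above; taking the midpoint of each range is an assumption made here for illustration, since the survey reports ranges rather than point values.

```python
# Average cost ranges (US$) as quoted in the text from Ariyachandra & Watson (2005).
# Using the midpoint of each range is an illustrative assumption, not a survey result.
COST_RANGES = {
    "hub_and_spoke": {
        "development": (2_000_000, 2_500_000),
        "maintenance": (1_000_000, 1_125_000),
    },
    "centralised": {
        "development": (1_500_000, 2_000_000),
        "maintenance": (750_000, 1_000_000),
    },
}


def midpoint(bounds):
    """Return the middle of a (low, high) cost range."""
    low, high = bounds
    return (low + high) / 2


def midpoint_gap(kind):
    """Hub and spoke midpoint minus centralised midpoint for one cost kind."""
    hub = midpoint(COST_RANGES["hub_and_spoke"][kind])
    central = midpoint(COST_RANGES["centralised"][kind])
    return hub - central


if __name__ == "__main__":
    for kind in ("development", "maintenance"):
        print(f"{kind}: hub and spoke midpoint exceeds centralised by US$ {midpoint_gap(kind):,.0f}")
```

On these midpoints the gap is US$ 500,000 for development and US$ 187,500 for maintenance per year of reported average cost, which is the sense in which the centralised model is described as more cost effective.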

From this, it can be seen that a centralised data warehouse is more cost effective and needs less development time than an enterprise-wide (hub and spoke) architecture or a federated architecture. However, a centralised model could later be extended to an enterprise-wide (hub and spoke) model or a federated model if required. Figure 25 shows the proposed centralised data warehouse model for the cardiac surgery unit.



Figure 25: Proposed data warehouse model for the TPCH Cardiac surgery unit

5.3.2 DEVELOPMENT PROCESS

To develop the data warehouse as desired, data had to be integrated from four different sources: the cardiac surgery unit, the clinical costing unit, the ICU, and the quality and safety unit. However, these databases have been developed and operated independently by the different units. For example, the clinical costing unit already uses an enterprise data warehouse developed for handling finance data. Users cannot directly access the database servers of the clinical costing unit, as the system is an enterprise development housed at State level with restricted access. Access to the ICU database is also restricted, and there are issues with direct connections to the CARPIA and e-DS servers. Therefore, for the purpose of this study, data excerpts from the CARPIA, ICU and Transition II databases were saved into three separate spreadsheet files (.csv format).

Step 1 - As the first step, two star schemas, named the risk scores star schema and the cost star schema, were designed for analysis (Figures 26 and 27). Then, a SAS library was defined to store the source data for the data warehouse prototype development. The source data were stored in the Sample TPCH source data library.



Step 2 - As the second step, metadata were registered for the SAS source tables. All the data extracted from the ICU, Finance and CARPIA databases were loaded into the library. SAS data integration studio can register metadata from different sources such as content servers (HTTP server, FTP server etc.), database servers (Oracle server, SQL server, ODBC servers, Sybase server etc.) and enterprise application servers (SAP server).
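Steps 1 and 2 amount to loading the .csv excerpts into a staging store and recording each table's column metadata. The sketch below illustrates that idea with Python's standard csv and sqlite3 modules. It is not the SAS procedure used in the study: SQLite stands in for the SAS library, the table name carpia_source is hypothetical, and the two data rows are made up.

```python
import csv
import io
import sqlite3

# Made-up stand-in for one exported excerpt; the real CARPIA extract is not reproduced here.
CARPIA_CSV = """FLDURNUMBER,OPDATE,PREDMORT
UR001,2009-03-02,1.8
UR002,2009-03-05,4.2
"""


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header, load every row as text, return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip any empty lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


def registered_columns(conn, table):
    """Read the column names back from the catalogue, i.e. the table's metadata."""
    return [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_csv(conn, "carpia_source", CARPIA_CSV), "rows loaded")
    print(registered_columns(conn, "carpia_source"))
```

Loading every column as text keeps the staging step free of type assumptions; type checks and conversions belong to the later validation step.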

Step 3 - The third step involved the design of the dimension and fact tables for the data warehouse prototype. The new fact tables and dimension tables were designed as shown in the star schema diagrams (Figures 26 and 27), and the metadata for the tables were registered.

Fact tables

Two fact tables, named “cost” and “risk scores”, were created (Appendix B).

• Risk score fact table

The risk score fact table contains data from the finance database, the CARPIA database and the ICU database. The table consists of FLDURNUMBER (patient hospital admission number), FLDTHECNO (theatre encounter number), SCORE (risk score from the ICU), PREDMORT (risk score measurement from CARPIA), OPDATE (operation date from the cardiac surgery unit), CAREUNITADMDATE (patient’s admission date to the ICU from the cardiac surgery unit) and RISKOFDEATH (from the ICU database).

• Cost fact table

The cost fact table includes data from the Transition II database and CARPIA. This table contains data fields such as FLDURNUMBER, FLDTHECNO, DRG (Diagnosis Related Groups), HOSPAADATE (hospital admission date from CARPIA), HOSDISDATE (hospital discharge date from the CARPIA database) and TOTALCOST (from the Transition II database).



Dimension tables

The dimension tables are shown in Table 7 below.

Table name                 Description
Patient Dimension          Contains patient information such as FLDURNUMBER, FLDFIRSTNAME, FLDSURNAME, FLDDOB, FLDAGE etc.
Patient cases Dimension    Stores data related to the complications and morbidities of each patient’s case.
DRG Dimension              Stores the DRGs and the DRG descriptions.
ICU diagnosis Dimension    Includes the patient’s diagnosis information from the ICU.

Table 7: Dimension Tables

Also, two more dimension tables, Doctor Dimension and Date Dimension, were introduced for the risk score fact table. These help to analyse data at the different levels relevant to the information needs of different stakeholders. Moreover, further dimension or fact tables can be designed and added as the decisions that end users need to make evolve.
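The risk score star schema described above can be written down as table definitions. The following is a minimal sketch, assuming SQLite as a stand-in for the SAS environment; column names follow the spelling used in Figure 26 (only a few columns per dimension are shown), while the lower-case table names and all SQL types are assumptions made for illustration.

```python
import sqlite3

# Column names follow Figure 26; SQL types and table names are illustrative assumptions.
RISK_SCORE_STAR_DDL = """
CREATE TABLE patient_dimension (
    FLDURNUMBER TEXT PRIMARY KEY,
    FLDSURNAME TEXT,
    FLDDOB TEXT
);
CREATE TABLE patient_cases_dimension (
    FLDTHENCNO TEXT PRIMARY KEY,
    FLDBLEEDING TEXT
);
CREATE TABLE icu_diagnosis_dimension (
    CAREUNITADMID TEXT PRIMARY KEY,
    DIAGNOSTICCODE TEXT
);
CREATE TABLE doctor_dimension (
    DOCTORID TEXT PRIMARY KEY,
    SPECIALTY TEXT
);
CREATE TABLE date_dimension (
    DATEID INTEGER PRIMARY KEY,
    YEAR INTEGER,
    MONTH INTEGER,
    DATE INTEGER
);
CREATE TABLE risk_score_fact (
    FLDURNUMBER TEXT REFERENCES patient_dimension (FLDURNUMBER),
    FLDTHENCNO TEXT REFERENCES patient_cases_dimension (FLDTHENCNO),
    CAREUNITADMID TEXT REFERENCES icu_diagnosis_dimension (CAREUNITADMID),
    DOCTORID TEXT REFERENCES doctor_dimension (DOCTORID),
    DATEID INTEGER REFERENCES date_dimension (DATEID),
    CAREUNITADMDATE TEXT,
    OPDATE TEXT,
    PREDMORT REAL,
    SCORE REAL,
    RISKOFDEATH REAL
);
"""


def build_risk_score_star(conn):
    """Create the five dimension tables and the fact table that references them."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(RISK_SCORE_STAR_DDL)


def fact_foreign_keys(conn):
    """Return the dimension tables referenced by the fact table, sorted by name."""
    rows = conn.execute("PRAGMA foreign_key_list(risk_score_fact)").fetchall()
    return sorted(row[2] for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_risk_score_star(conn)
    print(fact_foreign_keys(conn))
```

The fact table holds only keys and measures; descriptive attributes stay in the dimensions, which is what allows a new dimension to be added later without restructuring the existing ones.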

Step 4 - After designing the new tables and registering the metadata, the data were transferred to the target tables (dimension and fact tables). Before the transfer, data validation was conducted on key fields of the CARPIA source data, ICU data, finance data and patient data tables. For example, in the CARPIA source data table, validations were performed for missing data in the FLDURNUMBER, OPDATE and PREDMORT fields, together with a custom validation for the bleeding complication field. All invalid data were sent to an error table for rectification or discard, and only the valid data were loaded into the CARPIA valid data table and then into the target tables. Figure 28 shows the data warehouse model developed using SAS data integration studio. Appendix B shows screen shots of populating the fact and dimension tables, and of some source table (CARPIA and finance) data validation steps.
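The validation step can be sketched as a routine that checks each source record and routes it either to a valid set or to an error set. This is an illustration in Python, not the SAS transformation used in the study. The required fields are those named above; the allowed codes for the bleeding field ("Y" and "N") are a hypothetical stand-in, because the actual custom rule is not stated.

```python
# Key fields named in the text for the CARPIA source table.
REQUIRED_FIELDS = ("FLDURNUMBER", "OPDATE", "PREDMORT")
# Hypothetical coding for the bleeding complication field; the real custom rule is not given.
ALLOWED_BLEEDING = {"Y", "N"}


def is_blank(value):
    """True when a value is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def validate_row(row):
    """Return the list of problems found in one source record (empty means valid)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if is_blank(row.get(name))]
    if row.get("FLDBLEEDING") not in ALLOWED_BLEEDING:
        problems.append("invalid FLDBLEEDING")
    return problems


def split_valid_invalid(rows):
    """Route records to a valid list or to an error list that keeps the reasons."""
    valid, errors = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            errors.append({"row": row, "problems": problems})
        else:
            valid.append(row)
    return valid, errors
```

Keeping the reasons alongside each rejected record is what makes later rectification possible; a record that is simply dropped cannot be corrected at the source.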



Figure 26: Risk score star schema

ICU DIAGNOSIS DIMENSION: CAREUNITADMID (PK) CAREUNITID IMMUNEDISEASE DIAGNOSTICSYSTEM DIAGNOSTICCODE DIAGNOSTICTEXT PRINCIPLE PROCEDURE DIAGNOSIS SEQUENCE ICD_LONG_DESC PRINCIPAL_SECONDARY POST OP COMPLICATION

PATIENT DIMENSION: FLDURNUMBER (PK) FLDSURNAME FLDFIRSTNAME FLDMIDDLENAME FLDDOB FLDAGE FLDGENDER FLDMEDICARE FLDMEDICARENUM FLDDECEASED ...

PATIENT CASES DIMENSION: FLDTHENCNO (PK) FLDSXCATCABG FLDSXCATAV FLDSXCATMV FLDSXCATTV FLDSXCATPV FLDSXCATAW FLDSXCATMISC FLDDATE_REVIEW FLDMORB_STATUS FLDBLEEDING FLDDYSFUNCTION ...

RISK SCORE FACT TABLE: FLDURNUMBER (FK) FLDTHENCNO (FK) CAREUNITADMID (FK) DOCTORID (FK) DATEID (FK) CAREUNITADMDATE OPDATE PREDMORT SCORE RISKOFDEATH

DOCTOR DIMENSION: DOCTORID (PK) DOCTORFIRSTNAME DOCTORSURNAME GENDER ADDRESS SPECIALTY ...

DATE DIMENSION: DATEID (PK) YEAR MONTH DATE



Figure 27: Cost star schema

COST FACT TABLE: FLDURNUMBER (FK) FLDTHENCNO (FK) DRG (FK) HOSPAADATE HOSDISDATE TOTALCOST

PATIENT DIMENSION: FLDURNUMBER (PK) FLDSURNAME FLDFIRSTNAME FLDMIDDLENAME FLDDOB FLDAGE FLDGENDER FLDMEDICARE FLDMEDICARENUM FLDDECEASED ...

PATIENT CASES DIMENSION: FLDTHENCNO (PK) FLDOPDATE FLDOPTIME FLDSXCATCABG FLDSXCATAV FLDSXCATMV FLDSXCATTV FLDSXCATPV FLDSXCATAW FLDSXCATMISC FLDDATE_REVIEW FLDMORB_STATUS FLDBLEEDING FLDDYSFUNCTION ...

DRG DIMENSION: DRG (PK) DRGANDDESCRIPTION



Figure 28: Cardiac Surgery unit data warehouse model

Step 5 - Finally, the SAS enterprise guide analysis tool was used to configure the data in a reporting format.

5.4 DATA ANALYSIS USING THE DATA WAREHOUSE PROTOTYPE

SAS enterprise guide 4.2 was used to analyse the data to answer questions 1, 2, 4 and 5. The first question addressed was question 1 from Table 6: “Comparison of risk scores – group by PREDMORT” (the ICU risk score is named SCORE and the cardiac surgery unit risk score is named PREDMORT). Figure 29 shows an information product for the analysis results: the comparison of risk scores from the cardiac surgery unit and the ICU, grouped by the cardiac surgery risk score (PREDMORT).

Figure 29: Comparison of risk scores –group by PREDMORT

This will give clinicians from the cardiac surgery unit and the ICU a better understanding of the relationship between the preoperative cardiac surgery risk score for death and significant morbidities and the risk of death score from the ICU. It relates to questions such as how the average risk for different clinical groups varies after surgery, what factors are involved, and how the risk scores can be used to improve performance outcomes for cardiac surgical patients in the cardiac surgical unit and the ICU. Figure 30 shows a graphical display of the interaction of the risk scores.

Figure 30: Interaction of risk scores
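The grouped comparison behind Figure 29 can be sketched as follows. This is a minimal Python illustration rather than the SAS enterprise guide report: it groups fact rows by PREDMORT and averages the ICU SCORE within each group. The choice of the mean as the summary statistic is an assumption, and any input rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_icu_score_by_predmort(fact_rows):
    """Group fact rows by PREDMORT and average the ICU SCORE within each group."""
    groups = defaultdict(list)
    for row in fact_rows:
        groups[row["PREDMORT"]].append(row["SCORE"])
    # Sort by PREDMORT so the output reads from low to high predicted risk.
    return {predmort: mean(scores) for predmort, scores in sorted(groups.items())}
```

In practice PREDMORT would probably be banded into ranges before grouping, since a continuous score rarely repeats exactly; the banding rule would need to be agreed with the clinicians.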



Figure 31 shows an example of an information product from the prototype data warehouse that supports decision-making processes based on Q2 from Table 6: “The actual expenditure (AU$) per episode of care according to certain clinical groups: by procedural groups”. It shows the SAS analysis report for the actual expenditure per episode of care according to the major cardiac surgical clinical procedural groups, with the results grouped by patient age. This result lets clinicians understand how the total cost of an episode of care in the finance database relates to patient groups defined in the surgeons’ frame of reference, that is, the clinical procedure groups used by surgeons in their clinical audit and monitoring processes. It can then be combined with Transition II data to compare actual costs for these clinical groups with the State funds provided to the hospital according to the DRG groups. The information can be broken down further according to other clinical criteria such as age groups (as shown), hospital post-operative morbidities captured in the CARPIA database such as deep sternal infections, or physiological parameters captured in the ICU database such as core body temperature variations at admission to the ICU following surgery.

Figure 31: The actual expenditure per episode of care according to the certain clinical group



The analysis results for Q4, “Cost of various post operative complications (AU$) – by bleeding morbidity group”, are shown in Figure 32 as summary statistics for the cost of post-operative complications, in this example grouped by post-operative bleeding morbidity group. This will help clinicians to identify the cost implications of clinical issues and prioritise the quality improvement process. It can also help to evaluate the cost savings from quality improvement processes that reduce high-cost morbidities, so that such activities are valued and appropriately resourced.

Figure 32: Cost of reoperation for bleeding as an example of post operational complications (AU$)
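A summary of this kind is a join between the cost fact table and the patient cases dimension, followed by a grouping on the bleeding field. The sketch below shows the query shape using Python's sqlite3 module as a stand-in for SAS enterprise guide. Column names follow Figure 27; the table names, the "Y"/"N" coding and all cost values are made up for illustration and are not TPCH data.

```python
import sqlite3

# Join the fact table to one dimension and summarise the measure per dimension value.
COST_BY_BLEEDING_SQL = """
SELECT d.FLDBLEEDING AS bleeding_group,
       COUNT(*) AS episodes,
       AVG(f.TOTALCOST) AS mean_cost,
       MIN(f.TOTALCOST) AS min_cost,
       MAX(f.TOTALCOST) AS max_cost
FROM cost_fact AS f
JOIN patient_cases_dimension AS d ON d.FLDTHENCNO = f.FLDTHENCNO
GROUP BY d.FLDBLEEDING
ORDER BY d.FLDBLEEDING
"""


def demo_connection():
    """Build a tiny in-memory cost star with made-up rows (not TPCH data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient_cases_dimension (FLDTHENCNO TEXT PRIMARY KEY, FLDBLEEDING TEXT);
        CREATE TABLE cost_fact (FLDURNUMBER TEXT, FLDTHENCNO TEXT, TOTALCOST REAL);
        INSERT INTO patient_cases_dimension VALUES ('T1', 'N'), ('T2', 'Y'), ('T3', 'N');
        INSERT INTO cost_fact VALUES ('UR1', 'T1', 20000), ('UR2', 'T2', 50000), ('UR3', 'T3', 30000);
    """)
    return conn


def cost_by_bleeding(conn):
    """Return one summary row per bleeding morbidity group."""
    return conn.execute(COST_BY_BLEEDING_SQL).fetchall()


if __name__ == "__main__":
    for group_row in cost_by_bleeding(demo_connection()):
        print(group_row)
```

Swapping FLDBLEEDING for another column of the patient cases dimension gives the same report for a different morbidity, without any change to the fact table.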

The analysis results for question 5, “Audit data sources to verify costings data includes high cost procedures appropriately”, are shown in Figure 33. This output shows the costs associated with the DRGs allocated according to the cardiac surgery unit admission status. Further analysis can contribute to the evaluation of appropriate funding structures for institutions according to the status of the surgery performed.



Figure 33: Costs associated with the DRG’s - according to cardiac surgery unit admission status (AU$)


Limitations and constraints in the data extraction and data analysis process must be considered when interpreting these information products. They include:

1. The data extracted from CARPIA, ICU and Transition II are limited to a sample from the year 2009.

2. Question 3 (“What is the rate of e-discharge summaries send to GP’s according to clinical guidelines for the cardiac surgical patients according to operative data, surgical consultant?”) was not addressed because of the technical difficulties experienced in directly connecting to the e-DS database and the time limitation of the research project.

3. All the extracted data are restricted to cardiac surgery unit patients.

4. When comparing the risk scores from CARPIA and the ICU, data for a small number of patients who returned to the ICU on the same day were excluded.

5. Data analysis is limited to 1000 patient records due to the study’s time constraints.



In general, the benefits of data warehousing may take some time to be realised. For example, changes resulting from a mismatch between State funding and actual costs for certain procedure groups may require further analysis and reporting to further stakeholders before the costing structure can be changed. On the other hand, some benefits can have a more rapid local effect, such as recognition by clinicians of the differential costs of various morbidities and the implications for the selection of quality improvement activities. In this research project, the data warehouse prototype was evaluated by collecting feedback from the end users after the analysis results had been reported and explained to them.

5.5 DATA WAREHOUSE PROTOTYPE EVALUATION

According to Welbrock (1998, p. 1), the “majority of data warehouse implementations are never monitored for their success”. Welbrock (1998) states that “the measurement of the success of the data warehouse is outside the experience of information technology specialists”, because the data warehousing process is largely a business exercise rather than a technological one. Also, according to a white paper published by Threshold Consulting Services, it is difficult to choose success metrics for data warehouses; however, return on investment (ROI) is the measure most often used (Threshold Consulting Services, 2005). Other indicators that can be used to measure data warehouse success are usage, customer satisfaction, availability, performance and response time (Threshold Consulting Services, 2005).

In this research project, feedback was collected from end users to evaluate the potential importance of the data warehouse information products for the cardiac surgery unit. The feedback was collected from data managers, unit managers and clinicians via short structured interviews based on the question “How does the information product support the decision-making process at the clinical service level?”. From the data managers’ point of view, “implementation of a data warehouse will reduce the time required in producing the reports compared to the current process”. Also, “it provides a better way of getting the complete picture of patient groups over a variety of important service aspects for service planning”. They also highlighted some issues anticipated with the actual implementation of such a data warehouse: data access rights, security and data quality. The importance of introducing policies and data stewardship was also mentioned.

In the view of one of the unit managers (the Cardiac Surgery Registry Coordinator), “information products generated from the data warehouse prototype are valuable and useful in data analysis used for specific issues or problems defined by the clinicians”. As an example, the unit manager described the output on the costs associated with re-operation for bleeding (Figure 32) as a very useful information product. The unit manager also mentioned that “this output could be used as part of a report on post operative bleeding to build a complete picture for the clinicians, of the clinical factors contributing to representation for bleeding and the full consequences of this issue for service management”. It was also mentioned that, compared with the previous process of acquiring this costing data, the data warehouse made the data more readily available to filter according to the clinical dimensions held in CARPIA, so that a full analysis of the implications of the clinical guidelines and patient management regarding this issue was more easily facilitated.

The clinician from the Cardio-Thoracic surgery unit agreed that “developing a data warehouse is very valuable for the clinicians”. Furthermore, he mentioned that “the output shows (Figure 32) the costs involved relating to clinical variances impact on patient management which informs the selection of a variety of forms of management available for patients of the clinical service”.


Chapter 6: Discussion

Data warehousing technology predominantly aims to structure the data in a

summarised way which supports improved access to and use of the data in an

efficient and effective manner. Integrating healthcare services with such new

technology paves the way to obtain a number of benefits including improved access

to data, support of evidence-based decision-making and ultimately support of quality

healthcare services. Therefore, data warehouses can be considered a useful tool for

the support of strategic and tactical decision-making in healthcare.

To determine how data warehousing might practically contribute to improved decision-making, this study first examined the current data-driven decision-making process in the clinical environment. The research question that addressed this was “What decision-making issues exist or are faced by healthcare professionals with the current information systems?”. This considers what issues currently exist in information-driven decision-making and whether a data warehouse may contribute to overcoming them in the study environment. Analysis of the survey responses showed that data manipulation in the current decision-making process at TPCH is mostly a repetitive manual process. For complex clinical or management questions requiring data beyond that available from the Cardiac Surgical Registry (CARPIA), the clinician or unit managers collect data separately by contacting data custodians, and then integrate and assemble the data for analysis through laborious and time-consuming manual linking. Questionnaire responses also indicated that the capability of the present information systems to fully support current decision-making needs is rated very low (Figure 18, current support from the ISs for decision-making). The issues specifically identified as being amenable to improvement by data warehousing were the integration of data from other sources and the availability of access to other relevant clinical or administrative repositories, as shown in the responses on current decision-making issues in Figure 19. Other major issues highlighted by respondents in the questionnaire were medical record data quality and availability, lack of efficient reporting tools and lack of time and resources to undertake analysis.

The data quality issue most often indicated was a lack of data completeness, followed by a lack of data accuracy and a lack of compatibility. For the decision-making process it is important to have complete records of clinical data. As stated by Chapman (2005), incomplete data does not support comprehensive analysis and may lead to poor or incorrect conclusions. According to Botsis, Hartvigsen, Chen and Weng (2010), data inconsistencies are caused by uncoordinated or redundant data entries. Data accuracy is also an important factor to consider because false or incorrect data can potentially lead to medical errors (Connecting for Health Common Framework, 2006). Inaccurate data also has the potential to cause errors in management decision-making and result in avoidable financial and quality costs to hospitals. However, a data warehouse cannot address all data quality issues. As stated by Singh and Singh (2010), data quality problems may also arise during the phases of data warehouse development. For example, during the ETL phase, where data cleansing takes place, data quality issues can be introduced by the programs written for the extraction, transformation and load functions (Singh & Singh, 2010). However, data quality tools such as DataFlux, Trillium Software, WizSoft etc. can be used to improve data quality.
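Data completeness, the issue respondents raised most often, can be measured before any data are loaded. The following is a minimal Python sketch of a per-field completeness rate; it is an illustration only, not a feature of the data quality tools named above, and the definition of "complete" used here (a value that is present and not blank) is an assumption.

```python
def completeness(rows, fields):
    """Share of records (0.0 to 1.0) holding a non-blank value, per field."""
    total = len(rows)
    result = {}
    for name in fields:
        filled = sum(
            1 for row in rows
            if row.get(name) is not None and str(row[name]).strip() != ""
        )
        # An empty extract is reported as 0.0 rather than raising a division error.
        result[name] = filled / total if total else 0.0
    return result
```

Tracking this rate per field and per extract gives data custodians a concrete target, which supports the organisational side of data quality improvement that a warehouse alone cannot provide.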

The next main research question addressed was “How might decision-making be improved within healthcare services by implementing a more aligned data warehousing model or models?”. According to the literature, many factors lead to the selection of a specific data warehouse model, and there is no universal data warehouse model suitable for all healthcare services. This can be seen from the variety of models demonstrated in the data warehouse implementation examples. Moreover, data warehouse implementations were not constrained to one fixed model. Sometimes organisations changed or added extensions to the original data warehouse model to gain maximum benefit. This can be seen from the CMS and VHA examples (Bala et al., 2009; Winter, 2007). For instance, CMS in the USA initially implemented several data marts but later required an enterprise-wide data warehouse model to integrate data from different sources. The VHA in the USA had already developed a corporate data warehouse to provide support for many clinical concerns such as obesity, diabetes and depression. More recently, an extension to this data warehouse was suggested, introducing an operational data store (ODS), a web-based safety net interface and hybrid communication functionalities. Also, the literature shows that organisations select or suggest enterprise data warehouse models or federated data warehouse models when there is a need for enterprise-level data integration.

User requirements analysis is one of the main phases of the data warehouse development process. As stated by List, Schiefer and Tjoa (2000), the user requirements analysis phase helps to identify the user needs for data warehouse development. This phase also plays an important role in defining data staging designs, data warehouse system architecture, training course plans, and data warehouse system maintenance and upgrades (Golfarelli & Rizzi, 2009). However, there are many reasons why this phase may deliver ambiguous, incomplete and short-lived requirements: some projects run for a long time and it is difficult to collect every requirement, some decisions are poorly shared across the organisation, and decision processes may vary over time (Golfarelli & Rizzi, 2009). By integrating the literature and case studies with the user requirements gleaned from the stakeholder survey responses (Table 6), it was determined that the centralised data warehouse model would be the most suitable for the cardiac surgery unit at this stage. The centralised data warehouse model improves access to data integrated from the different units of the hospital when compared with the architecture of independent data marts. Also, according to the literature, the development and maintenance costs of the centralised warehouse model are lower than those of the federated and hub and spoke data warehouse models.

To develop the cardiac surgery data warehouse prototype, the types of decisions and analyses made by the end users were first summarised from the questionnaire and interview responses, and five specific decisions/problems were selected to formulate the data warehouse specifications. However, a focus on a few decision points is not sufficient to determine the data warehouse model. Therefore, the literature and case studies related to healthcare data warehouse development were reviewed. Then the tables and data fields related to the five decisions/problems were identified, and two star schemas were designed to analyse risk scores and costs (Figure 26 and Figure 27).



SAS data integration studio 4.2 was used to develop the data warehouse prototype. This software tool is easy to use and collaborative, and helps to integrate data faster and more effectively. However, a high level of expertise and knowledge of the software is recommended when an actual implementation of the data warehouse is considered. This can be seen from the technical issues that had to be faced, such as software configuration, direct data integration from CARPIA and e-DS, and data transformation issues. The researcher had to contact SAS technical support to solve some of these problems.

Although development of a data warehouse is a time-consuming process because of the complexity of clinical information, it provides an effective way to handle and use data from several disparate units. It integrates data from different sources and improves access to financial and clinical information. This can be seen by comparison with how the end users currently make decisions using information from other units. Also, the output reports created from the data warehouse prototype show how such outputs will help end users’ decision-making in clinical services. For instance, the “actual expenditure for episode of care” output gives clinicians the opportunity to understand how the total cost of an episode of care in the finance database relates to patient groups defined in the surgeons’ frame of reference, such as clinical procedure groups. Also, when developing a data warehouse, the ETL (extract, transform and load) process helps to identify data quality issues and supports a data improvement strategy. For example, Albert et al. (2004) implemented a project-oriented data warehouse to supply data for online computing of Variable Life Adjusted Displays (VLADs). The purpose of this data warehouse is to avoid incomplete or inaccurate data for VLADs (Albert et al., 2004).

Development of data warehousing will improve quality and safety monitoring and help with better clinical care. For instance, the comparison of risk scores from the ICU and cardiac surgery (Figure 29) will assist clinicians to improve performance outcomes for cardiac surgical patients in the cardiac surgery unit and the ICU. Another example is the output generated for the cost of various post-operative complications (according to bleeding morbidity), which helps clinicians to identify the cost implications of clinical issues and prioritise the quality improvement process (Figure 32). Furthermore, integrating data repositories provides data for clinical effectiveness and evaluation research. For instance, the summary statistics in the “actual expenditure for episode of care” output help clinicians to understand how the total cost of an episode of care in the finance database relates to patient groups defined in the surgeons’ frame of reference, such as procedure groups. This information is further broken down by age group to give a clearer picture. Thereby, developing a data warehouse will maximise the usefulness of data with greater efficiency and help to answer more complex questions about patient management and efficient health service management.

In addition, all the end users regarded a data warehouse as a valid and practical proposition for the decision-making process when compared to the current process. It is worth noting, however, that the data managers raised issues related to data quality, access rights and security in the information product evaluation interview; the significance of these issues has also been found by other researchers such as Winter and deMul. In the survey, by contrast, the majority of respondents indicated that there were no data security or information privacy concerns that should be incorporated into the data warehouse development. This may be because end users lack understanding of data warehouse implementation or did not understand the question properly. However, increased data exchange brings the issue of confidentiality and access to the fore, and this is well recognised. According to Clifton (2004), “a comprehensive framework that handles the fundamental problems underlying privacy preserving data integration and sharing is necessary”. Data quality is again identified as a critical factor in the use of integrated information, and while a data warehouse can provide some support for data quality improvement, further organisational change management is needed to address it effectively. The many limitations and difficulties experienced in the development of this project are presented in Section 6.1.

6.1 LIMITATIONS OF THE STUDY

There are a number of limitations to this study of the development of a service-level clinical data warehouse prototype. Firstly, because of the time limitation and the complexity of clinical decision-making processes, the scope of the project was focused on the few selected decision points identified to inform the user requirements analysis for this data warehouse prototype. The researcher selected only five important decision points made by the end users (clinicians, unit managers) when analysing the questionnaire, and finally only four questions were fully addressed in the results due to time constraints and technical issues. Secondly, the design of the questionnaire was limited to identifying the current decision-making process, current decision-making issues in CIS and data warehouse prototype design requirements. Thirdly, the researcher focused only on the decisions that can be made by integrating data from the limited data repositories identified, namely CARPIA, e-DS, Transition II and ICU. This was a result of the constraints of a short time frame to study this topic. The selected data repositories represented the major external data repositories identified by key staff members that contribute to more complex decision-making. Therefore, this data warehouse prototype is a sample: it does not represent all the data required for comprehensive decision-making and may not help to answer all the questions which may be asked. There are several other important data repositories, including those of the Main Operating Theatre, Anaesthetics and the Cardiology Medical unit, that would provide valuable data for integration but were not considered in the scope of this project. Fourthly, all the data from the different sources were loaded as external files into SAS Data Integration Studio. This is because there is no direct access available to the Transition II and ICU databases from the cardiac surgery unit system.
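The external-file approach described in the fourth limitation can be illustrated with a minimal sketch. It is written in Python rather than SAS Data Integration Studio, so it is not the prototype's actual job. The field names (URNumber, OpDate, PREDMORT, URNUM, CostTOTALACT) are taken from the validation captions in Appendix B; the sample values, the function names and the assumption of one row per UR number are hypothetical.

```python
import csv
import io

# Hypothetical extracts standing in for files exported from two source systems.
# Field names follow the Appendix B validation captions; every value is invented.
CARPIA_EXTRACT = """URNumber,OpDate,PREDMORT
1001,2010-03-01,0.021
1002,2010-03-04,0.150
"""

FINANCE_EXTRACT = """URNUM,CostTOTALACT
1001,18250.00
1003,9900.50
"""


def read_extract(text, key_field):
    """Parse one external file and index its rows by patient UR number.

    Assumes one row per UR number; a later row would overwrite an earlier one.
    """
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows[row[key_field].strip()] = row
    return rows


def integrate(surgery_rows, finance_rows):
    """Attach the cost record, if any, to each surgical record (a left join)."""
    merged = []
    for ur_number, surgery in surgery_rows.items():
        finance = finance_rows.get(ur_number)
        merged.append({
            "ur_number": ur_number,
            "op_date": surgery["OpDate"],
            "pred_mort": float(surgery["PREDMORT"]),
            # None marks a patient with no matching cost record.
            "total_cost": float(finance["CostTOTALACT"]) if finance else None,
        })
    return merged


if __name__ == "__main__":
    surgery = read_extract(CARPIA_EXTRACT, "URNumber")
    finance = read_extract(FINANCE_EXTRACT, "URNUM")
    for record in integrate(surgery, finance):
        print(record)
```

Because each source is read from a file rather than queried directly, the integration step depends only on the shared patient identifier, which is what made the workaround feasible without direct database access.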

SAS Data Integration Studio 4.2 was used to develop the data warehouse prototype, while SAS Enterprise Guide was used to analyse data. This software has some constraints, such as the inability to view the database diagram from SAS Data Integration Studio: doing so requires installing the SAS Information Map Studio application, for which the proprietary licensing is constrained. Therefore, as the fifth limitation, the actual database relationship diagram could not be presented from the SAS application; however, this did not limit the actual development process, as the diagrams were presented in Microsoft Word format. Sixthly, a rudimentary approach was selected to evaluate the data warehouse prototype. This is because the available methodologies used to evaluate the benefits of a data warehouse, such as RCTs, qualitative methods and ROI analyses, may be time consuming, costly and complex.

There were also a number of technical issues related to the server configurations encountered when attempting to connect directly to the CARPIA and e-DS databases through SAS Data Integration Studio, as the software is a non-supported application for Queensland Health. Therefore, remote access outside of the Queensland Health Standard Operating Environment had to be established, which created some burden for the information technology support services. These difficulties might be policy-level issues worth considering in the data warehouse development process; however, this is beyond the scope of this study. Some of these technical limitations are the result of this study being an implementation at service level rather than at enterprise level, which is more common and would have access to greater IT support resources. These issues will need to be considered both by those wishing to implement a service-level data warehouse and by those supporting hospital IT infrastructure.

Finally, because of Queensland Health information-related policies and the confidentiality requirements of TPCH patient data, the researcher cannot provide screenshots of actual fact or dimension data tables, and the analysis results shown are approximations of what actual data product results would be. Also, the analysis of the data using SAS Enterprise Guide 4.2 used only a sample of 1000 records due to time limitations for processing.

Chapter 7: Conclusion

The literature on data warehousing provides detail on contemporary data warehousing theory and practice. A data warehouse helps to integrate data from disparate systems. Data warehouses are distinct from operational systems in many ways. Many differences from OLTP systems have been described elsewhere, such as in the use of data, users, database sizes, transaction types and data entry. Many different architectural types can be identified, such as centralised data warehouse architecture, independent data marts (IDM), federated architecture (FED), hub and spoke, and data mart bus architecture. With regard to selection of the data warehouse architecture, it has been found that there are many contributing factors, and it is important to consider these factors when implementing a data warehouse.

This research identified that the current decision-making process involving the cardiac surgery unit and the other units is a manual process. Also, there are several issues in the decision-making process at The Prince Charles Hospital. Difficulty of integrating data from other data repositories was identified as a major issue. Other issues highlighted by respondents were medical record data quality and availability, a lack of efficient reporting tools, and a lack of time and resources to undertake analysis. Moreover, the main data quality issues identified were a lack of data completeness, a lack of data accuracy and a lack of compatibility. Research suggests that implementing a centralised data warehouse will minimise the issues faced in the current decision-making process and also provide many benefits, such as improved access to data, improved quality and safety monitoring, data for clinical effectiveness and evaluation research, and improved decision-making.

A number of limitations existed during this project. Because of the time limitation, the scope of the project considered integrating only four data repositories: CARPIA, e-DS, Transition II and ICU. Also, only five decisions were selected from the questionnaire responses as user requirements because of time constraints. Furthermore, data from the databases were loaded as external files because of the difficulty of integrating data directly from the databases and some technical issues encountered when trying to connect to the e-DS and CARPIA databases. Data analysis was limited to 1000 records due to processing time. Also, actual fact tables or dimension tables are not provided due to Queensland Health information-related policies and the confidentiality requirements of TPCH patient data.

7.1 RECOMMENDATIONS AND FUTURE DIRECTIONS

Looking at the current decision-making issues, a data warehouse might provide some solutions for the decision-making process. Therefore, it is recommended that a data warehouse be considered, as it might contribute to resolving some of the issues raised in the survey of this research. For instance, the development of a data warehouse provides better accessibility, integrates disparate data sources and improves decision-making. However, it is important to investigate the selection of software for data warehousing (which was not performed in this research). Furthermore, it is important to address barriers to warehouse implementation at service level, for example policy and technical support for applications in the SOE (Standard Operating Environment). There is also a need to address data quality and data access/privacy issues together with warehouse implementation. With the above-mentioned recommendations, the development of a centralised data warehouse in future will provide many benefits for the cardiac surgery unit.

Bibliography

Agosta, L. (2005). Hub-and-Spoke Architecture Favored. Information Management Magazine. Retrieved from http://www.information-management.com/issues/20050301/1021501-1.html

Albert, A. A., Walter, J. A., Arnrich, B., Hassanein, W., Rosendahl, U. P., Bauer, S., & Ennker, J. (2004). On-line variable live-adjusted displays with internal and external risk-adjusted mortalities. A valuable method for benchmarking and early detection of unfavourable trends in cardiac surgery. European Journal of Cardio-Thoracic Surgery, 25(3), 312-319.

Arigon, A. M., Miquel, M., & Tchounikine, A. (2007). Multimedia data warehouses: a multiversion model and a medical application. Multimedia Tools and Applications, 35(1), 91-108.

Ariyachandra, T., & Watson, H. (2005). Data warehouse architectures: factors in the selection, decision and success of the architectures. Retrieved May 24, 2010, from http://www.terry.uga.edu/~hwatson/DW_Architecture_Report.pdf

Ariyachandra, T., & Watson, H. (2010). Key organizational factors in data warehouse architecture selection. Decision Support Systems. Retrieved April 24, 2010, from Scopus database.

Bala, H., Venkatesh, V., Venkatraman, S., Bates, J., & Brown, S. H. (2009). Disaster response in healthcare: A design extension for enterprise data warehouse. Communications of the ACM, 52(1), 136-140. Retrieved April 21, 2010 from ACM Portal database.

Berndt, D. J., Fisher, J. W., Hevner, A. R., & Studnicki, J. (2001). Healthcare data warehousing and quality assurance. Computer, 34(12), 56-65. Retrieved May 3, 2010 from IEEE Xplore digital library database.

Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology, 10(4), 452-483. Retrieved May 3, 2010 from ACM Portal database.

Borysowich, C. (2007, 2010). Better Data Warehouse Modelling. Retrieved May 24, 2009, from http://it.toolbox.com/blogs/enterprise-solutions/better-data-warehouse-modelling-20835

Chapman, A. D. (2005). Principles of Data Quality (Version 1.0): Global Biodiversity Information Facility, Copenhagen. Retrieved May 24, 2009, from http://www2.gbif.org/DataQuality.pdf.

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1), 65-74. Retrieved April 21, 2010 from ACM Portal database.

Clifton, C., Doan, A., Elmagarmid, A., Kantarcioglu, M., Schadow, G., Suciu, D., et al. (2004). Privacy preserving data integration and sharing. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.92.8127

Connecting for Health Common Framework. (2006). Background issues on data quality. Retrieved June 16, 2010, from http://www.policyarchive.org/handle/10207/bitstreams/15515.pdf

de Mul, M., Alons, P., van der Velde, P., Konings, I., Bakker, J., & Hazelzet, J. (2010). Development of a clinical data warehouse from an intensive care clinical information system. Computer Methods and Programs in Biomedicine, In Press, Corrected Proof. Retrieved August 2, 2010 from ScienceDirect database.

del Hoyo-Barbolla, E., & Lees, D. (2002). The use of data warehouses in the healthcare sector. Health Informatics Journal, 8(1), 43-46. Retrieved August 2, 2010 from http://jhi.sagepub.com/cgi/content/abstract/8/1/43

Delgado, M. (2011). The Evolution of Health Care IT: Are Current U.S. Privacy Policies Ready for the Clouds? Paper presented at the IEEE World Congress on Services (SERVICES), 2011. Retrieved August 20, 2011 from IEEE computer society database.

Denton, T. A., Chaux, A., & Matloff, J. M. (1995). A Cardiothoracic Surgery information system for the next century: Implications for managed care. The Annals of Thoracic Surgery, 59(2), 486-493. Retrieved August 2, 2010 from ScienceDirect database.

Dias, M. M., Tait, T. C., Menolli, A. L. A., & Pacheco, R. C. S. (2008). Data warehouse architecture through viewpoint of information system architecture. Retrieved August 2, 2010 from IEEE computer society database.

Embarcadero Technologies. (2010). Healthcare Data Management Survey Report. San Francisco: Embarcadero Technologies. Retrieved February 3, 2011, from http://www.embarcadero.com/images/dm/healthcare-it-survey-report-2010.pdf.

ExecutionMih. (2010). Dimensional model schemas - Star, Snow-flake, Constellation. Retrieved May 3, 2010, from http://www.executionmih.com/data-warehouse/star-snowflake-schema.php

Federal Student Aid. (2007). Enterprise Data Management-Data Governance Plan. Retrieved from http://federalstudentaid.ed.gov/static/gw/docs/ciolibrary/ECONOPS_Docs/DataGovernancePlan.pdf

Golfarelli, M., & Rizzi, S. (2009). Data warehouse design: Modern principles and Methodologies. New York: McGraw-Hill Companies.

Grimson, J., Grimson, W., & Hasselbring, W. (2000). The SI challenge in health care. Communications of the ACM, 43(6), 48-55. Retrieved April 21, 2010 from ACM Portal database.

Holland, M. (2009). The future of business and clinical intelligence in the U.S. provider market: Health Industry Insights. Retrieved from http://www-935.ibm.com/services/au/gbs/bus/html/healthcare/presentations/downloads/the_future_of_business_clinical.pdf

Inmon, B. (1999). Data mart does not equal data warehouse. Retrieved from http://www.dmreview.com/dmdirect/19991120/1675-1.html

Inmon, W. H. (2005). Building the data warehouse. Indianapolis: Wiley Publishing Inc.

Isken, M. W., Littig, S. J., & West, M. (2001). A data mart for operations analysis. Journal of Healthcare Information Management, 15(2). Retrieved from Google Scholar http://www.himss.org/content/files/ambulatorydocs/DataMartForOperationsAnalysis.pdf.

Jani, A. B., Davis, L. W., & Fox, T. H. (2007). Integration of databases for radiotherapy outcomes analyses. Journal of the American College of Radiology, 4(11), 825-831. Retrieved August 2, 2010 from ScienceDirect database.

Johns, M. L. (2002). Information Management for health professions (2nd ed.): Delmar Thomson Learning Inc.

Kadlec, J. (2005). SQL Server OLTP vs. data warehouse performance tuning. Retrieved from http://searchsqlserver.techtarget.com/tip/SQL-Server-OLTP-vs-data-warehouse-performance-tuning

Kerkri, E. M., Quantin, C., Allaert, F. A., Cottin, Y., Charve, P., Jouanot, F., & Yétongnon, K. (2001). An approach for integrating heterogeneous information sources in a medical data warehouse. Journal of Medical Systems, 25(3), 167-176. Retrieved August 2, 2010 from Springer database.

Kerr, K., Norris, T., & Stockdale, R. (2007). Data quality information and decision-making: A healthcare case study. Paper presented at the 18th Australasian conference on Information systems.



Kimball, R., & Ross, M. (2002). The Data Warehouse Toolkit (2nd ed.). Toronto: John Wiley and Sons, Inc.

Landrum, W. H., Peachey, T., Huscroft, J. R., & Hall, D. (2008). Research in healthcare DSS: Where do we go from here? Paper presented at the Americas Conference on Information Systems (AMCIS). Retrieved May 22, 2010 from http://aisel.aisnet.org/amcis2008/358

Leitheiser, R. L. (2001). Data quality in health care data warehouse environments. Paper presented at the 34th International conference in system science, Hawaii. Retrieved August 2, 2010 from IEEE computer society database.

Lenz, R., & Reichert, M. (2007). IT support for healthcare processes - premises, challenges, perspectives. Data & Knowledge Engineering, 61(1), 39-58. Retrieved August 2, 2010 from ScienceDirect database.

Lindsey, K., & Frolick, M. N. (2003). Critical factors for data warehouse failure. Business Intelligence Journal, 8(1).

List, B., Bruckner, R., Machaczek, K., & Schiefer, J. (2002). A comparison of data warehouse development methodologies case study of the process warehouse. In A. Hameurlain, R. Cicchetti & R. Traunmüller (Eds.), Database and Expert Systems Applications (Vol. 2453, pp. 203-215): Springer Berlin / Heidelberg.

List, B., Schiefer, J., & Tjoa, A. (2000). Process-oriented requirement analysis supporting the data warehouse design process a use case driven approach (pp. 593-603).

Louie, B., Mork, P., Martin-Sanchez, F., Halevy, A., & Tarczy-Hornoch, P. (2007). Data integration and genomic medicine. Journal of Biomedical Informatics, 40(1), 5-16. Retrieved August 2, 2010 from ScienceDirect database.

March, S. T., & Hevner, A. R. (2007). Integrated decision support systems: A data warehousing perspective. Decision Support Systems, 43(3), 1031-1043. Retrieved August 2, 2010 from ScienceDirect database.

Marco, D. (2000). Independent Data Marts - Part 1. The Data Administration Newsletter. Retrieved from http://www.tdan.com/view-articles/4881

Mathew, A. (2008). Asset management data warehouse data modelling. Queensland University of Technology, Brisbane.

Mohania, M., Samtani, S., Roddick, J., & Kambayashi, Y. (2007). Advances and research directions in data-warehousing technology. Retrieved May 22, 2010 from http://dl.acs.org.au/index.php/ajis/article/view/287

Ponniah, P. (2010). Data Warehousing fundamentals for IT Professionals. Retrieved from http://books.google.com.au/books?id=3PJTgyUIGk4C&printsec=frontcover&source=gbs_atb#v=onepage&q&f=false

Sahama, T. R., & Croll, P. R. (2007). A data warehouse architecture for clinical data warehousing. Paper presented at the Proceedings of the fifth Australasian symposium on ACSW frontiers - Volume 68. Retrieved May 22, 2010 from ACM digital library database.

Sanders, D., & Protti, D. (2008). Data Warehouses in Healthcare: Fundamental Principles. ElectronicHealthcare, 6(3). Retrieved June 2, 2010 from http://www.longwoods.com/content/19510

SAS Institute Inc. (2006). SAS data integration studio. Retrieved from http://www.sas.com/technologies/dw/etl/distudio/factsheet.pdf

SAS Institute Inc. (2010). SAS Enterprise Guide. Retrieved September 15, 2010, from http://www.sas.com/technologies/bi/query_reporting/guide/index.html

Scheese, R. (1998). Data warehousing as a healthcare business solution. Healthcare Financial Management, 52(2), 56. Retrieved March 22, 2010 from ProQuest database.

Sen, A., & Sinha, A. P. (2005). A comparison of data warehousing methodologies. Communications of the ACM, 48(3), 79-84. Retrieved April 21, 2010 from ACM digital library database.

Shams, K., & Farishta, M. (2001). Data warehousing: Toward knowledge management. Topics in Health Information Management, 21(3), 24-32.

Shcherbatykh, I., Holbrook, A., Thabane, L., & Dolovich, L. (2008). Methodologic issues in health informatics trials: The complexities of complex interventions. Journal of the American Medical Informatics Association, 15(5).

Singh, R., & Singh, K. (2010). A descriptive classification of causes of data quality problems in data warehousing. International Journal of Computer Science, 7(3).

Stolba, N., Banek, M., & Tjoa, A. M. (2006, April 20-22). The security issue of federated data warehouses in the area of evidence-based medicine. Paper presented at the First International Conference on Availability, Reliability and Security (ARES 2006). Retrieved August 2, 2010 from IEEE computer society database.

Stolba, N., & Schanner, A. (2007). eHealth Integrator - Clinical Data Integration in Lower Austria. Paper presented at the Third International Conference on Computational Intelligence in Medicine and Healthcare (CIMED 2007).

Sybase. (2010). New South Wales Health. Retrieved June 30, 2010, from http://www.sybase.com.au/detail?id=1050806



Tan, R. B. N. (2006). Online analytical processing systems. Retrieved August 20, 2011 from http://www.irma-international.org/viewtitle/10720/

Threshold Consulting Services. (2005). Measuring the success of a data warehouse. Retrieved from http://www.thresholdcs.com/Knowledge-Base/White-Papers/Measuring-the-Success-of-a-Data-Warehouse.pdf.

Vesset, D. (2010). Worldwide Business Intelligence Tools 2009 Vendor Shares: IDC. Retrieved from http://www.sas.com/news/analysts/IDC-%20ITools09VendorShares.pdf

Wah, T. Y., & Sim, O. S. (2009). Development of a data warehouse for Lymphoma cancer diagnosis and treatment decision support. WSEAS Transactions on Information Science and Applications, 6(3). Retrieved April 28, 2010 from http://www.wseas.us/e-library/transactions/information/2009/28-906.pdf

Welbrock, P. R. (1998). Is your data warehouse successful? Developing a data warehouse process that responds to the needs of the enterprise. Paper presented at the Annual 11th Conference NESUG '98. Retrieved from http://www.nesug.org/proceedings/nesug98/atut/p068.pdf

Winter, R. (2007). Health care data warehousing in the government. Massachusetts: Winter Corporation. Retrieved June 29, 2010 from http://www.wintercorp.com/WhitePapers/Health%20Care%20Data%20Warehousing%20in%20the%20Government%20v3.pdf

Yan, Z., & Jianli, G. (2005, June 13-15). A kind of data warehouse in community healthcare service system. Paper presented at the 2005 International Conference on Services Systems and Services Management (ICSSSM '05). Retrieved August 2, 2010 from IEEE Xplore digital library database.

Zhou, X., Chen, S., Liu, B., Zhang, R., Wang, Y., Li, P., et al. (2010). Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artificial Intelligence in Medicine, 48(2-3), 139-152. Retrieved August 2, 2010 from ScienceDirect database.

Appendices

APPENDIX A: QUESTIONNAIRE

Questionnaire

1. Which unit are you associated with? (Please tick check box)

[ ] Cardiac surgery
[ ] Quality & Safety Unit
[ ] ICU
[ ] Clinical Costings Unit
[ ] Other………………………………….(Please specify)

2. What is your designation at The Prince Charles Hospital? (Please tick the check box)

[ ] Clinician
[ ] Unit manager/director
[ ] Data Manager/ Information analyst/ Informatician
[ ] Other....................... (Please specify)

Current data repositories:

3. Do you use data from repositories outside of your own service units (Cardiac surgery, ICU, Quality & Safety, Clinical Costings) to assist decision-making in service management needs? (Please tick one box)

[ ] Yes (If yes please go to question 3.1)
[ ] No (If No please go to question 3.2)

3.1 Select which data repositories you use to assist decision-making (You can select more than one answer)

[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)

3.2 Select which data repositories you would like to use to assist decision-making (You can select more than one answer)

[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)

Decision-making process:

4. How do you collect or access data from the listed data repositories such as ICU unit/ Cardiac surgical/e-discharge summary unit, Transition II for service management needs?

ICU unit data repository

1. Direct data access or integration from ICU data repository
2. Contact data custodian to collect specific data from ICU unit
3. Contact IT department to collect specific data
4. Other. Please specify.............................................................
5. Don’t use this data repository for decision-making

Cardiac surgery (CARPIA) unit data repository

1. Direct data access or integration from Cardiac Surgery data repository
2. Contact data custodian to collect specific data from ICU unit
3. Contact IT department to collect specific data
4. Other. Please specify.............................................................
5. Don’t use this data repository for decision-making

Quality & Safety eDS Summary data repository

1. Direct data access or integration from eDS data repository
2. Contact data custodian to collect specific data from Quality & Safety unit
3. Contact IT department to collect specific data
4. Other. Please specify.............................................................
5. Don’t use this data repository for decision-making

Clinical Costings Transition II (Finance) data repository

1. Direct data access or integration from Clinical Costings data repository
2. Contact data custodian to collect specific data from Clinical Costings unit
3. Contact IT department to collect specific data
4. Other. Please specify.............................................................
5. Don’t use this data repository for decision-making

5. Identify example management problems/decisions you address or would like to address by using the other data repositories listed? (ICU unit/ Cardiac Surgical/e-discharge summary unit, Transition II)

Response table (three columns):

Column 1. Data repositories (Tick the data repositories)

[ ] ICU database
[ ] Cardiac surgical (CARPIA)
[ ] e-Discharge summary
[ ] Clinical Costings (Transition II)

Column 2. Problems/Decisions/Analysis I would like to address

Column 3. Which routine analysis do you conduct or would like to conduct

[ ] Daily
[ ] Monthly
[ ] Quarterly
[ ] Yearly
[ ] Other …………
[ ] I don’t know

Current Issues:

6. Are you satisfied with the support provided for decision-making processes by the current Information Systems? (Please tick check box)

[ ] Yes
[ ] No

Comment:

7. What are the main information related problems you have identified in the decision-making process supporting clinical service management in your area? (Please tick check box – You can select more than one answer)

[ ] Lack of quality data
[ ] Limited accessibility and availability of data from other repositories
[ ] Integration of data from other repositories
[ ] Difficulty of getting historical data
[ ] Lack in efficient reporting tools
[ ] Lack of time or resources to undertake analysis
[ ] Other (Please specify)…………………………………………………
[ ] I don’t know

8. What are the main data quality issues impacting the trust in clinical data used for the decision-making processes in your area? (Please tick check box – You can select more than one answer)

[ ] Lack in data completeness (data not missing by record or by field values)
[ ] Lack in data accuracy (correct data)
[ ] Lack in accurate consistency/compatibility (reasonableness with other or previous data, e.g. by definitions, format)
[ ] Lack in granularity/precision (correct detail)
[ ] Lack in validity and reliability (data performs intended function within required/defined specifications)
[ ] Lack in relevance (data applicable/ helpful to task at hand)
[ ] Lack in data consistency (data compatibility or reasonableness with other or previous data, e.g. relates to definitions, formats, standards)
[ ] Lack in data timeliness (currency of data)
[ ] Other (Please specify)……………………………………..
[ ] I don’t know

Data Storage/ data analysis:

9. Do these data repositories (ICU unit/ Cardiac surgical/e-discharge summary unit/Transition II) store sufficient data fields for your decision-making processes?

………………………………………………………………

10. According to your knowledge, how long is data kept in the data repositories?

..................................................................................................

11. According to your knowledge, what analysis tools do you use to analyse the clinical data?

…. ….………………………………………………………

12. Do you have any concerns regarding data security and information privacy that should be incorporated in the application development?

………………………………………………………………….

__________________________________________________________________________

If you would agree to participate in a face-to-face interview, for further clarification of information requirements for data warehouse prototype development, please tick the following box and provide your contact details.

[ ] Yes, I would like to participate

Name: ……………………………………………..
Position at TPCH: …………………........................
Contact number: ………………………………....
Email: …………………………………………….
Signature: …………………………
Date: …………………………………….

APPENDIX B: DESIGN OF DATA WAREHOUSE FACT AND DIMENSION TABLES

TPCH Cardiac Surgery DW Architecture

Sample TPCH jobs: process of extract, transform and load data from the source tables

Sample TPCH source data: source data tables loaded as external files from CARPIA, ICU and Transition II

Sample TPCH target data: includes fact and dimension tables

Extract, transform and load valid data to the dimension and fact tables

Populate Cost fact table

Populate Risk score fact table

Populate DRG Dimension table

ICU Diagnosis Dimension

Populate Patient cases Dimension

Populate Patient Dimension
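The populate steps listed above follow the usual star schema pattern: each dimension assigns a surrogate key to every distinct source value, and each fact row stores those keys alongside its measures. The following is a minimal Python sketch of that pattern, not the SAS Data Integration Studio jobs themselves. URNUM and CostTOTALACT are field names from the finance validation caption; the DRG column name, the DRG labels, the cost values and both function names are hypothetical.

```python
def build_dimension(rows, natural_key):
    """Map each distinct source value to a surrogate key (1, 2, 3, ...)."""
    dimension = {}
    for row in rows:
        value = row[natural_key]
        if value not in dimension:
            dimension[value] = len(dimension) + 1
    return dimension


def build_cost_fact(cost_rows, patient_dim, drg_dim):
    """Replace natural keys with surrogate keys and keep the cost measure."""
    fact = []
    for row in cost_rows:
        fact.append({
            "patient_key": patient_dim[row["URNUM"]],
            "drg_key": drg_dim[row["DRG"]],  # "DRG" is an assumed column name
            "total_cost": float(row["CostTOTALACT"]),
        })
    return fact


if __name__ == "__main__":
    # Invented rows; "DRG-A" and "DRG-B" are placeholder labels, not real codes.
    cost_rows = [
        {"URNUM": "1001", "DRG": "DRG-A", "CostTOTALACT": "100.50"},
        {"URNUM": "1002", "DRG": "DRG-A", "CostTOTALACT": "200.00"},
        {"URNUM": "1001", "DRG": "DRG-B", "CostTOTALACT": "50.00"},
    ]
    patient_dim = build_dimension(cost_rows, "URNUM")
    drg_dim = build_dimension(cost_rows, "DRG")
    for fact_row in build_cost_fact(cost_rows, patient_dim, drg_dim):
        print(fact_row)
```

Populating the dimensions before the fact table matters here, because every fact row must be able to look up a key that already exists.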

Invalid data handling

CARPIA data validation: missing values (URNumber, PREDMORT, OpDate) and custom validation (Bleeding complication).

Finance data validation: missing values (URNUM, CostTOTALACT)
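The missing-value checks named in the validation captions above can be sketched as a filter that routes incomplete rows to a reject list before loading. This is an illustrative Python version under stated assumptions, not the SAS validation transformation used in the prototype. The required field names come from the captions; the function name, the reject_reason field and the sample rows are hypothetical. The custom rule for Bleeding complication is omitted because the source does not state it.

```python
# Required fields per source, as listed in the validation captions.
REQUIRED_FIELDS = {
    "CARPIA": ("URNumber", "PREDMORT", "OpDate"),
    "Finance": ("URNUM", "CostTOTALACT"),
}


def split_valid_invalid(rows, required):
    """Separate rows holding every required field from rows missing any."""
    valid, invalid = [], []
    for row in rows:
        # None, empty strings and whitespace-only strings all count as missing.
        missing = [name for name in required if not (row.get(name) or "").strip()]
        if missing:
            invalid.append({**row, "reject_reason": "missing: " + ", ".join(missing)})
        else:
            valid.append(row)
    return valid, invalid


if __name__ == "__main__":
    # Invented sample rows for illustration only.
    carpia_rows = [
        {"URNumber": "1001", "PREDMORT": "0.021", "OpDate": "2010-03-01"},
        {"URNumber": "", "PREDMORT": "0.150", "OpDate": None},
    ]
    valid, invalid = split_valid_invalid(carpia_rows, REQUIRED_FIELDS["CARPIA"])
    print(len(valid), "valid row(s)")
    for rejected in invalid:
        print(rejected["reject_reason"])
```

Keeping the rejected rows with a stated reason, rather than discarding them, lets the data custodian of the source system see which records need correcting.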
