Chapter 5: Databases in Healthcare
5.1 Databases and Types of Data Structure
Databases are collections of data with a specific, well-defined structure and purpose. In hospitals, databases are the backbone of the hospital information system, and healthcare databases hold the collected health data. The programs used to develop and manipulate these data are called Database Management Systems (DBMS).
Are these databases?
1. An Excel file with the names and medication of patients within a hospital
2. A nurse's agenda with to-dos
3. A schedule of the shifts for next week
4. A list of the medicines available
5. The medical record of a patient
Since databases are structured collections of data, none of the items in this list is definitely a database or definitely not one; what matters is the way their data are organized.
Types of data structure:
1. Flat data
2. Hierarchical data
3. Relational data
4. Object-oriented data
5. NoSQL databases
Data entered into electronic health records are most often stored in relational or object-oriented databases or, in a few recent cases, in NoSQL databases.
5.1.1 Flat Files
A flat file can be a plain text file, usually containing one record per line, or a binary file. Most existing software provides easy access to flat data files, and for simple data, flat databases work. However, they waste storage, because every record must keep space for items that do not logically apply to it (for example, pregnancy details in a male patient's record), and they are not friendly to complex queries.
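The flat-file idea can be sketched in Python with the standard `csv` module. This is a minimal illustration, not a real hospital file: the file contents, column names, and helper functions are hypothetical. Each line is one record, every record carries every field (so fields that do not apply stay empty), and finding anything means scanning every line.

```python
import csv
import io

# Hypothetical flat file: one patient record per line, every field in every record.
FLAT_FILE = """patient_id,name,sex,medication,pregnancy_week
1,Anna,F,insulin,24
2,Nikos,M,metformin,
3,Maria,F,aspirin,
"""

def load_records(text):
    """Parse the flat file into a list of dicts, one per line after the header."""
    return list(csv.DictReader(io.StringIO(text)))

def count_empty_fields(records):
    """Count fields left empty because they do not apply to the record."""
    return sum(1 for r in records for value in r.values() if value == "")

def find_by_medication(records, medication):
    """No index and no query engine: every record has to be scanned."""
    return [r["name"] for r in records if r["medication"] == medication]

records = load_records(FLAT_FILE)
print(len(records))                              # 3
print(count_empty_fields(records))               # 2
print(find_by_medication(records, "metformin"))  # ['Nikos']
```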
5.1.2 Hierarchical Models
Hierarchical models are data models in which items are arranged in a tree and relationships are inherited from higher (parent) to lower (child) items; each child has exactly one parent. A familiar example of a hierarchical structure is the folders and subfolders on our computers.
Does this structure actually fit real-life healthcare processes?
Pros: Actions on parents save time, since they affect all children.
Cons: In the real healthcare world, most relationships are not hierarchical.
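A small Python sketch of a hierarchical model, assuming a hypothetical hospital tree (the `Node` class and all names are illustrative only). It shows the "pro" above: deactivating a parent is inherited by every descendant.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a hypothetical hospital hierarchy; each node has exactly one parent."""
    name: str
    active: bool = True
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def deactivate(self):
        """An action on a parent is applied to all of its descendants."""
        self.active = False
        for child in self.children:
            child.deactivate()

    def walk(self):
        """Yield this node and then every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

hospital = Node("Hospital")
cardiology = hospital.add(Node("Cardiology"))
ward_a = cardiology.add(Node("Ward A"))
ward_a.add(Node("Patient 17"))
radiology = hospital.add(Node("Radiology"))

cardiology.deactivate()
inactive = [n.name for n in hospital.walk() if not n.active]
print(inactive)  # ['Cardiology', 'Ward A', 'Patient 17']
```

The "con" appears as soon as Patient 17 is also seen by Radiology: since every node has exactly one parent, the patient would have to be duplicated under each department.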
5.1.3 The Relational Database Model
Major elements:
1. The database is a collection of tables, which represent entities and relationships.
2. The table name is the name of the relation.
3. Columns represent the characteristics (attributes) of the entity.
4. Rows represent the data, one record per row.
5.2 Relational Databases
The principles of relational databases can be summarized in the following points:
1. Data in a relational database are values stored in the database. Data alone are useless.
2. Relational databases are composed of a set of tables.
3. Each table includes records, which are the table rows, and fields, which are the table columns.
4. Fields can be of various data types: alphanumeric, numeric, date-time, Boolean, etc.
5. Keys are used to access a table record. The key that uniquely identifies a record is the primary key.
6. An index is the physical mechanism that improves database efficiency. It is part of the physical structure of the database and is conceptually independent of keys, which are part of the logical structure.
7. A view is a virtual table built from a subset of the data in the actual tables.
8. Relationships between tables can be one-to-one, one-to-many, or many-to-many.
9. The term data integrity describes the accuracy, validity, and consistency of the data.
Figure: An example of a relational database
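The relational principles listed above (primary and foreign keys, a one-to-many relationship, an index, a view, and integrity) can be sketched with Python's built-in `sqlite3` module. The tables, columns, and data are hypothetical, and SQLite is used only because it ships with Python; other relational DBMSs differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each row
    name       TEXT NOT NULL,
    birth_date TEXT
);
CREATE TABLE admission (
    admission_id INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),  -- foreign key
    ward         TEXT,
    admitted_on  TEXT
);
-- Index: a physical structure that speeds up lookups by patient_id.
CREATE INDEX idx_admission_patient ON admission(patient_id);
-- View: a virtual table defined by a query over the real tables.
CREATE VIEW cardiology_patients AS
    SELECT p.name, a.admitted_on
    FROM patient p JOIN admission a ON a.patient_id = p.patient_id
    WHERE a.ward = 'Cardiology';
""")

conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [(1, "Anna", "1980-02-11"), (2, "Nikos", "1975-07-30")])
# One-to-many: patient 1 has two admissions.
conn.executemany("INSERT INTO admission VALUES (?, ?, ?, ?)",
                 [(10, 1, "Cardiology", "2023-01-05"),
                  (11, 1, "Surgery", "2023-03-14"),
                  (12, 2, "Cardiology", "2023-02-20")])

rows = conn.execute(
    "SELECT name, admitted_on FROM cardiology_patients ORDER BY name").fetchall()
print(rows)  # [('Anna', '2023-01-05'), ('Nikos', '2023-02-20')]

# Integrity: an admission that points to a non-existent patient is rejected.
rejected = False
try:
    conn.execute("INSERT INTO admission VALUES (13, 99, 'Surgery', '2023-04-01')")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```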
Advantages of the Relational Schema
1. Databases can be examined from many different perspectives.
2. There is no need to enter missing information for variables that are not logically possible.
3. The schema is easy to modify, because adding new entities involves adding new tables rather than altering old ones (provided that the database is adequately normalized).
Normalization in Relational Databases
Normalization is the process by which insufficiently normalized schemas are split into smaller schemas with more desirable characteristics. Normalization minimizes anomalies during data entry, update, and deletion. Normal forms provide the methodological framework for analyzing a database schema based on its keys and functional dependencies.
Edgar Codd, 1923-2003: The father of normalization (have a look
at his 1971 paper)
Normalization: easy-to-remember rules
1. Every characteristic belongs to the entity it characterizes.
2. Every characteristic exists only once in a database.
3. Keys fully define the records.
4. Each value of the same characteristic is stored in the database only once.
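A minimal sketch of why "each value is stored only once" matters, using Python's `sqlite3` module with hypothetical tables. In the unnormalized table the patient's name depends only on `patient_id` but is repeated in every visit row, so a partial correction creates an update anomaly; in the normalized schema the name lives in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: patient_name depends only on patient_id but is repeated in every visit row.
conn.execute("""CREATE TABLE visit_flat (
    visit_id INTEGER PRIMARY KEY, patient_id INTEGER,
    patient_name TEXT, visit_date TEXT)""")
conn.executemany("INSERT INTO visit_flat VALUES (?, ?, ?, ?)",
                 [(1, 7, "A. Smith", "2023-01-10"),
                  (2, 7, "A. Smith", "2023-02-14")])
# Update anomaly: correcting the name in one row only leaves two names for one patient.
conn.execute("UPDATE visit_flat SET patient_name = 'Anna Smith' WHERE visit_id = 1")
flat_names = conn.execute(
    "SELECT DISTINCT patient_name FROM visit_flat WHERE patient_id = 7").fetchall()
print(len(flat_names))  # 2

# Normalized: the name is stored once in patient; visits refer to it by key.
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY,
                    patient_id INTEGER REFERENCES patient(patient_id),
                    visit_date TEXT);
INSERT INTO patient VALUES (7, 'A. Smith');
INSERT INTO visit VALUES (1, 7, '2023-01-10'), (2, 7, '2023-02-14');
""")
conn.execute("UPDATE patient SET patient_name = 'Anna Smith' WHERE patient_id = 7")
norm_names = conn.execute("""
    SELECT DISTINCT p.patient_name
    FROM visit v JOIN patient p ON p.patient_id = v.patient_id""").fetchall()
print(norm_names)  # [('Anna Smith',)]
```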
5.3 Database categories found in healthcare
5.3.1 Distributed Databases in Healthcare
Data are kept in different settings and on different computers. Since the volume of data produced is huge, replicating and distributing databases improves database performance in healthcare settings. Distributed databases need to address the location of the data AND the audit log, that is, a chronological record of sources and destinations that provides documentary evidence of the sequence of activities that have affected a specific procedure at any time.
Advantages:
1. Data loss is limited to the affected nodes, which is critical for healthcare.
2. Since they are decentralized, they are more flexible and allow different units to update and maintain their own data.
5.3.2 Large Healthcare Utilization Databases
They are used to study the use and outcomes of treatments. Their huge size allows the study of rare events. Since they represent routine clinical care, they can address real-world effectiveness and utilization patterns.
5.3.3 BLOBs: Binary Large Objects
BLOBs are very frequent in healthcare settings:
1. Images (CT, MRI)
2. Audio (heartbeat sequences)
3. Video (ultrasound)
The dilemma: should we move these data to data warehouses or keep them at their source?
5.3.4 Data-less Databases
They are distributed databases that have been set up without any data until such a need arises. They may be useful in healthcare because:
1. They are less expensive than centralized registries (they require no equipment and little personnel).
2. Using the system does not require vague, time-independent patient consents.
3. The system does not require duplication of data in different databases.
5.3.5 Object-Oriented Data Models
1. They are more efficient.
2. They use real-life objects (entities).
3. They use SQL.
4. They offer much higher programming flexibility, since the database can be integrated with object-oriented programming languages (e.g., Java, C#).
5. They are not yet fully standardized, but standardization is ongoing.
Figure: Example of an object-oriented model
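Python classes can stand in for Java or C# to sketch the object-oriented idea: data and behaviour are kept together in objects, and related objects are reached by reference rather than by joining tables. The classes and the dictionary used as an "object store" are hypothetical; a real object-oriented DBMS adds persistence, querying, and transactions on top of this.

```python
from dataclasses import dataclass, field

@dataclass
class Medication:
    name: str
    dose_mg: int

@dataclass
class Patient:
    patient_id: str
    name: str
    medications: list = field(default_factory=list)

    def prescribe(self, medication):
        """Behaviour is defined together with the data it acts on."""
        self.medications.append(medication)

    def total_daily_dose(self, drug_name):
        return sum(m.dose_mg for m in self.medications if m.name == drug_name)

# Hypothetical in-memory object store: whole objects are kept under an identifier,
# and the related Medication objects are reached by reference, not by a join.
object_store = {}

anna = Patient("P-001", "Anna")
anna.prescribe(Medication("paracetamol", 500))
anna.prescribe(Medication("paracetamol", 500))
anna.prescribe(Medication("omeprazole", 20))
object_store[anna.patient_id] = anna

retrieved = object_store["P-001"]
print(retrieved.total_daily_dose("paracetamol"))  # 1000
print([m.name for m in retrieved.medications])    # ['paracetamol', 'paracetamol', 'omeprazole']
```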
Chapter 6: A comparison between SQL and NoSQL databases
6.1 The Structured Query Language (SQL)
6.1.1 History of the SQL Standards
SQL is standardized as ISO/IEC 9075 and is the most widely used query language for relational DBMSs. New versions of the standard appear every few years. Some of the most important updates include the following features:
1987: Initial ISO/IEC standard
1989: Referential integrity
1999: SQL:1999; Call-Level Interface (standardized communication of queries with the DBMS)
2003: XML functionality
2008: New expansions and updates
6.1.2 SQL Characteristics
1. Data are stored in columns and tables.
2. Relationships are represented by data.
3. SQL provides a Data Manipulation Language and a Data Definition Language.
4. SQL supports transactions.
5. SQL abstracts from the physical layer: it provides data-application independence, applications specify what, not how, and the physical layer can change without modifying applications.
6.1.2.1 Data Definition Language (DDL)
The schema is defined at the start.
1. CREATE TABLE TableName (Col1 type1, Col2 type2)
2. Constraints to define and enforce relationships (primary and foreign keys)
3. ALTER, DROP
4. Security and access
6.1.2.2 Data Manipulation Language (DML)
Data are manipulated with SELECT, INSERT, UPDATE, and DELETE statements, for example:
SELECT T1.Column1, T2.Column2 FROM Table1 T1, Table2 T2 WHERE T1.Column1 = T2.Column1
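The DDL and DML statements above can be tried with Python's `sqlite3` module. The tables and data are hypothetical; SQLite is chosen only because it needs no server, and its SQL dialect differs from other DBMSs in some details (for example, its limited `ALTER TABLE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the schema is defined before any data are stored.
conn.execute("CREATE TABLE ward (ward_id INTEGER PRIMARY KEY, ward_name TEXT NOT NULL)")
conn.execute("""CREATE TABLE nurse (
    nurse_id   INTEGER PRIMARY KEY,
    nurse_name TEXT NOT NULL,
    ward_id    INTEGER REFERENCES ward(ward_id))""")
conn.execute("ALTER TABLE nurse ADD COLUMN shift TEXT")  # DDL: change the schema
conn.execute("CREATE TABLE scratch (x INTEGER)")
conn.execute("DROP TABLE scratch")                        # DDL: remove a table

# DML: insert, update, delete, and select data.
conn.executemany("INSERT INTO ward VALUES (?, ?)", [(1, "Cardiology"), (2, "Pediatrics")])
conn.executemany("INSERT INTO nurse VALUES (?, ?, ?, ?)",
                 [(1, "Eleni", 1, "night"), (2, "Jonas", 2, "day"), (3, "Sara", 1, "day")])
conn.execute("UPDATE nurse SET shift = 'night' WHERE nurse_id = 3")
conn.execute("DELETE FROM nurse WHERE nurse_id = 2")

# The join pattern from the text; T1 and T2 are aliases declared in the FROM clause.
rows = conn.execute("""
    SELECT T1.nurse_name, T1.shift, T2.ward_name
    FROM nurse T1, ward T2
    WHERE T1.ward_id = T2.ward_id
    ORDER BY T1.nurse_name""").fetchall()
print(rows)  # [('Eleni', 'night', 'Cardiology'), ('Sara', 'night', 'Cardiology')]
```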
6.1.3 Transactions
ACID properties of SQL transactions:
1. Atomic: All of the work in a transaction completes (commit) or none of it does (rollback).
2. Consistent: A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
3. Isolated: The results of changes made during a transaction are not visible to other transactions until the transaction is over.
4. Durable: The results of a committed transaction survive failures.
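Atomicity and constraint-based consistency can be demonstrated with Python's `sqlite3` module, assuming a hypothetical insulin stock table. Used as a context manager, a `sqlite3` connection commits when the block succeeds and rolls back when it raises, so a transfer either happens completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint expresses a consistency rule: stock can never be negative.
conn.execute("""CREATE TABLE insulin_stock (
    location TEXT PRIMARY KEY,
    units INTEGER NOT NULL CHECK (units >= 0))""")
conn.executemany("INSERT INTO insulin_stock VALUES (?, ?)", [("pharmacy", 10), ("ward", 0)])
conn.commit()

def transfer(conn, units):
    """Move insulin from the pharmacy to the ward as one transaction.
    The connection commits if the block succeeds and rolls back if it raises."""
    with conn:
        conn.execute("UPDATE insulin_stock SET units = units + ? WHERE location = 'ward'", (units,))
        # Fails the CHECK constraint if the pharmacy does not hold enough units.
        conn.execute("UPDATE insulin_stock SET units = units - ? WHERE location = 'pharmacy'", (units,))

transfer(conn, 4)        # both updates are committed together
try:
    transfer(conn, 50)   # the second update fails...
except sqlite3.IntegrityError:
    pass                 # ...so the first update is rolled back too

final_stock = dict(conn.execute("SELECT location, units FROM insulin_stock ORDER BY location"))
print(final_stock)  # {'pharmacy': 6, 'ward': 4}
```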
6.2 NoSQL Databases (Not Only SQL)
6.2.1 NoSQL Definition (www.nosql-database.org)
Next-generation databases addressing some of the following points: non-relational, distributed, open-source, and horizontally scalable. The original intention has been modern web-scale databases. The movement began in early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more.
When referring to NoSQL, we usually mean modern database systems using document stores, key-value stores, XML databases, graph databases, column stores, object stores, etc.
These databases assume that data storage does not require fixed table schemas. NoSQL database management systems are those that do not adhere to the widely used SQL relational database model. NoSQL became well known with the advent of web-scale data and of the systems built by Google, Facebook, Amazon, Twitter, etc. to manage these data. There are over one hundred different NoSQL databases.
NoSQL: Multiple Types Based on the Architecture
1. Key-value databases. These are based on a hash table of keys.
2. Document-based systems (e.g., MongoDB). These store documents made up of tagged elements.
3. Column family systems. Each storage block contains data from only one column.
4. Graph databases.
6.2.1.1 Column Store NoSQL Databases
Each storage block contains data from only one column. When multiple rows are inserted, column store databases perform better than traditional row-based insert methods.
Examples of column store NoSQL databases:
1. Hadoop/HBase (http://hadoop.apache.org/), used by Yahoo and Facebook
2. Ingres VectorWise: a column store integrated with an SQL database (http://www.ingres.com/products/vectorwise)
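The difference between row and column storage can be sketched with plain Python data structures. This only illustrates the layout idea with hypothetical lab results; it is not how HBase or VectorWise store data internally.

```python
# The same hypothetical lab results stored two ways.
rows = [
    {"patient_id": 1, "test": "HbA1c", "value": 6.1},
    {"patient_id": 2, "test": "HbA1c", "value": 7.4},
    {"patient_id": 3, "test": "HbA1c", "value": 5.8},
]

# Row store: each record is kept together (one block per row).
row_store = [tuple(r.values()) for r in rows]

# Column store: each block holds the values of only one column.
column_store = {col: [r[col] for r in rows] for col in rows[0]}

# Averaging one column reads a single block in the column store...
mean_value = sum(column_store["value"]) / len(column_store["value"])
# ...whereas the row store has to read every full row to extract the same field.
mean_from_rows = sum(row[2] for row in row_store) / len(row_store)

print(column_store["value"])  # [6.1, 7.4, 5.8]
print(round(mean_value, 2))   # 6.43
```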
6.2.1.2 Document Store NoSQL Databases
These assume that documents encapsulate and encode data in standard formats or encodings.
Examples of document store NoSQL databases:
1. CouchDB (http://couchdb.apache.org/)
2. MongoDB (http://www.mongodb.org/)
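A document store can be sketched with the standard `json` module: each hypothetical patient document is self-contained, and documents in the same collection need not share fields. The helper functions are illustrative only and are not the MongoDB or CouchDB API.

```python
import json

# Hypothetical collection: self-contained JSON documents with different fields (schema-free).
collection = [
    json.dumps({"_id": "p1", "name": "Anna", "allergies": ["penicillin"],
                "encounters": [{"date": "2023-01-05", "ward": "Cardiology"}]}),
    json.dumps({"_id": "p2", "name": "Nikos", "smoker": True}),
]

def find(collection, field, value):
    """Decode every document and keep those whose top-level field equals value."""
    docs = (json.loads(d) for d in collection)
    return [d for d in docs if d.get(field) == value]

def ids_with_allergy(collection, allergen):
    """Documents without an 'allergies' field are simply skipped; no schema enforces it."""
    return [d["_id"] for d in map(json.loads, collection) if allergen in d.get("allergies", [])]

smokers = [d["_id"] for d in find(collection, "smoker", True)]
print(smokers)                                         # ['p2']
print(ids_with_allergy(collection, "penicillin"))      # ['p1']
print(find(collection, "_id", "p1")[0]["encounters"][0]["ward"])  # Cardiology
```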
6.2.1.3 Key-Value Store NoSQL Databases
Hash tables of keys; values are stored with their keys. They provide fast access to small data values.
Examples of key-value store NoSQL databases:
1. MemcacheDB (http://memcachedb.org/)
2. Project Voldemort (http://www.project-voldemort.com/)
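A toy key-value store in Python, built on `dict` (itself a hash table). The class and the key naming scheme are hypothetical; real systems such as MemcacheDB add persistence, replication, and networking.

```python
class KeyValueStore:
    """A toy key-value store: a hash table mapping keys to opaque values."""

    def __init__(self):
        self._table = {}  # Python's dict is a hash table

    def put(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)  # average O(1) lookup by key

    def delete(self, key):
        self._table.pop(key, None)

store = KeyValueStore()
store.put("patient:17:last_bp", "128/82")
store.put("patient:17:ward", "Cardiology")
print(store.get("patient:17:last_bp"))  # 128/82
store.delete("patient:17:ward")
print(store.get("patient:17:ward"))     # None
```

The store treats `patient:17:ward` and `patient:17:last_bp` as unrelated keys; only the naming convention ties them to the same patient.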
Many NoSQL DBMSs are currently available, for example:
Document store: BaseX, Clusterpoint, Apache CouchDB, Couchbase, eXist, Jackrabbit, Lotus Notes and IBM Lotus Domino, LotusScript, MarkLogic Server, MongoDB, OpenLink Virtuoso, OrientDB, RavenDB, SimpleDB, Terrastore
Graph: AllegroGraph, DEX, FlockDB, InfiniteGraph, Neo4j, OpenLink Virtuoso, OrientDB, Pregel, Sones GraphDB, OWLIM
Key-value: BigTable, CDB, Keyspace, LevelDB, membase, MemcacheDB, MongoDB, OpenLink Virtuoso, Tarantool, Tokyo Cabinet, TreapDB, Tuple space
Eventually consistent: Apache Cassandra, Dynamo, Hibari, OpenLink Virtuoso, Project Voldemort, Riak
Hierarchical: GT.M, InterSystems Caché
Tabular: BigTable, Apache Hadoop, Apache HBase, Hypertable, Mnesia, OpenLink Virtuoso
Object database: db4o, Eloquera, GemStone/S, InterSystems Caché, JADE, NeoDatis ODB, ObjectDB, Objectivity/DB, ObjectStore, OpenLink Virtuoso, Versant Object Database, Wakanda, ZODB
Multivalue databases: Extensible Storage Engine (ESE/NT), jBASE, OpenQM, OpenInsight, Rocket U2, D3 Pick database, InterSystems Caché, InfinityDB
Tuple store: Apache River, OpenLink Virtuoso, Tarantool
6.2.2 NoSQL Distinguishing Characteristics
1. Large data volumes (e.g., Google's "big data")
2. Scalable replication and distribution: potentially thousands of machines, potentially distributed around the world
3. Queries need to return answers quickly
4. Mostly queries, few updates
5. Asynchronous inserts and updates
6. Schema-less
7. Open-source development
6.2.3 CAP Theorem: you can only choose two out of three in distributed databases
Brewer's CAP theorem: a distributed system can support only two of the following three characteristics:
1. Consistency: all nodes see the same data at the same time.
2. Availability: node failures do not prevent the surviving nodes from continuing to operate.
3. Partition tolerance: operations still complete even if the network splits and some components cannot reach the others.
6.2.4 Storing and Modifying Data in NoSQL Databases
1. Syntax varies (e.g., Java, HTML).
2. Asynchronous: inserts and updates do not wait for confirmation.
3. Versioned optimistic concurrency: transactions proceed without locking; each record carries a version, and a write succeeds only if the version has not changed since the record was read, so concurrent transactions do not overwrite each other's changes.
6.2.5 Querying Data in NoSQL Databases
1. Syntax varies.
2. There is no set-based query language; procedural programming languages such as Java, C, etc. are used instead.
3. The application specifies the retrieval path.
4. There is no query optimizer.
5. The principle of prioritizing speed over accuracy applies here: there may not be a single right answer.
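The versioned optimistic concurrency described in 6.2.4 can be sketched as a compare-and-set on a version number. The `VersionedStore` class is hypothetical and single-process; real distributed stores implement the idea differently and in more elaborate ways.

```python
class VersionConflict(Exception):
    pass

class VersionedStore:
    """Each value carries a version; a write succeeds only if the caller
    read the latest version (compare-and-set), otherwise it must retry."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1

store = VersionedStore()
store.write("patient:17:allergies", ["penicillin"], expected_version=0)

# Two clients read the same version without locking the record.
v_a, allergies_a = store.read("patient:17:allergies")
v_b, allergies_b = store.read("patient:17:allergies")

store.write("patient:17:allergies", allergies_a + ["latex"], expected_version=v_a)
try:
    store.write("patient:17:allergies", allergies_b + ["iodine"], expected_version=v_b)
except VersionConflict:
    # Client B re-reads and retries instead of silently overwriting A's change.
    v_b, allergies_b = store.read("patient:17:allergies")
    store.write("patient:17:allergies", allergies_b + ["iodine"], expected_version=v_b)

print(store.read("patient:17:allergies"))  # (3, ['penicillin', 'latex', 'iodine'])
```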
6.3 The debate: SQL vs NoSQL
The two most significant differences between SQL and NoSQL are:
1. Scaling: SQL databases do not readily allow massively parallel processing, which leads to larger computers (scale up), whereas NoSQL databases distribute the load across numerous commodity servers, virtual machines, or cloud instances (scale out).
2. Modeling: SQL databases are highly normalized and require predefined data models before data are inserted into the system. In contrast, NoSQL databases do not require predefined data models.
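Scaling out usually means partitioning (sharding) the data across many servers. A minimal sketch, assuming four hypothetical servers modelled as Python dictionaries and simple hash-modulo placement:

```python
import hashlib

def shard_for(key, num_servers):
    """Map a key to a server index using a stable hash (MD5 here, for illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers

servers = {i: {} for i in range(4)}  # four hypothetical commodity servers

def put(key, value):
    servers[shard_for(key, len(servers))][key] = value

def get(key):
    return servers[shard_for(key, len(servers))].get(key)

for n in range(1000):
    put(f"patient:{n}", {"id": n})

print(get("patient:42"))                      # {'id': 42}
print(sum(len(s) for s in servers.values()))  # 1000
print([len(s) for s in servers.values()])     # keys held by each server
```

Hash-modulo placement moves most keys whenever a server is added or removed, which is why systems such as Dynamo and Cassandra use consistent hashing instead.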
6.3.1 SQL Positive and Negative Points
Positive:
1. Advanced, integrated data aggregation options, statistics, and reports at the data level
2. High-performance OLTP databases
3. Good transaction features
4. Complex SQL queries possible for diverse cases
5. A large array of tools and compatible software
6. Data-application independence
Weaknesses:
1. SQL complexity and cost for large solutions
2. Not so fast to develop
3. Steep learning curve
4. Limited data volume on a single server (500 GB is cited as the maximum)
5. Scalability issues
6. Performance issues
7. Maintenance issues
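The point about advanced aggregation at the data level can be sketched with Python's `sqlite3` module and a hypothetical admissions table: the grouping and averaging run inside the DBMS, and only the summary rows are returned to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admission (ward TEXT, length_of_stay_days INTEGER)")
conn.executemany("INSERT INTO admission VALUES (?, ?)",
                 [("Cardiology", 4), ("Cardiology", 6),
                  ("Surgery", 3), ("Surgery", 5), ("Surgery", 10)])

# Aggregation happens inside the database; the application receives only the summary.
report = conn.execute("""
    SELECT ward, COUNT(*) AS admissions, AVG(length_of_stay_days) AS mean_stay
    FROM admission
    GROUP BY ward
    ORDER BY ward""").fetchall()
print(report)  # [('Cardiology', 2, 5.0), ('Surgery', 3, 6.0)]
```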
6.3.2 NoSQL Positive and Negative Points
Positive:
1. Rapid development; easy to program
2. Support for insert, delete, and select functions
3. Faster performance (compared to SQL) and high read-write throughput
4. NoSQL solutions may handle huge BLOBs
5. NoSQL solutions may offer sufficient querying possibilities
6. Good for constantly changing data
7. Efficient horizontal scalability
Weaknesses:
1. No relations between one key and another
2. Often no security or authentication of users
3. Data storage cannot be used efficiently for analytics, aggregations, and reporting
Summary of differences between SQL and NoSQL databases:
SQL databases | NoSQL databases
Predefined schema required | Predefined schema not required or nonexistent
Standard definition and interface language | Definition and interface language specific to each product
Great consistency; well-defined semantics | Not always consistent
Consistent and accurate | Getting an answer quickly is more important than getting a correct answer
6.4 Implementations of NoSQL databases in healthcare
There are not many implementations yet. Most NoSQL products are still in beta and largely open source, lacking in support. Some argue that medical applications will inevitably be extremely conservative, because people could die if the IT system fails. But this is the future: electronic health records CAN use NoSQL. Some characteristics of future health information systems are expected to support such applications, because healthcare involves:
1. Semantic interoperability (3M HDD, SNOMED, LOINC, HL7, ICD-9 and ICD-10, RxNorm, CPT, etc.)
2. Metadata and master data management (EMPI, providers, organizations, locations, devices, etc.)
3. Cloud-based architecture
4. Standardized or flexible data models
Health information systems are distributed and GO CLOUD! Examples of NoSQL electronic health records: VistA, CHCS, AHLTA, Epic, Cerner, etc.