Data is the heart of a scalable system (the basis of “Philosophy of transaction processing”)
Grid Dynamics – Scaling Mission-Critical Systems
[email protected], December 2009
Dec 14, 2014
Read vs. write dialectic
Read vs write
  Processing a business transaction involves both read and write operations.
  These operations impose contradictory requirements on data structures and architecture.
Data schema
  Read
    Normalized: Bad. Complex queries; joins are slow.
    Denormalized: Good. Fast, simple queries; no joins.
  Write
    Normalized: Good. No contradicting copies; fewer rows to update.
    Denormalized: Bad. Potential inconsistency, more rows to update, complex update procedures.
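The schema trade-off can be illustrated with a minimal Python sketch. The customer and order records, the field names and the helper functions are all hypothetical, invented only to show where the cost of a read and of a write lands in each model.

```python
# Hypothetical data set: one customer with two orders.

# Normalized: the customer name is stored once; a read has to join.
customers = {1: {"name": "Alice"}}
orders = [
    {"id": 10, "customer_id": 1, "total": 25},
    {"id": 11, "customer_id": 1, "total": 40},
]

def read_normalized(order_id):
    """Read: find the order, then look up the customer (a join)."""
    order = next(o for o in orders if o["id"] == order_id)
    customer = customers[order["customer_id"]]
    return {"id": order["id"], "customer": customer["name"], "total": order["total"]}

def rename_normalized(customer_id, new_name):
    """Write: exactly one row changes. Returns the number of rows touched."""
    customers[customer_id]["name"] = new_name
    return 1

# Denormalized: the customer name is copied into every order.
orders_denorm = {
    10: {"id": 10, "customer_id": 1, "customer": "Alice", "total": 25},
    11: {"id": 11, "customer_id": 1, "customer": "Alice", "total": 40},
}

def read_denormalized(order_id):
    """Read: a single lookup, no join."""
    return orders_denorm[order_id]

def rename_denormalized(customer_id, new_name):
    """Write: every copy must change; missing one leaves inconsistent data."""
    touched = 0
    for order in orders_denorm.values():
        if order["customer_id"] == customer_id:
            order["customer"] = new_name
            touched += 1
    return touched
```

Renaming the customer touches one row in the normalized model and two rows in the denormalized one, while the denormalized read stays a single dictionary lookup.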
Redundancy
  Read
    Single copy: Bad. Bottleneck.
    Multiple copies: Good. Load is balanced between copies.
  Write
    Single copy: Good. No consistency problems; a single place to update.
    Multiple copies: Bad. Multiple places to update; synchronization and consistency problems.
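A minimal sketch of the same trade-off for redundancy, assuming a toy in-memory store with a configurable number of copies. The class and its round-robin read policy are illustrative assumptions, not something prescribed by the slides.

```python
import itertools

class ReplicatedStore:
    """Keeps the same key/value data in several in-memory copies."""

    def __init__(self, copies):
        self.replicas = [dict() for _ in range(copies)]
        self.reads_per_replica = [0] * copies
        self._next_replica = itertools.cycle(range(copies))

    def write(self, key, value):
        # Every copy has to be updated. Returns the number of places touched.
        for replica in self.replicas:
            replica[key] = value
        return len(self.replicas)

    def read(self, key):
        # Round-robin spreads the read load over the copies.
        index = next(self._next_replica)
        self.reads_per_replica[index] += 1
        return self.replicas[index][key]
```

With three copies, one write touches three places, while six reads land as two reads on each copy. A real system would also have to handle a copy that is unreachable during the write, which is where the synchronization and consistency problems come from.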
Storage
  [Chart: storage types placed on two axes, "Good for read" and "Good for write": RDBMS, key/value, document oriented, file system, MQ]
Message queue as a storage
  Sending a message = write operation
  Consuming a message = read operation (with very limited semantics)
  Durable subscriptions
  Transaction support
  MQ does not need to keep indexes, and may write transactions to disk extremely fast
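The idea of a queue as storage can be sketched with an append-only file. This is a minimal illustration under stated assumptions, not how any particular MQ product works: the FileQueue class is hypothetical, messages are single lines of text without newlines, and there is one consumer whose position is kept in memory only.

```python
import os
import tempfile

class FileQueue:
    """Append-only file used as a queue: no indexes, sequential I/O only."""

    def __init__(self, path):
        self.path = path
        self.read_offset = 0  # byte position of the next unread message
        open(path, "ab").close()  # create the file if it does not exist

    def send(self, message):
        # Write = append one line and force it to disk before returning.
        with open(self.path, "ab") as f:
            f.write(message.encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def consume(self):
        # Read = take the next message in arrival order, or None if there is none.
        with open(self.path, "rb") as f:
            f.seek(self.read_offset)
            line = f.readline()
            if not line.endswith(b"\n"):
                return None
            self.read_offset = f.tell()
        return line[:-1].decode("utf-8")

# Usage: messages come back in the order they were sent.
queue = FileQueue(os.path.join(tempfile.mkdtemp(), "queue.log"))
queue.send("debit account 42")
queue.send("credit account 7")
first = queue.consume()
```

The only read the structure supports is "give me the next message", which is the limited semantics mentioned above. In exchange, a write is a plain append with nothing to re-index.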
Storage media
  Magnetic disks: slow, persistent
  Dynamic memory: very fast, volatile
  Flash memory: fast, persistent (starting to change the IT landscape)
RDBMS are not sleeping
  MQ gets integrated into the core of RDBMS
  MySQL, Postgres, Berkeley DB are starting to move from anemic storage to processing facility
  In-memory operations
    Oracle TimesTen
  Materialized views
Distribution
You have to go distributed
  You cannot avoid building a distributed system.
  Fault tolerance: the system should survive a server failure
  Scaling: the resources of a single server are limited
  Globalization: modern business is distributed
Network
  The network is your enemy, never forget it
  The network is unreliable
  The network is slow
  The network has limited bandwidth
  Also, network interactions require complex data format transformations: HTTP + SSL + XML may kill performance in the blink of an eye
Network vs. disk access to data
  Latency
    Network: lower (~1 ms)
    Magnetic disk: seek time (~10 ms) if not cached
  Random access
    Network: good, unless a large number of separate small interactions is used
    Magnetic disk: bad, high seek time
  Bandwidth
    Network: limited, network infrastructure is shared
    Magnetic disk: higher; very high throughput if no seek is required
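A short worked example of what these latencies mean for one request. The ~1 ms and ~10 ms figures are the ones quoted on the slide; the count of 200 small operations per request is a hypothetical number chosen only for illustration, and transfer time is ignored.

```python
NETWORK_ROUND_TRIP_MS = 1.0  # "~1 ms" network latency from the slide
DISK_SEEK_MS = 10.0          # "~10 ms" uncached disk seek from the slide

def total_latency_ms(operations, per_operation_ms):
    """Latency of a sequence of operations executed one after another."""
    return operations * per_operation_ms

# Hypothetical request that needs 200 separate small pieces of data.
chatty_network = total_latency_ms(200, NETWORK_ROUND_TRIP_MS)  # 200.0 ms
batched_network = total_latency_ms(1, NETWORK_ROUND_TRIP_MS)   # 1.0 ms
uncached_disk = total_latency_ms(200, DISK_SEEK_MS)            # 2000.0 ms
```

Under these assumptions the network beats an uncached disk by a factor of ten per access, but fetching the same data in one round trip instead of 200 matters far more than the choice of medium.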
Data as a state of the system
Validity of state
  Valid = has an interpretation which makes sense from a business or operational perspective
  It should look meaningful; it doesn't matter what happens inside.
  A contradiction is not a contradiction as long as we know how to resolve it
Time is just another axis
  The speed of light is limited, but that does not make stars less beautiful, even if their light is thousands of years old.
  There is no need to force all changes instantly; offloading operations to asynchronous and batch processing is a powerful method to increase performance and deal with peak loads.
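Offloading to asynchronous batch processing can be sketched as follows. This is a minimal illustration using Python's standard queue and threading modules; the BatchWriter class, its batch_size parameter and the apply_batch callback are hypothetical names, not part of the original slides.

```python
import queue
import threading

class BatchWriter:
    """Accepts changes immediately and applies them later, in batches."""

    def __init__(self, apply_batch, batch_size=100):
        self._pending = queue.Queue()
        self._apply_batch = apply_batch  # called with a list of changes
        self._batch_size = batch_size
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, change):
        # Returns at once: the caller does not wait for the slow storage.
        self._pending.put(change)

    def close(self):
        # A None sentinel tells the worker to flush what is left and stop.
        self._pending.put(None)
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._pending.get()
            if item is not None:
                batch.append(item)
            # Flush on shutdown, on a full batch, or when the queue is drained.
            if batch and (item is None
                          or len(batch) >= self._batch_size
                          or self._pending.empty()):
                self._apply_batch(batch)
                batch = []
            if item is None:
                return

# Usage: ten changes are applied in order, in batches of at most four.
applied = []
writer = BatchWriter(applied.append, batch_size=4)
for i in range(10):
    writer.submit(i)
writer.close()
```

Between submit and the batch being applied, the target storage lags behind. That lag is the "old starlight" of the slide: acceptable as long as the state still has a valid interpretation.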
What data do we usually have?
  Transactional data
    Changes dynamically (either by us or by a 3rd party)
  Static data
    Changes infrequently
    Can be treated as immutable for most operations
Operations on data
  Operational data
    May include transactional and static data
    Complex read operations
    Low latency requirements
  Transactional data
    Intensive write operations
    Low latency requirements
  Secondary operations (e.g. daily reporting)
    Read access to the entire scope of information
    Operations over large datasets
Data structures
  Data structures should be tailored to the access pattern.
  How to deal with the read/write contradiction?
    Use two storages
    Synchronize data between the storages
      Synchronously
      Asynchronously
      Periodically
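The two-storage approach can be sketched in a few lines. This is a toy illustration: the TwoStores class and its synchronous flag are hypothetical, the write-optimized storage is a plain list and the read-optimized storage is a dictionary.

```python
class TwoStores:
    """One storage tailored for writes, one for reads, plus synchronization."""

    def __init__(self, synchronous):
        self.write_log = []   # write-optimized: append only
        self.read_view = {}   # read-optimized: lookup by key
        self.synchronous = synchronous
        self._applied = 0     # number of log entries already in the view

    def write(self, key, value):
        self.write_log.append((key, value))
        if self.synchronous:
            self.sync()

    def sync(self):
        # Called on every write (synchronous mode), or later by a background
        # worker (asynchronous) or a timer (periodic).
        for key, value in self.write_log[self._applied:]:
            self.read_view[key] = value
        self._applied = len(self.write_log)

    def read(self, key):
        return self.read_view.get(key)
```

With synchronous=False a read issued right after a write returns the old state until sync() runs. The three synchronization modes differ only in who calls sync() and when.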
Storages
  Operational log storage
    Transactional data only
    Fast writes, limited reads (e.g. in case of recovery only)
  Operational view
    Transactional + static data
    Tailored for business logic queries
    Write intensive
  Long term storage
    All data in the system
    Required for migration and backup/restore of information
    Suitable for analytics and ad hoc queries
Sweet spot of data grids
  An IMDG (in-memory data grid) is very fast at simple queries
  An IMDG has great write throughput
  This makes an IMDG an ideal solution as a “view” of operational data.
  We can tailor data structures for queries
  No need for persistence
Long term storage
  RDBMS is unbeatable in this field
    Complex analytic queries
    Ad hoc queries
    Trust in big vendors
  Asynchronous synchronization works best here
Inventing a bicycle
General transaction processing system revisited
Transaction processing style
  Synchronous processing
    We can return a transaction acknowledgement to the client only when we can guarantee that the transaction is successful and durable
  Asynchronous processing
    We acknowledge only the fact that the business transaction has started
  In both cases we have to fixate (durably record) the request before acknowledging it
A bicycle
  1. Fixate incoming request (optional)
  2. Acknowledge (async processing)
  3. Operational processing
  4. Fixate operation result
  5. Response (sync processing)
  6. Update operational view
  7. Backend processing
     - updating warehouse
     - working with downstream systems
  [Diagram: numbered arrows 1-7 connecting the operation log, the operational view, the data warehouse (long term storage) and downstream systems]
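The seven steps can be walked through in a small single-threaded sketch. Everything here is an assumption made for illustration: the Pipeline class, the account and amount fields, the respond callback, and the in-memory lists and dictionary that stand in for the operation log, operational view, warehouse and downstream systems. In a real system the steps after the acknowledgement would run on other threads or servers.

```python
class Pipeline:
    """Single-threaded walk through the seven steps of the slide."""

    def __init__(self):
        self.operation_log = []     # stands in for the operation log
        self.operational_view = {}  # stands in for the operational view
        self.warehouse = []         # stands in for long term storage
        self.downstream = []        # stands in for downstream systems

    def handle(self, account, amount, respond, sync=True):
        # 1. Fixate incoming request.
        self.operation_log.append(("request", account, amount))
        # 2. Acknowledge (async processing only).
        if not sync:
            respond("accepted")
        # 3. Operational processing: here, a simple balance update.
        balance = self.operational_view.get(account, 0) + amount
        # 4. Fixate operation result.
        self.operation_log.append(("result", account, balance))
        # 5. Response (sync processing only).
        if sync:
            respond(("done", balance))
        # 6. Update operational view.
        self.operational_view[account] = balance
        # 7. Backend processing: warehouse and downstream systems.
        self.warehouse.append((account, balance))
        self.downstream.append((account, balance))

# Usage: one synchronous and one asynchronous transaction.
pipeline = Pipeline()
replies = []
pipeline.handle("acc-1", 100, replies.append, sync=True)
pipeline.handle("acc-1", -30, replies.append, sync=False)
```

The synchronous caller receives the result only after step 4 has recorded it, while the asynchronous caller is released right after step 1. In both cases the log entry exists before the client hears anything.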
Operational log
  Files
    Backup solution required
    Local access only
  DBMS
    Low transaction throughput
    Index management overhead
  In-memory/IMDG
    Not durable
  MQ
    Best fit?
Operational view
  RDBMS – possible, with some tuning
    Normalized data model is efficient
    Disk is slow, but there are in-memory options
  Key/Value DBMS – possible
    Not so high write throughput
  In-memory/IMDG – best fit
    Limited capacity
Long term storage
  RDBMS – monopoly
  Key/Value – possible
    But long term storage expects a schema and strong consistency
Different approach
  Traditional design – one size fits all
    We need to design a storage that is good for both read and write operations. We are working against physics here.
  Multiple layer storage
    One storage optimized for write and reliability *
    One storage optimized for read operations
    Synchronization between storages
  We replaced one impossible problem with 3 hard problems. But it is clear how to solve each of them, and such solutions can be reused.
  * Only for write intensive applications
IMDG
Weak
  No SQL
  “In-memory” is not so fast
    Flash memory technology
    Strong belief in hardware
    In-memory RDBMS (though limited to a single server)
  Network is slow
    Multiple network round trips per request may ruin performance
    Bandwidth is limited
  RDBMS is better with complex queries
Strong - data
  Schema should be adapted
    Denormalized – single lookup per operation
    Data affinity is your friend
  … but if you cook the right schema:
    True horizontal scale-out
    Fast operations
    Great write throughput
  And it scales!
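A minimal sketch of "single lookup per operation" with data affinity, assuming a toy grid that spreads records over nodes by hashing an affinity key. The PartitionedGrid class, the key format and the record layout are hypothetical and do not represent the API of any IMDG product.

```python
import zlib

class PartitionedGrid:
    """Toy stand-in for an IMDG: records are spread over nodes by key."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def node_for(self, affinity_key):
        # A stable hash, so the same key always maps to the same node.
        return zlib.crc32(affinity_key.encode("utf-8")) % len(self.nodes)

    def put(self, affinity_key, record):
        self.nodes[self.node_for(affinity_key)][affinity_key] = record

    def get(self, affinity_key):
        # One node, one lookup: no joins and no cross-node traffic.
        return self.nodes[self.node_for(affinity_key)].get(affinity_key)

# Usage: the customer and its orders are denormalized into one record,
# so everything an operation needs lives on a single node.
grid = PartitionedGrid(node_count=4)
grid.put("customer-42", {"name": "Alice", "orders": [10, 11]})
record = grid.get("customer-42")
```

Adding nodes spreads the keys over more machines without changing how a single operation works, which is the horizontal scale-out the slide refers to. The cost is that queries spanning many keys no longer fit this model.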
Strong - distribution
  Addresses headaches of distributed systems
  Coordination of work
    Keeping the cluster together
    Node communications
    Failover (IMDG facilitates availability)
  Dealing with state (data)
    Data bottleneck problems
    Data consistency (IMDG provides consistency)
  + Recovery from failure
The CAP theorem
  * Introduced in 2000 by Eric Brewer, formally proven by Seth Gilbert and Nancy Lynch in 2002
  [Diagram: triangle with vertices C (Consistency), A (Availability) and P (Partition tolerance); IMDG, quorum based systems, and eventual consistency with conflict resolution are placed on it]
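For the quorum based systems mentioned on this slide, a commonly used rule is that a read quorum R and a write quorum W over N replicas overlap when R + W > N. The slides do not state this rule, so the sketch below is added only as an illustration, and the function names are hypothetical.

```python
def quorums_overlap(n, w, r):
    """True if every read quorum shares at least one replica with every write quorum."""
    return r + w > n

def tolerated_failures(n, quorum):
    """How many replicas may be unreachable while a quorum can still be formed."""
    return n - quorum
```

With three replicas, W = 2 and R = 2 overlap and still work with one replica down. W = 1 and R = 1 do not overlap, so a read may miss the latest write and conflicts have to be resolved later.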
Q&A