4 TB Audit Log from SQL Server to MongoDB
Michael Poremba, Director, Data Architecture, Practice Fusion
May 2015
Apr 21, 2017
2
+ 20 years software engineering
+ Data architect / application architect
+ High-volume OLTP relational databases
+ Application performance and scalability
+ Domain experience:
    Health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance
Michael Poremba @ Practice Fusion
3
Project Background: Getting started
4
+ Cloud-based electronic health records (EHR) service
+ Over 100,000 health care providers in US
+ Over 100,000,000 patient medical records
+ SQL Server OLTP database
    Weekday peak ~ 60,000 transactions per second
+ Primary database = 8 TB
+ 50% of primary database is security audit records + indexes
Practice Fusion
5
+ HIPAA: Health Insurance Portability and Accountability Act of 1996
+ Who did what to which patient’s medical record when?
+ Regulatory requirement: audit log must be kept and reviewed
+ Law enforcement and evidence in legal discovery
+ Save the audit log forever
+ Primary use cases:
    Audit report in EHR: Security audit log viewer
    Physician data analytics: Clinical quality measures (CQM)
HIPAA Security Audit Log
6
7
HIPAA Security Auditing on MongoDB: Project anatomy & lessons learned
8
+ Latency on SAN increased
+ Database writes slowed down
+ Database connections held longer
+ Connection pool expanded
+ User interface locked up, waiting
+ Users tried to log in again
+ Login is heaviest user operation
+ [Repeat]
The Log Jam
Found at: http://anchorhardwoods.com/wp-content/uploads/2011/08/log-jam.jpg
9
Security Auditing – Legacy Architecture
[Diagram: Public Load Balancer; App 1 ... App n; EHR (OLTP DB) containing ActivityFeed and ActivityFeedParameter (2..10 per ActivityFeed); ETL; CQM Reporting; Audit Report]
10
Audit Service – New Architecture
[Diagram: Public Load Balancer; App 1 ... App n; Audit Service; AMQ Queue; Listener; MongoDB Audit Log; Audit Report; ETL; CQM Reporting]
11
+ Isolate auditing system from EHR OLTP database
+ Move audit IO off of EHR SAN to AWS
+ New service interface for audit events using .NET
+ Scale out audit service interface on IIS farm
+ Scale out audit data store using MongoDB
Technical Benefits of New Architecture
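The isolation described above comes from putting a queue between the application tier and the audit store. A minimal sketch of that hand-off follows, using in-memory stand-ins: `AuditQueue`, `runListenerOnce`, and the event type strings are hypothetical names for illustration and are not the actual AMQ or .NET service code.

```javascript
// Hypothetical in-memory stand-in for the AMQ queue.
class AuditQueue {
  constructor() { this.messages = []; }
  // Called by the application tier; it only enqueues and returns.
  publish(event) { this.messages.push(event); }
  // Called by the listener; removes and returns up to `max` messages.
  drain(max) { return this.messages.splice(0, max); }
}

// Listener step: move up to batchSize queued events into the audit store.
// `store` is a plain array standing in for the MongoDB audit collection.
function runListenerOnce(queue, store, batchSize) {
  const batch = queue.drain(batchSize);
  for (const event of batch) store.push(event);
  return batch.length;
}

const queue = new AuditQueue();
const auditStore = [];
queue.publish({ typ: "ChartViewed" });   // illustrative event types
queue.publish({ typ: "RxCreated" });
const moved = runListenerOnce(queue, auditStore, 100);
console.log(moved, auditStore.length, queue.messages.length);
```

Because the application only enqueues, a slow audit store delays the listener rather than the user-facing request.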
12
+ Transaction volume: Sustain 1,000 new documents per second
+ Data volume: Scale to tens of billions of audit event records
+ High availability and disaster recovery: higher SLA than EHR
+ Quick UI response time for interactive audit report
+ Tamper prevention and detection
    No updates or deletes permitted on audit log
    Security alerts when audit log is altered
+ Leverage industry standards for health care security audit logging
    ~300 distinct auditable user actions
    Required and varying data elements
Security Auditing – Application Requirements
13
Project Objectives
+ New infrastructure for MongoDB and AMQ
+ Modernize audit service API
+ Convert ~200 audit events to new audit service interface
+ Data warehouse ETL from MongoDB
+ Modernize audit report UI
+ Migrate 4 billion existing audit records
Project: Audit 2.0
Colette: program management
Ernest: services expert
Bhavik: test engineering
Jay: MongoDB expert
Jeff: cluster architecture
Michael: data architecture
Brett: AMQ expert
Bryan: infrastructure coordination
Rajani: data warehouse ETL
14
[Diagram: AuditEvent related to Audit System (1..1), User (1..2), and Participant Object (0..n)]
Health Care Industry Standards for Audit Logging
+ ISO 27789:2013: Health Informatics – Audit trails for electronic health records
+ ASTM E2147-01(2013): Standard Specification for Audit Disclosure Logs for Use in Health Information Systems
+ FHIR SecurityEvent – resource definition for auditing
15
{
  "_id" : <BinaryData(4)>,                      // The audit event GUID
  "docHash" : <String; Required>,               // Tamper detection
  "audOrgGuid" : <BinaryData(4); Required>,     // Shard key
  "crtdDttmUtc" : <Date; Required>,             // Datetime record was inserted
  "evnt" : {                                    // Required subdocument
    "dttmUtc" : <Date; Required>,               // Date/time that event occurred
    "typ" : <String; Required>,                 // Event record type; ~ 300 types
    "ptDataTyp" : <String; Required>,           // Standard set of patient data types
    "actn" : <String; Required>,                // Standard set of actions
    "sys" : <String; Required>                  // Source system for audit event
  },
  "usr" : {                                     // Required subdocument
    "usrId" : <String; Required>,               // Human-readable ID
    "usrGuid" : <BinaryData(4); Required>,      // Machine-readable ID
    "dispNm" : <String; Required>,              // Display name for user
    "orgId" : <String; Required>,
    "orgNm" : <String; Required>
  },
  "altUsr" : {                                  // Optional subdocument for second user
    ...                                         // Subdocument contains same properties as "usr"
  },
  "pt" : {                                      // Optional subdocument
    "ptId" : <String; Required>,                // Human-readable ID for patient
    "ptPracGuid" : <BinaryData(4); Required>,   // Machine-readable ID for patient
    "dispNm" : <String; Required>,              // Display name for patient
    "orgId" : <String; Required>,
    "orgNm" : <String; Required>
  },
  "body" : {                                    // Optional subdocument
    ...                                         // Flattened list of attributes, specific to audit event subtype
  }
}
JSON Document Schema for Audit Events
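The "Required" markers in this schema have to be checked by the application before a document is written. The sketch below shows one way such a check could look; `missingFields` and the sample values are hypothetical, and `_id` is left out because the schema does not mark it Required.

```javascript
// Required properties per subdocument, copied from the schema's "Required" markers.
const TOP_REQUIRED = ["docHash", "audOrgGuid", "crtdDttmUtc", "evnt", "usr"];
const EVNT_REQUIRED = ["dttmUtc", "typ", "ptDataTyp", "actn", "sys"];
const USR_REQUIRED = ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"];
const PT_REQUIRED = ["ptId", "ptPracGuid", "dispNm", "orgId", "orgNm"];

// Append "prefix.key" to `missing` for every required key absent from `sub`.
function checkSub(sub, keys, prefix, missing) {
  for (const key of keys) {
    if (sub[key] == null) missing.push(prefix + key);
  }
}

// Return the dotted paths of all required properties the document lacks.
// Optional subdocuments (altUsr, pt) are checked only when present.
function missingFields(doc) {
  const missing = [];
  checkSub(doc, TOP_REQUIRED, "", missing);
  if (doc.evnt != null) checkSub(doc.evnt, EVNT_REQUIRED, "evnt.", missing);
  if (doc.usr != null) checkSub(doc.usr, USR_REQUIRED, "usr.", missing);
  if (doc.altUsr != null) checkSub(doc.altUsr, USR_REQUIRED, "altUsr.", missing);
  if (doc.pt != null) checkSub(doc.pt, PT_REQUIRED, "pt.", missing);
  return missing;
}

// Illustrative document; every value here is made up.
const sample = {
  docHash: "abc123",
  audOrgGuid: Buffer.alloc(16, 1),
  crtdDttmUtc: new Date(),
  evnt: { dttmUtc: new Date(), typ: "ChartViewed", ptDataTyp: "Chart", actn: "View", sys: "EHR" },
  usr: { usrId: "jdoe", usrGuid: Buffer.alloc(16, 2), dispNm: "J. Doe", orgId: "42", orgNm: "Example Clinic" },
};
console.log(missingFields(sample));
```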
16
Schema Design – Lessons Learned
+ Prop nms strd per doc (property names are stored in every document)
    Long names add up for large collections (ours: 1 TB)
    Consider using abbreviated property names
    Upvote this feature request: https://jira.mongodb.org/browse/SERVER-863
+ Know your application read/write patterns
+ Application responsible for data integrity
+ Be aware of data type behaviors
    Indexed string search is case sensitive. Upvote: https://jira.mongodb.org/browse/SERVER-90
    Several binary data types for UUID: use type 4 (default type is specific to database driver)
Found at: http://www.milesfinchinnovation.com/blog/wp-content/uploads/2013/02/iStock_000019474446Medium.jpg
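The cost of long property names can be estimated with simple arithmetic, because BSON writes each field name (plus a one-byte terminator) into every document. The sketch below counts those bytes for two versions of a partial audit document; the spelled-out names are hypothetical, and the 4 billion document count is the migration figure quoted earlier in the deck.

```javascript
// Bytes spent on field names in one document: each name is stored as UTF-8
// plus a one-byte terminator, and nested subdocuments repeat the cost.
// Arrays are ignored to keep the sketch short.
function fieldNameBytes(doc) {
  let total = 0;
  for (const [key, value] of Object.entries(doc)) {
    total += Buffer.byteLength(key, "utf8") + 1;
    const isSubdocument = value !== null && typeof value === "object" &&
      !Array.isArray(value) && !(value instanceof Date) && !Buffer.isBuffer(value);
    if (isSubdocument) total += fieldNameBytes(value);
  }
  return total;
}

// Abbreviated names as used in the audit schema (values are irrelevant here).
const abbreviated = { audOrgGuid: 0, crtdDttmUtc: 0, evnt: { dttmUtc: 0, typ: 0 } };
// Hypothetical spelled-out equivalents.
const spelledOut = {
  auditOrganizationGuid: 0,
  createdDateTimeUtc: 0,
  event: { dateTimeUtc: 0, type: 0 },
};

const savedPerDoc = fieldNameBytes(spelledOut) - fieldNameBytes(abbreviated);
const documentCount = 4e9;
console.log(savedPerDoc + " bytes per document, about " +
  Math.round((savedPerDoc * documentCount) / 1e9) + " GB across the collection");
```

Only five properties are compared here; a full audit document has several times as many, so the real difference would be larger.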
17
Schema Design – Lessons Learned
Leverage native data types:
+ Date
+ Boolean
+ Numeric
    Stored as strings, numbers concatenate: "1" + "1" → "11"; "11" + "1" → "111"
+ UUID
    Stored as strings, one value has many spellings:
    "8c290139-f4e3-49c1-9ba2-a883defc6a15"
    "8C290139-F4E3-49C1-9BA2-A883DEFC6A15"
    "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"
    "8c290139f4e349c19ba2a883defc6a15"
    "{8c290139-f4e3-49c1-9ba2-a883defc6a15}"
    "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
Found at: http://www.industryweek.com/innovation/innovation-one-size-fits-one
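The six UUID strings on this slide all denote the same 16 bytes, which is why a native binary type compares reliably where strings do not. The sketch below normalizes each spelling to bytes before it would be stored; `uuidToBytes` is a hypothetical helper, and wrapping the bytes in the driver's binary subtype 4 is left out because that call is driver-specific.

```javascript
// Convert any of the common UUID spellings to its canonical 16 bytes.
function uuidToBytes(text) {
  // Drop braces and hyphens, then fold case so spelling differences vanish.
  const hex = text.replace(/[{}-]/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("not a UUID: " + text);
  }
  return Buffer.from(hex, "hex");
}

const spellings = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];

// As strings the six values are all different; as bytes they are identical.
const distinctStrings = new Set(spellings).size;
const distinctBytes = new Set(spellings.map((s) => uuidToBytes(s).toString("hex"))).size;
console.log(distinctStrings, distinctBytes);
```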
18
[Diagram: legacy relational tables with approximate row counts: ActivityFeed (~4 billion); ActivityFeedParameter (~30 billion); Audit Event Type (~300); Action Type (10); Patient Data Type (18); User (~100,000); Patient (~100 million); Practice (~50,000)]
Legacy Auditing System – Relational Schema
Issues around data normalization
+ New requirements introduced
+ Filter criteria and sort criteria stored in five different tables
+ Audit events must be read into memory for filtering and sorting
    Join and expand data set by practice
    Sort and filter expanded data set
+ Response time suffers for large practices with many audit events
19
Schema Design – Lessons Learned
[Diagram: legacy tables ActivityFeed, ActivityFeedParameter, Audit Event Type, Action Type, Patient Data Type, User, Patient, and Practice]
+ Denormalize with care:
{
  "_id" : <BinaryData(4)>,
  "docHash" : <String; Required>,
  "audOrgGuid" : <BinaryData(4); Required>,
  "crtdDttmUtc" : <Date; Required>,
  "evnt" : {
    "dttmUtc" : <Date; Required>,
    "typ" : <String; Required>,
    "ptDataTyp" : <String; Required>,
    "actn" : <String; Required>,
    "sys" : <String; Required>
  },
  "usr" : {
    "usrId" : <String; Required>,
    "usrGuid" : <BinaryData(4); Required>,
    "dispNm" : <String; Required>,
    "orgId" : <String; Required>,
    "orgNm" : <String; Required>
  },
  "pt" : {
    "ptId" : <String; Required>,
    "ptPracGuid" : <BinaryData(4); Required>,
    "dispNm" : <String; Required>,
    "orgId" : <String; Required>,
    "orgNm" : <String; Required>
  },
  "body" : { ... }
}
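Denormalizing here means copying the values the report filters and sorts on out of the lookup tables and into each event document. The sketch below shows that step for one event, covering only part of the schema; the row shapes, column names, and sample values are hypothetical, since the deck does not show the legacy columns.

```javascript
// Build one audit event document from a legacy ActivityFeed row, its
// ActivityFeedParameter rows, and in-memory lookup tables.
// All column names below are illustrative assumptions.
function toAuditDocument(feedRow, paramRows, lookups) {
  const user = lookups.users[feedRow.userId];
  const practice = lookups.practices[feedRow.practiceId];
  const body = {};
  for (const p of paramRows) body[p.name] = p.value;   // flatten parameter rows
  return {
    _id: feedRow.guid,
    audOrgGuid: practice.guid,
    evnt: {
      dttmUtc: feedRow.occurredUtc,
      typ: lookups.eventTypes[feedRow.eventTypeId],      // name copied, not referenced
      actn: lookups.actionTypes[feedRow.actionTypeId],
    },
    usr: { usrId: user.login, dispNm: user.displayName, orgNm: practice.name },
    body,
  };
}

const lookups = {
  users: { 7: { login: "jdoe", displayName: "Dr. Example" } },
  practices: { 3: { guid: "practice-guid-3", name: "Example Clinic" } },
  eventTypes: { 12: "ChartViewed" },
  actionTypes: { 1: "View" },
};
const auditDoc = toAuditDocument(
  { guid: "event-guid-1", userId: 7, practiceId: 3, eventTypeId: 12, actionTypeId: 1,
    occurredUtc: new Date("2015-05-01T12:00:00Z") },
  [{ name: "chartSection", value: "Allergies" }, { name: "source", value: "web" }],
  lookups
);
console.log(JSON.stringify(auditDoc, null, 2));
```

The trade-off is that a later rename of a user or practice does not reach documents already written, which suits an audit log that records what was true at the time of the event.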
20
+ Millions of audit events per medical practice
+ Require fast response time for interactive audit report UI
+ Audit report UI allows events to be sorted/filtered five different ways
+ UI allows paging through audit events
+ Create a secondary index for each sort method
Index Design
21
+ Organization, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.dttmUtc": -1} );
+ Organization, patient, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1} );
+ Organization, user, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1} );
+ Organization, patient data type, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1} );
+ Organization, user action type, event date DESC
    db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1} );
+ Document created date DESC
    db.auditEvent.ensureIndex( {"crtdDttmUtc": -1} );
Index Definitions
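Each of the five organization-scoped indexes pairs an equality filter with a descending sort on event date, so a page of the audit report can be read straight from the index. The sketch below builds the query pieces for one page; `buildPageQuery` and the range-based paging on `evnt.dttmUtc` are assumptions for illustration, since the deck does not say how the UI pages.

```javascript
// Build filter, sort, and limit for one page of the audit report.
// `filter` is optional, e.g. { field: "pt.ptId", value: "12345" }.
// `beforeUtc` is the event date of the last row on the previous page, if any.
function buildPageQuery(orgGuid, filter, beforeUtc, pageSize) {
  const query = { audOrgGuid: orgGuid };                 // always scope to the practice
  if (filter) query[filter.field] = filter.value;        // equality on the index's middle key
  if (beforeUtc) query["evnt.dttmUtc"] = { $lt: beforeUtc };
  return { query, sort: { "evnt.dttmUtc": -1 }, limit: pageSize };
}

const firstPage = buildPageQuery("org-guid", { field: "pt.ptId", value: "12345" }, null, 20);
const nextPage = buildPageQuery("org-guid", { field: "pt.ptId", value: "12345" },
  new Date("2015-05-01T00:00:00Z"), 20);
console.log(JSON.stringify(firstPage), JSON.stringify(nextPage));
```

In the shell the result would be applied as `db.auditEvent.find(spec.query).sort(spec.sort).limit(spec.limit)`. Paging on the timestamp alone can skip events that share the boundary timestamp, so a real implementation would likely need a tie-breaker.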
22
+ Filter by practice GUID
+ Sort by event created date time, descending order
+ Limit to 20 documents
db.auditEvent.find( {"audOrgGuid": BinData(4,"ABrlAG57Rx6gY3zyHzFK3Q==")} ).sort( {"evnt.dttmUtc" : -1} ).limit(20).explain();
{
  "clusteredType" : "ParallelSort",
  "shards" : {
    "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
      {
        "cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc",
        ...
      }
    ]
  }
  ...
  "numshards" : 1,
  ...
Query Plan
23
Indexing Strategy – Lessons Learned
+ As with relational databases, indexes are essential for efficient queries
+ Learn how to use .explain() to read query plans
+ Avoid collection scans: "cursor" : "BasicCursor"
+ For compound indexes, query sort order must match index sort order
+ Enable mongod --notablescan option in test / staging environments
Found at: http://www.ebay.com/itm/13-pc-Hex-Shank-Titanium-Drill-Bit-Set-Quick-Change-Bits-/350526103504?pt=LH_DefaultDomain_0&hash=item519cfbdfd0
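Checking plans for collection scans can be automated in a test suite. The sketch below walks an explain() result shaped like the one shown on the Query Plan slide and reports every place a `BasicCursor` appears; `findCollectionScans` is a hypothetical helper, and it also checks for the `COLLSCAN` stage name used by the explain format introduced in MongoDB 3.0.

```javascript
// Recursively collect the paths of plan nodes that indicate a collection scan.
function findCollectionScans(plan, path = "explain") {
  const hits = [];
  if (plan === null || typeof plan !== "object") return hits;
  if (plan.cursor === "BasicCursor" || plan.stage === "COLLSCAN") hits.push(path);
  for (const [key, value] of Object.entries(plan)) {
    hits.push(...findCollectionScans(value, path + "." + key));
  }
  return hits;
}

// Plan shaped like the sharded output on the Query Plan slide.
const indexedPlan = {
  clusteredType: "ParallelSort",
  shards: {
    "RepSet02": [{ cursor: "BtreeCursor auditEvent_audOrgGuid_dttmUtc" }],
  },
};
// Same shape, but the shard fell back to scanning the whole collection.
const scanningPlan = {
  clusteredType: "ParallelSort",
  shards: { "RepSet02": [{ cursor: "BasicCursor" }] },
};

console.log(findCollectionScans(indexedPlan), findCollectionScans(scanningPlan));
```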
24
Principle of least privilege
+ MongoDB cluster not accessible from public Internet
+ Security enabled on cluster
+ Application users granted minimum permissions required
Signed audit events
+ Audit events signed with hash of audit event contents
+ Recompute hash on reads: test the data against hash value
+ Send security alert when hash does not match
Oplog monitoring
+ Use mongo-connector Python scripts to monitor oplog
+ Watch for .update() and .delete() operations on collection
+ Send security alert when data changes are detected
Tamper Prevention and Detection
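The signed-event scheme can be sketched as follows. The deck does not name the hash algorithm or how a document is serialized before hashing, so SHA-256 over a sorted-key serialization is an assumption here, and `canonical`, `computeDocHash`, `signEvent`, and `isUntampered` are hypothetical names.

```javascript
const crypto = require("crypto");

// Serialize with sorted keys so the same content always yields the same string.
function canonical(value) {
  if (value instanceof Date) return JSON.stringify(value.toISOString());
  if (Buffer.isBuffer(value)) return JSON.stringify(value.toString("base64"));
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value !== null && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ":" + canonical(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hash everything except docHash itself.
function computeDocHash(doc) {
  const { docHash, ...contents } = doc;
  return crypto.createHash("sha256").update(canonical(contents)).digest("hex");
}

// On write: attach the hash. On read: recompute and compare.
function signEvent(doc) { return { ...doc, docHash: computeDocHash(doc) }; }
function isUntampered(doc) { return doc.docHash === computeDocHash(doc); }

const signed = signEvent({ audOrgGuid: "org-guid", evnt: { typ: "ChartViewed", actn: "View" } });
const altered = { ...signed, evnt: { ...signed.evnt, actn: "Delete" } };
console.log(isUntampered(signed), isUntampered(altered));
```

A plain hash only detects changes made by someone who does not also recompute `docHash`; a keyed variant such as `crypto.createHmac("sha256", secretKey)` would close that gap, at the cost of managing the key.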
Found at: http://legacymedia.localworld.co.uk/275663/Article/images/17639732/4416792.jpg
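The oplog check reduces to filtering replication log entries by namespace and operation type. The sketch below is not mongo-connector code; it only illustrates the filter, using the oplog entry fields `op` ("i" insert, "u" update, "d" delete) and `ns` (database.collection). The `findTamperingOps` name and the sample entries are hypothetical.

```javascript
// Return the oplog entries that changed or removed documents in `namespace`.
// Inserts ("i") are the only writes an append-only audit log should see.
function findTamperingOps(oplogEntries, namespace) {
  return oplogEntries.filter(
    (entry) => entry.ns === namespace && (entry.op === "u" || entry.op === "d")
  );
}

// Made-up entries in the general shape of oplog documents.
const entries = [
  { op: "i", ns: "AuditLog.auditEvent", o: { _id: 1 } },
  { op: "u", ns: "AuditLog.auditEvent", o2: { _id: 1 }, o: { $set: { "evnt.actn": "View" } } },
  { op: "d", ns: "AuditLog.auditEvent", o: { _id: 2 } },
  { op: "u", ns: "OtherDb.other", o2: { _id: 9 }, o: { $set: { x: 1 } } },
];

const suspicious = findTamperingOps(entries, "AuditLog.auditEvent");
console.log(suspicious.length + " operation(s) would trigger a security alert");
```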
25
Security – Lessons Learned
+ Minimize network access to MongoDB cluster
+ Enable authentication
+ Leverage role-based authorization
+ Use SSL (MongoDB Enterprise)
+ Disable REST interface and HTTP status interface
Found at: http://www.harborfreight.com/3-1-2-half-inch-circular-padlock-98972.html
26
+ Shard the database to scale out
+ Begin with small number of shards (2 or 3)
+ Group all audit events from the same medical practice
    Every audit event is “owned” by some practice
    Audit report UI always queries events by medical practice
+ Composite shard key on { PracticeGuid, _id }
    db.runCommand({ shardcollection : "AuditLog.auditEvent", key: {audOrgGuid: 1, _id: 1}});
Transaction Volume: 1,000 New Documents per Second
Found at: http://s3.amazonaws.com/Reconsales/800/0bfe72e0-9b06-42ac-9644-5727a3ca9c79.jpg
27
Sharding the Database – Lessons Learned
+ At the onset of development, determine whether to shard
+ Specify shard key in queries
    Allows mongos to route query
    Minimize distributed “scatter/gather” queries
    Queries spanning chunks likely span shards
+ Choose a key that allows even balancing
    Balancing is performed in 32 MB chunks
    Design shard key to ensure chunks will not exceed 32 MB
Found at: http://www.airbrushaction.com/content/sites/default/files/tipstricks-images/4_27.png
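Why the shard key belongs in every query can be seen in a toy model of routing. The sketch below is a simplified simulation, not mongos itself: chunks are half-open ranges over the compound key (audOrgGuid, _id), keys are plain strings, and `MIN`, `MAX`, `compareKeys`, `targetShards`, and the chunk boundaries are all assumptions for illustration.

```javascript
// Sentinels that sort below and above every key used in this sketch.
const MIN = "";
const MAX = "\uffff";

// Lexicographic comparison of two [audOrgGuid, _id] keys.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return the shards a query must visit. Each chunk covers [min, max).
function targetShards(chunks, query) {
  if (query.audOrgGuid === undefined) {
    return [...new Set(chunks.map((c) => c.shard))];   // scatter/gather: every shard
  }
  const low = [query.audOrgGuid, MIN];
  const high = [query.audOrgGuid, MAX];
  const hit = chunks.filter(
    (c) => compareKeys(c.min, high) <= 0 && compareKeys(c.max, low) > 0
  );
  return [...new Set(hit.map((c) => c.shard))];
}

const chunks = [
  { min: [MIN, MIN], max: ["org-m", MIN], shard: "shard1" },
  { min: ["org-m", MIN], max: ["org-t", "500"], shard: "shard2" },
  { min: ["org-t", "500"], max: [MAX, MAX], shard: "shard3" },   // org-t is split here
];

console.log(targetShards(chunks, { audOrgGuid: "org-a" }));   // one practice, one chunk
console.log(targetShards(chunks, { audOrgGuid: "org-t" }));   // practice spans two chunks
console.log(targetShards(chunks, { "usr.usrId": "jdoe" }));   // no shard key in the query
```

The second case shows the point about chunk boundaries: once a large practice's events are split across chunks, the balancer is free to place those chunks on different shards.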
28
High Availability and Disaster Recovery – Replica Sets
+ If audit log is down, then 100,000 health care providers are idle
+ Audit logging subsystem must be more reliable than customer EHR
+ Node failover must be automatic
+ Protect against network and data center failure scenarios
Found at: http://www.huntsmart.com/App_Themes/hs.com/ProductImages/250/DNSBC.jpg
29
Sharded Cluster Replicated Across Multiple Data Centers
[Diagram: shard 1, shard 2, and shard 3, each shown with three members and two arbiters, together with mongos routers, three config servers, and AMQ brokers, spread across the locations labeled Primary DC, Disaster Recovery DC, DC2 AZ1, DC2 AZ2, and DC3 AZ1]
30
Performance and Stress Testing – Lessons Learned
+ Acquire or build load testing tools
+ Test using a realistic, unbiased data set
+ Test database cluster to ensure write throughput
+ Ensure read & write performance meets load requirements
+ Find the performance ceiling
+ Find and resolve bottlenecks
+ Tune IO and memory
Found at: http://www.webdesign.org/img_articles/21892/broken_chain.jpg
31
Data Migration – Lessons Learned
Data Migration
+ Parallelize data migration process
+ Identify and remove bottlenecks
+ Scale out MongoDB cluster to handle heavy write load
+ Determine whether best to add indexes before or after migration
+ It takes a while to extract, transform, and load billions of documents
Found at: http://www.dennissy.com/wp-content/uploads/2010/07/house_moving_malaysia.jpg
32
Data Repair – Lessons Learned
Bulk update on collections
+ Use Bulk() operation builder
    bulk.find(<query>).update(<update>)
    Simple, unordered parallelized
    > 200,000 updates/minute
+ Regular update operation
    ~ 2,000 updates/minute
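The Bulk() builder queues many updates and sends them together. The sketch below wraps that pattern in a function that accepts any object exposing `initializeUnorderedBulkOp()`, as collections in the mongo shell do; `repairInBulk`, the shape of `fixes`, and the recording stub (including its return value) are hypothetical and exist only so the example runs without a server.

```javascript
// Queue one update per fix on an unordered bulk builder, then execute once.
function repairInBulk(collection, fixes) {
  if (fixes.length === 0) return null;          // nothing to send; skip execute()
  const bulk = collection.initializeUnorderedBulkOp();
  for (const fix of fixes) {
    bulk.find({ _id: fix.id }).update({ $set: fix.set });
  }
  return bulk.execute();
}

// Recording stub standing in for a real collection, for illustration only.
function makeStubCollection(log) {
  return {
    initializeUnorderedBulkOp() {
      return {
        find(query) {
          return { update(change) { log.push({ query, change }); } };
        },
        execute() { return { nModified: log.length }; },
      };
    },
  };
}

const log = [];
const result = repairInBulk(makeStubCollection(log), [
  { id: 1, set: { "evnt.sys": "EHR" } },
  { id: 2, set: { "evnt.sys": "EHR" } },
]);
console.log(result.nModified, log.length);
```

Unordered execution lets the server apply the queued operations in any order, which is what allows them to be parallelized.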
33
Choosing the Appropriate Data Store
MongoDB over relational?
+ Scale out for transaction volume and data volume
+ Developer productivity
    Easy map between application and data store
+ Highly varying document structure
+ Offload read activity in optimized format different from data writes (a.k.a. CQRS pattern)
Found at: http://www.meonuk.com/hammers-mauls
34
Choosing the Appropriate Data Store
Relational over MongoDB?
+ Complex normalized data model
+ Diverse read patterns requiring joins
+ Ad hoc reporting and analysis
+ Data integrity difficult to manage in application layer
Found at: http://3.bp.blogspot.com/_QUmmdgc7l6A/TTPUyRWFNPI/AAAAAAAAAO8/KV_i2c2lrRk/s1600/saws+various.jpg
35
MongoDB @ Practice Fusion
Upcoming MongoDB projects
+ Observations data store
    Scale-out data store for patient vital signs, etc.
+ Clinical data repository
    Read cache for patient medical records (CQRS pattern)
+ Upgrades for Audit 2.0
    WiredTiger + compression
Found at: http://jbirdmedia.org/vessels/images/uploads/framing-new-const-lg.jpg