Jeff Lemmerman, Matt Chimento
MongoDB
Medtronic Confidential
9th Annual CodeFreeze Symposium
Medtronic Energy and Component Center
Feb 24, 2016
Developing in Handcuffs

Medtronic Energy and Component Center
MECC est. 1976

MECC Components
- Batteries
- Defibrillation Capacitors
- Feedthroughs
- Glass/Metal Feedthroughs
- Precision Molding and Extrusion

Census
- 1200 Employees

Plant Size
- 190,000 Square Feet
  - 40,000 Manufacturing
  - 15,000 R&D Labs
  - 38,000 Office
  - 97,000 Common, Support, Warehouse

About MongoDB

Background
- Founded in 2007 as 10gen
- First release of MongoDB in 2009
- $223M+ in funding

MongoDB
- Core server
- Native drivers
- Version 2.4.9 released 1/10/14
- Subscriptions, Consulting, Training
- Monitoring (MMS)

RDBMS Strengths
- Data stored is very compact
- Rigid schemas have led to powerful query capabilities
- Data is optimized for joins and storage
- Robust ecosystem of tools, libraries, integrations
- 40+ years old!

Enter Big Data
Gartner defines it with 3Vs
- Volume
  - Vast amounts of data being collected
- Variety
  - Evolving data
  - Uncontrolled formats, no single schema
  - Unknown at design time
- Velocity
  - Inbound data speed
  - Fast read/write operations
  - Low latency

Is this a BIG data problem?
When the Sloan Digital Sky Survey (SDSS) began collecting astronomical data in 2000, it amassed more in its first few weeks than all data collected in the history of astronomy. Continuing at a rate of about 200 GB per night, SDSS has amassed more than 140 terabytes of information. When the Large Synoptic Survey Telescope, successor to SDSS, comes online in 2016 it is anticipated to acquire that amount of data every five days.
The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. Working with less than 0.001% of the sensor stream data, the data flow from all four LHC experiments amounts to an annual rate of 25 petabytes before replication (as of 2012).
How the data is stored, whether it is replicated, and how long it must be retained all factor in.
Is this BIG data? It depends on the tools readily available. 1 GB in Excel?
- General exploration
- Answer specific questions
- Make data-driven decisions
- Generate predictive models

Where stored?

Mapping Big Data to RDBMS
- Difficult to store uncontrolled data formats
- Scaling via big iron or custom data marts/partitioning schemes
- Schema must be known at design time
- Impedance mismatch with agile development and deployment techniques
- Doesn't map well to native language constructs

Key Features
- Data stored as documents (JSON-like BSON)
- Flexible schema
- In schema design, think about optimizing for read vs. storage
- Full CRUD support (Create, Read, Update, Delete)
- Atomic in-place updates
- Ad-hoc queries: equality, regex, ranges, geospatial
- Secondary indexes
- Replication: redundancy, failover
- Sharding: partitioning for read/write scalability

Terminology
- Collection = Table
- Index = Index
- Document = Row
- Field = Column
- Embedding & Linking = Joining

Our experience with MongoDB
- Consulting/Training has been excellent
- Support agreement has been under-utilized
- Emails for security updates etc. are prompt
- Release cycle is frequent
- Mongo Monitoring Service: potential concerns about storing db stats externally
- MongoDB Certification now available
- New course coming soon on Udacity

Building First C# Application
- CRUD operations for domain class Component
- Create new Visual Studio 2010/2012 project
- Install C# driver (currently 1.8.3)
- Domain class annotations
- Authentication
- Replication
- Sharding
How is data retrieved?

Loading Data Into Central Repository

Download/Install MongoDB
- mongodb.org/downloads

Install MongoDB as Windows Service

Create Default Data Directory
- C:\data\db

Start Mongod
- C:\MongoDB\bin\mongod.exe

MongoDB Shell
- C:\MongoDB\bin\mongo.exe

Creating Components
- .insert() will always try to create a new document
- .save() will update the document if its _id already exists
- If the document doesn't have an _id field, one is added
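
The difference between .insert() and .save() described above can be sketched with a small in-memory model. This is not the MongoDB driver: the Collection class, the integer counter standing in for ObjectId generation, and the sample component documents are all assumptions made for illustration.

```javascript
// In-memory model of insert/save behaviour (hypothetical, not the MongoDB driver).
class Collection {
  constructor() {
    this.docs = new Map(); // _id -> document
    this.nextId = 1;       // stand-in for ObjectId generation
  }

  // insert: always tries to create a new document; a duplicate _id is an error.
  insert(doc) {
    const stored = { ...doc };
    if (stored._id === undefined) stored._id = this.nextId++; // _id is added
    if (this.docs.has(stored._id)) {
      throw new Error(`duplicate key: _id ${stored._id}`);
    }
    this.docs.set(stored._id, stored);
    return stored._id;
  }

  // save: replaces the document when _id already exists, otherwise inserts it.
  save(doc) {
    if (doc._id !== undefined && this.docs.has(doc._id)) {
      this.docs.set(doc._id, { ...doc });
      return doc._id;
    }
    return this.insert(doc);
  }
}

const components = new Collection();
const id = components.insert({ name: "battery", lot: 7 }); // no _id given: one is added
components.save({ _id: id, name: "battery", lot: 8 });      // same _id: update
components.save({ name: "capacitor" });                     // no _id: new document

let duplicateRejected = false;
try {
  components.insert({ _id: id, name: "battery" });          // same _id: rejected
} catch (err) {
  duplicateRejected = true;
}
console.log(components.docs.size, components.docs.get(id).lot, duplicateRejected);
```

In the shell, the corresponding calls would be made on a collection such as db.components (a hypothetical collection name).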

Reading Components
- Returns null
- .explain() gives statistics on how the command was executed

Updating Components
- $set keyword used for partial updates
- Without $set keyword the entire document is replaced
- {multi : true} to update multiple documents
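
The three update rules above can be illustrated with an in-memory sketch. The update() and matches() helpers below are hypothetical stand-ins written for this example, not the MongoDB API, and the sample documents are made up.

```javascript
// In-memory sketch of update semantics (hypothetical helpers, not the MongoDB driver).
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

function update(docs, query, change, options = {}) {
  let updated = 0;
  for (let i = 0; i < docs.length; i++) {
    if (!matches(docs[i], query)) continue;
    if (change.$set !== undefined) {
      // Partial update: only the listed fields change.
      docs[i] = { ...docs[i], ...change.$set };
    } else {
      // No $set: the whole document is replaced, keeping only _id.
      docs[i] = { _id: docs[i]._id, ...change };
    }
    updated++;
    if (!options.multi) break; // default: first match only
  }
  return updated;
}

const docs = [
  { _id: 1, type: "battery", status: "new", lot: 7 },
  { _id: 2, type: "battery", status: "new", lot: 8 },
];

// Partial update of the first match: type and lot survive.
update(docs, { type: "battery" }, { $set: { status: "tested" } });
// Replacement: document 2 loses type and lot.
update(docs, { _id: 2 }, { status: "scrapped" });
// multi: every matching document is updated.
const n = update(docs, {}, { $set: { reviewed: true } }, { multi: true });
console.log(docs, n);
```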

Deleting Components
- Works like .find()
- Drops collection
- Drops database

Creating Components - CompRepo
- mongodb://localhost/database

Creating Components - Add()

Reading Components
- Do not convert query results to List() here or in the calling method; just iterate through the enumerable

Updating Components - Save()
- Save sends the entire document back to the server

Updating Components - Update()
- Update only sends changes

Deleting Components
- Needed to add reference to Repo class

Automapping
- MongoDB domain class options

Authentication
- Clients on localhost connect as admin by default
- Start mongod with config option to disable
- Create read-only user and a write user
- Start mongod with these config options

Replica Sets
Key Concepts:
- Data is duplicated over several nodes, ideally spread among separate data centers
- Automatic failover
- Election of new primary via strict majority
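
The "strict majority" rule can be made concrete with a little arithmetic. The function names below are made up for this sketch; it only counts votes and does not model a real election.

```javascript
// Votes needed to elect a primary: more than half of all voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

for (const size of [2, 3, 4, 5]) {
  const tolerated = size - majority(size); // members that may be down
  console.log(`${size} members: majority ${majority(size)}, tolerates ${tolerated} down`);
}
```

Under this rule a four-member set tolerates no more failures than a three-member set, which is one reason odd member counts are common.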

Scaling Reads
- Replica sets can spread read load by reading only from secondary nodes
- Our example is write-heavy and read infrequently

Sharding
- Sharding enables horizontal scaling across multiple nodes
- Sharding breaks up data into chunks based on a user-defined shard key
- Note: shards also communicate with each other (migrations)

Key Points
- CHOOSE WISELY: SHARD KEY CANNOT BE CHANGED!
- All documents in a sharded collection must include the shard key
- Shard key must be an indexed field
- Queries that sort by the shard key are much more efficient
- Mongos handles routing to the correct shard

Sharding
What makes a good shard key?
Sharding
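
How chunks and routing fit together can be sketched in memory. Everything here is an assumption for illustration: the chunk ranges, the shard names, the serial field used as shard key, and the two helper functions. It models range-based chunks only and is not how mongos is implemented.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardA" },
];

// Route a document by looking only at its shard key value.
function routeDocument(doc, shardKey) {
  const value = doc[shardKey];
  if (value === undefined) {
    // Every document in a sharded collection must include the shard key.
    throw new Error(`document is missing shard key "${shardKey}"`);
  }
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return chunk.shard;
}

// A query that includes the shard key can be sent to one shard;
// a query without it has to be sent to every shard.
function shardsForQuery(query, shardKey) {
  if (query[shardKey] !== undefined) {
    return [routeDocument(query, shardKey)];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(routeDocument({ serial: 1200, type: "battery" }, "serial"));
console.log(shardsForQuery({ serial: 42 }, "serial"));
console.log(shardsForQuery({ type: "battery" }, "serial"));
```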

Key Learnings
- Working Set < Memory
- ISODate("2012-09-25T03:00:23Z"): use UTC
- Queries must match data type: string vs. integer
- Download and use other MongoDB tools (MongoVUE)
- Do not convert query results to List
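
Three of these learnings (type matching, UTC dates, and iterating instead of building a list) can be shown together in a small in-memory sketch. The find() generator, the count() helper, and the sample documents are hypothetical; a generator is used here as a stand-in for a driver cursor.

```javascript
// Hypothetical in-memory collection; lot is stored as a number.
const components = [
  { _id: 1, lot: 42, built: new Date("2012-09-25T03:00:23Z") },
  { _id: 2, lot: 43, built: new Date("2012-09-26T10:15:00Z") },
];

// Lazy "cursor": documents are yielded one at a time and compared with
// strict equality, so the string "42" does not match the number 42.
function* find(docs, field, value) {
  for (const doc of docs) {
    if (doc[field] === value) yield doc;
  }
}

// Count matches by iterating, without building an intermediate list.
function count(cursor) {
  let n = 0;
  for (const _doc of cursor) n++;
  return n;
}

const byNumber = count(find(components, "lot", 42));
const byString = count(find(components, "lot", "42"));
console.log(byNumber, byString);

// Iterate directly over the results; toISOString() always reports UTC.
const builtDates = [];
for (const doc of find(components, "lot", 43)) {
  builtDates.push(doc.built.toISOString());
}
console.log(builtDates);
```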

Gaps
- Enterprise acceptance of new approach
- Integration with off-the-shelf reporting and analytics
- User interface for managing the database cluster
- Developer familiarity with JSON and MongoDB
- 21 CFR Part 11 Compliance

Thank You
Questions?
docs.mongodb.org/manual

Collect and store raw data

Databases Are Not ARDS

RDBMS Optimized For Storage

Waveform Data

ObjectId
Special 12-byte BSON type that guarantees uniqueness within the collection. The ObjectId is generated based on timestamp, machine ID, process ID, and a process-local incremental counter. MongoDB uses ObjectId values as the default values for _id fields.

Indexing
- Without indexes, queries must perform a table scan (every document)
- All collections have an index on the _id field
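
The ObjectId layout described above can be unpacked from its 24-character hex form. This sketch assumes the byte split that goes with that description (4 bytes timestamp in seconds, 3 bytes machine ID, 2 bytes process ID, 3 bytes counter); parseObjectId is a hypothetical helper and the hex string is just a sample value.

```javascript
// Split a 24-character hex ObjectId string into the four parts described above.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters (12 bytes)");
  }
  return {
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000), // 4 bytes, seconds since epoch
    machineId: hex.slice(8, 14),                // 3 bytes, kept as hex
    processId: parseInt(hex.slice(14, 18), 16), // 2 bytes
    counter: parseInt(hex.slice(18, 24), 16),   // 3 bytes
  };
}

const parsed = parseObjectId("507f1f77bcf86cd799439011"); // sample value
console.log(parsed.timestamp.toISOString(), parsed.machineId, parsed.processId, parsed.counter);
```

Because the timestamp comes first, sorting by _id roughly orders documents by creation time.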

Backup/Restore
- One option is to use mongodump.exe / mongorestore.exe

Aggregation Framework

Write Concern