MongoDB. Jeff Lemmerman, Matt Chimento. 9th Annual CodeFreeze Symposium. Medtronic Energy and Component Center.

MongoDB

Feb 24, 2016


Developing in Handcuffs

Jeff Lemmerman, Matt Chimento

MongoDB
9th Annual CodeFreeze Symposium
Medtronic Confidential

Medtronic Energy and Component Center

Medtronic Energy and Component Center

MECC est. 1976
MECC Components
  Batteries
  Defibrillation Capacitors
  Feedthroughs
  Glass/Metal Feedthroughs
  Precision Molding and Extrusion

Census: 1200 Employees
Plant Size: 190,000 Square Feet
  40,000 Manufacturing
  15,000 R&D Labs
  38,000 Office
  97,000 Common, Support, Warehouse

About MongoDB
Background
  Founded in 2007 as 10gen
  First release of MongoDB in 2009
  $223M+ in funding
MongoDB
  Core server
  Native drivers
  Version 2.4.9 released 1/10/14
Subscriptions, Consulting, Training
Monitoring (MMS)

RDBMS Strengths
  Data stored is very compact
  Rigid schemas have led to powerful query capabilities
  Data is optimized for joins and storage
  Robust ecosystem of tools, libraries, integrations
  40+ years old!

Enter Big Data
Gartner defines it with 3Vs
  Volume
    Vast amounts of data being collected
  Variety
    Evolving data
    Uncontrolled formats, no single schema
    Unknown at design time
  Velocity
    Inbound data speed
    Fast read/write operations
    Low latency
Is this a BIG data problem?

When the Sloan Digital Sky Survey (SDSS) began collecting astronomical data in 2000, it amassed more in its first few weeks than all data collected in the history of astronomy. Continuing at a rate of about 200 GB per night, SDSS has amassed more than 140 terabytes of information. When the Large Synoptic Survey Telescope, successor to SDSS, comes online in 2016 it is anticipated to acquire that amount of data every five days.

The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. Working with less than 0.001% of the sensor stream data, the data flow from all four LHC experiments represents 25 petabytes annual rate before replication (as of 2012).

How the data is stored, whether it is replicated, and how long it needs to be retained all factor in.

Is this BIG Data? Depends on tools readily available. 1 GB in Excel?

General exploration
Answer specific questions
Make data-driven decisions
Generate predictive models

Where stored?

Mapping Big Data to RDBMS
  Difficult to store uncontrolled data formats
  Scaling via big iron or custom data marts/partitioning schemes
  Schema must be known at design time
  Impedance mismatch with agile development and deployment techniques
  Doesn't map well to native language constructs

Key Features
  Data stored as documents (JSON-like BSON)
  Flexible schema
  In schema design, think about optimizing for read vs. storage
  Full CRUD support (Create, Read, Update, Delete)
  Atomic in-place updates
  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial
  Secondary indexes
  Replication: redundancy, failover
  Sharding: partitioning for read/write scalability

Terminology
  Collection = Table
  Index = Index
  Document = Row
  Field = Column
  Embedding & Linking = Joining

Our experience with MongoDB
  Consulting/Training has been excellent
  Support agreement has been under-utilized
  Emails for security updates etc. are prompt
  Release cycle is frequent
  Mongo Monitoring Service: potential concerns storing db stats externally
  MongoDB Certification now available
  New course coming soon on Udacity

Building First C# Application
  CRUD operations for domain class Component
  Create new Visual Studio 2010/2012 project
  Install C# driver (currently 1.8.3)

Domain class annotations
Authentication
Replication
Sharding


How is data retrieved?

Loading Data Into Central Repository

Download/Install MongoDB

mongodb.org/downloads

Install MongoDB as Windows Service

Create Default Data Directory
  C:\data\db
Start Mongod
  C:\MongoDB\bin\mongod.exe

MongoDB Shell
  C:\MongoDB\bin\mongo.exe

Creating Components

.insert() will always try to create a new document
.save() will update the document if its _id already exists
If the document doesn't have an _id field, one is added
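
The difference between .insert() and .save() can be sketched with a small in-memory model. This is only an illustration in plain JavaScript, not the MongoDB shell or driver API: makeCollection and its methods are hypothetical names, and a simple counter stands in for ObjectId generation.

```javascript
// In-memory model of a collection; NOT the MongoDB driver or shell API.
// makeCollection, insert, save, findOne, and count are hypothetical names.
function makeCollection() {
  const docs = new Map(); // _id -> document
  let nextId = 1;         // stand-in for ObjectId generation

  function withId(doc) {
    // If the document has no _id field, one is added.
    return doc._id === undefined ? { ...doc, _id: nextId++ } : { ...doc };
  }

  return {
    // insert always tries to create a new document; a duplicate _id is an error.
    insert(doc) {
      const d = withId(doc);
      if (docs.has(d._id)) throw new Error('duplicate key: ' + d._id);
      docs.set(d._id, d);
      return d._id;
    },
    // save replaces the document if its _id already exists, else inserts it.
    save(doc) {
      const d = withId(doc);
      docs.set(d._id, d);
      return d._id;
    },
    findOne(id) {
      return docs.has(id) ? docs.get(id) : null;
    },
    count() {
      return docs.size;
    },
  };
}

const components = makeCollection();
const id = components.insert({ name: 'battery' });     // creates, _id is added
components.save({ _id: id, name: 'battery', rev: 2 }); // same _id: updates
```

After these two calls the model still holds one document, now with rev set to 2. A second insert with the same _id would throw instead of updating.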

Reading Components

Reading Components

Returns Null

.explain() will give stats on command

Updating Components

$set keyword used for partial updates
Without $set keyword entire document is replaced
{multi : true} to update multiple documents
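
The three update rules above can be modeled on a plain array of objects. This is a sketch under stated assumptions: update() below is a hypothetical helper that imitates the semantics, not the shell's own update method, and it only supports equality matching.

```javascript
// Plain-array model of update semantics; update() is a hypothetical helper,
// not the MongoDB shell's db.collection.update().
function update(docs, query, change, options = {}) {
  const matches = (doc) =>
    Object.keys(query).every((key) => doc[key] === query[key]);
  let modified = 0;
  for (let i = 0; i < docs.length; i++) {
    if (!matches(docs[i])) continue;
    if (change.$set !== undefined) {
      // Partial update: only the listed fields change.
      docs[i] = { ...docs[i], ...change.$set };
    } else {
      // No $set: the whole document is replaced, keeping only _id.
      docs[i] = { _id: docs[i]._id, ...change };
    }
    modified++;
    if (!options.multi) break; // default: first match only
  }
  return modified;
}

const parts = [
  { _id: 1, type: 'capacitor', lot: 'A', status: 'new' },
  { _id: 2, type: 'capacitor', lot: 'B', status: 'new' },
];

// With $set and no multi option: only the first match gets the new status.
update(parts, { type: 'capacitor' }, { $set: { status: 'tested' } });
// Without $set: document 2 loses its type and lot fields.
update(parts, { _id: 2 }, { status: 'scrapped' });
```

Passing { multi: true } as the last argument makes the helper touch every matching document instead of stopping at the first.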

Deleting Components

Works like .find()

Drops collection
Drops database

Creating Components - CompRepo

mongodb://localhost/database

Creating Components: Add()

Reading Components

Do not convert query results to List() here or from calling method, just iterate through enumerable

Updating Components: Save()

Save sends entire document back to server

Updating Components: Update()

Update only sends changes

Deleting Components

Needed to add reference to Repo class

Automapping

MongoDB domain class options

Authentication
  Clients on localhost connect as admin by default
  Start mongod with config option to disable
  Create read-only user and a write user
  Start mongod with these config options

Replica Sets
Key Concepts:
  Data is duplicated over several nodes, ideally spread among separate data centers
  Automatic Failover
  Election of new primary via strict majority
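
The "strict majority" rule is plain arithmetic, which the following sketch works through. The function names are illustrative, and the sketch assumes every member has one vote.

```javascript
// Votes needed for a strict majority in a replica set of `members` voting nodes.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// A primary can be elected only if the reachable voters form a strict majority
// of the whole set, not just of the nodes that are still up.
function canElectPrimary(members, reachable) {
  return reachable >= majority(members);
}

for (const n of [2, 3, 4, 5]) {
  console.log(n + ' members: majority ' + majority(n) +
    ', tolerates ' + (n - majority(n)) + ' failure(s)');
}
```

The loop shows why odd-sized sets are the usual choice: a four-member set tolerates one failure, the same as a three-member set.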

Scaling Reads
  Replica sets can be used as a load balancer by only reading from secondary nodes
  Our example: write heavy and read infrequently.

Sharding
  Sharding enables horizontal scaling across multiple nodes.
  Sharding breaks up chunks of data based on a user-defined shard key
  Add arrows for or mention that there is communication between shards (migrations)

Key Points
  CHOOSE WISELY: SHARD KEY CANNOT BE CHANGED!
  All documents in sharded collection must include the shard key
  Shard key must be an indexed field
  Queries that sort by the shard key are much more efficient
  Mongos handles routing to the correct shard

Sharding
  What makes a good shard key?

Sharding
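
How a router such as mongos might pick a shard can be sketched as a lookup in a table of key ranges. Everything here is an assumption made for illustration: range-based partitioning, a numeric shard key, and the chunk boundaries and shard names. It is not mongos's actual implementation.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and lives on one shard. Names and ranges are made up for illustration.
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shardA' },
  { min: 1000, max: 5000, shard: 'shardB' },
  { min: 5000, max: Infinity, shard: 'shardC' },
];

// Route a single document (or an equality query) by its shard key value.
function routeByKey(key) {
  if (key === undefined) throw new Error('document is missing the shard key');
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Route a range query [low, high): only shards whose chunks overlap are contacted.
function routeRange(low, high) {
  const hit = chunks.filter((c) => low < c.max && high > c.min).map((c) => c.shard);
  return [...new Set(hit)];
}

// A query without the shard key has to go to every shard.
function routeWithoutKey() {
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(routeByKey(42));        // shardA
console.log(routeRange(900, 1200)); // [ 'shardA', 'shardB' ]
console.log(routeWithoutKey());     // [ 'shardA', 'shardB', 'shardC' ]
```

In this model a query that includes the shard key touches one or two shards, while a query without it fans out to all of them, which is part of what makes the choice of key matter.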

Key Learnings
  Working Set < Memory
  ISODate("2012-09-25T03:00:23Z"): use UTC
  Queries must match data type: string vs. integer
  Download and use other MongoDB tools (MongoVUE)
  Do not convert query results to List
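
Two of these learnings, type matching and UTC dates, can be shown in plain JavaScript. The sketch below uses strict equality as a stand-in for a type-sensitive query; the readings array and the field names are made up for illustration.

```javascript
const readings = [
  { serial: 1001, takenAt: new Date('2012-09-25T03:00:23Z') },
  { serial: '1001', takenAt: new Date('2012-09-25T03:00:23Z') },
];

// Equality with strict type matching, as a stand-in for how a query on
// { serial: 1001 } does not match a document that stores "1001" as a string.
const bySerial = (value) => readings.filter((r) => r.serial === value);

console.log(bySerial(1001).length);   // 1
console.log(bySerial('1001').length); // 1

// Dates stored in UTC round-trip unambiguously, whatever the local time zone.
console.log(readings[0].takenAt.toISOString()); // 2012-09-25T03:00:23.000Z
```

Each lookup finds only the document whose serial has the same type as the query value, even though both serials look identical when printed.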

Gaps
  Enterprise acceptance of new approach
  Integration with off-the-shelf reporting and analytics
  User interface for managing the database cluster
  Developer familiarity with JSON and MongoDB
  21 CFR Part 11 Compliance

Thank You
Questions?

docs.mongodb.org/manual

Collect and store raw data

Databases Are Not ARDS

RDBMS Optimized For Storage

Waveform Data

ObjectId
Special 12-byte BSON type that guarantees uniqueness within the collection. The ObjectId is generated based on timestamp, machine ID, process ID, and a process-local incremental counter. MongoDB uses ObjectId values as the default values for _id fields.

Indexing
Without indexes, queries must perform a table scan (every document)

All collections have an index on the _id field
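
To make the ObjectId layout described above concrete, the sketch below splits a 24-character hex string into its four parts. It assumes the byte widths documented for MongoDB 2.x (4-byte timestamp, 3-byte machine ID, 2-byte process ID, 3-byte counter). parseObjectId is a hypothetical helper, not a driver function, and the sample value is an arbitrary example.

```javascript
// Split a 24-character hex ObjectId string into the four parts named above.
// Layout assumed: 4-byte timestamp, 3-byte machine ID, 2-byte process ID,
// 3-byte counter (the layout documented for MongoDB 2.x).
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected 24 hex characters');
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    timestamp: new Date(seconds * 1000), // seconds since the Unix epoch
    machineId: hex.slice(8, 14),
    processId: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parsed = parseObjectId('507f1f77bcf86cd799439011');
console.log(parsed.timestamp.toISOString()); // 2012-10-17T21:13:27.000Z
```

Because the timestamp comes first, ObjectId values sort roughly by creation time, and the creation time can be recovered from the _id alone.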

Backup/Restore
One option is to use mongodump.exe / mongorestore.exe

Aggregation Framework

Write Concern