RAIDING THE MONGODB TOOLBOX Jeremy Mikola jmikola
Raiding the MongoDB Toolbox with Jeremy Mikola

Aug 10, 2015

Transcript
Page 1: Raiding the MongoDB Toolbox with Jeremy Mikola

RAIDING THE MONGODB TOOLBOX

Jeremy Mikola
jmikola

Page 2: Raiding the MongoDB Toolbox with Jeremy Mikola

Agenda

Full-text Indexing
Geospatial Queries
Data Aggregation
Creating a Job Queue
Tailable Cursors

Page 3: Raiding the MongoDB Toolbox with Jeremy Mikola

Full-text Indexing

Page 4: Raiding the MongoDB Toolbox with Jeremy Mikola

You have an awesome PHP blog

{
    "_id": ObjectId("544fd63860dab3b12521379b"),
    "title": "Ten Secrets About PSR-7 You Won't Believe!",
    "content": "Phil Sturgeon caused quite a stir on the PHP-FIG mailing list this morning when he unanimously passed Matthew Weier O'Phinney's controversial PSR-7 specification. PHP-FIG members were outraged as the self-proclaimed Gordon Ramsay of PHP…",
    "published": true,
    "created_at": ISODate("2014-10-28T17:46:36.065Z")
}

Page 5: Raiding the MongoDB Toolbox with Jeremy Mikola

We’d like to search the content

Store arrays of keyword strings
Query with regular expressions
Sync data to Solr, Elasticsearch, etc.
Create a full-text index

Page 6: Raiding the MongoDB Toolbox with Jeremy Mikola

Creating a full-text index

$collection->createIndex(
    ['content' => 'text']
);

Compound indexing with other fields

$collection->createIndex(
    ['content' => 'text', 'created_at' => 1]
);

Indexing multiple string fields

$collection->createIndex(
    ['content' => 'text', 'title' => 'text']
);

Page 7: Raiding the MongoDB Toolbox with Jeremy Mikola

Step 1: Tokenization

[ Phil, Sturgeon, caused, quite, a, stir, on, the, PHP-FIG, mailing, list, this, morning, when, he, unanimously, passed, …]

Page 8: Raiding the MongoDB Toolbox with Jeremy Mikola

Step 2: Trim stop-words

[ Phil, Sturgeon, caused, quite, stir, PHP-FIG, mailing, list, morning, unanimously, passed, …]

Page 9: Raiding the MongoDB Toolbox with Jeremy Mikola

Step 3: Stemming

[ Phil, Sturgeon, cause, quite, stir, PHP-FIG, mail, list, morning, unanimous, pass, …]
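
The first two steps above can be sketched in plain PHP. This is only a toy illustration: the stop-word list and both function names are made up for the example, and MongoDB's real tokenizer and per-language stop-word lists behave differently. Stemming is left out because it needs a proper stemming algorithm.

```php
<?php
// Toy stop-word list, assumed for illustration only; MongoDB uses its own
// per-language lists.
$stopWords = ['a', 'on', 'the', 'this', 'when', 'he'];

// Step 1: split the text on whitespace, dropping empty strings.
function tokenize($text)
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Step 2: drop every token that appears in the stop-word list.
function trimStopWords(array $tokens, array $stopWords)
{
    $kept = array_filter($tokens, function ($token) use ($stopWords) {
        return !in_array(strtolower($token), $stopWords, true);
    });

    return array_values($kept); // reindex from 0
}

$tokens = tokenize('Phil Sturgeon caused quite a stir on the PHP-FIG mailing list');
echo implode(', ', trimStopWords($tokens, $stopWords)), "\n";
// Phil, Sturgeon, caused, quite, stir, PHP-FIG, mailing, list
```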

Page 10: Raiding the MongoDB Toolbox with Jeremy Mikola

Step 4: Profit?

Page 11: Raiding the MongoDB Toolbox with Jeremy Mikola

Querying a text index

$cursor = $collection->find(
    ['$text' => ['$search' => 'Phil Sturgeon']]
);

foreach ($cursor as $document) {
    echo $document['content'] . "\n\n";
}

↓

Phil Sturgeon caused quite a stir on the PHP-FIG…

Phil Jerkson, better known as @phpjerk on Twitter…

Page 12: Raiding the MongoDB Toolbox with Jeremy Mikola

Phrases and negations

$cursor = $collection->find(
    ['$text' => ['$search' => 'PHP -"Phil Sturgeon"']]
);

foreach ($cursor as $document) {
    echo $document['content'] . "\n\n";
}

↓

Be prepared for the latest and greatest version of PHP with…

Page 13: Raiding the MongoDB Toolbox with Jeremy Mikola

Sorting by the match score

$cursor = $collection->find(
    ['$text' => ['$search' => 'Phil Sturgeon']],
    ['score' => ['$meta' => 'textScore']]
);

$cursor->sort(['score' => ['$meta' => 'textScore']]);

foreach ($cursor as $document) {
    printf("%.6f: %s\n\n", $document['score'], $document['content']);
}

↓

1.035714: Phil Sturgeon caused quite a stir on the PHP-FIG…

0.555556: Phil Jerkson, better known as @phpjerk on Twitter…

Page 14: Raiding the MongoDB Toolbox with Jeremy Mikola

Supporting multiple languages

$collection->createIndex(
    ['content' => 'text'],
    ['default_language' => 'en']
);

$collection->insert([
    'content' => 'We are planning a hot dog conference',
]);

$collection->insert([
    'content' => 'Die Konferenz wird WurstCon benannt werden',
    'language' => 'de',
]);

$collection->find(
    ['$text' => ['$search' => 'saucisse', '$language' => 'fr']]
);

Page 15: Raiding the MongoDB Toolbox with Jeremy Mikola

Geospatial Queries

Page 16: Raiding the MongoDB Toolbox with Jeremy Mikola

Because some of us ❤ maps

Page 18: Raiding the MongoDB Toolbox with Jeremy Mikola

GeoJSON in a nutshell

{ "type": "Point", "coordinates": [100.0, 0.0] }

{ "type": "LineString", "coordinates": [[100.0, 0.0], [101.0, 1.0]] }

{
    "type": "Polygon",
    "coordinates": [
        [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
        [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]]
    ]
}

{
    "type": "MultiPolygon",
    "coordinates": [
        [[[102, 2], [103, 2], [103, 3], [102, 3], [102, 2]]],
        [[[100, 0], [101, 0], [101, 1], [100, 1], [100, 0.0]]]
    ]
}

{ "type": "GeometryCollection", "geometries": [ { … }, { … } ]}

Page 19: Raiding the MongoDB Toolbox with Jeremy Mikola

ARRAYS

ARRAYS EVERYWHERE

Page 21: Raiding the MongoDB Toolbox with Jeremy Mikola

Indexing some places of interest

$collection->insert([
    'name' => 'Hyatt Regency Santa Clara',
    'type' => 'hotel',
    'loc' => [
        'type' => 'Point',
        'coordinates' => [-121.976557, 37.404977],
    ],
]);

$collection->insert([
    'name' => 'In-N-Out Burger',
    'type' => 'restaurant',
    'loc' => [
        'type' => 'Point',
        'coordinates' => [-121.982102, 37.387993],
    ]
]);

$collection->createIndex(['loc' => '2dsphere']);

Page 22: Raiding the MongoDB Toolbox with Jeremy Mikola

Inclusion queries

// Define a GeoJSON polygon
$polygon = [
    'type' => 'Polygon',
    'coordinates' => [
        [
            [-121.976557, 37.404977], // Hyatt Regency
            [-121.982102, 37.387993], // In-N-Out Burger
            [-121.992311, 37.404385], // Rabbit's Foot Meadery
            [-121.976557, 37.404977],
        ],
    ],
];

// Find documents within the polygon's bounds
$collection->find(['loc' => ['$geoWithin' => $polygon]]);

// Find documents within circular bounds
$collection->find(['loc' => ['$geoWithin' => ['$centerSphere' => [
    [-121.976557, 37.404977], // Center coordinate
    5 / 3959, // Convert miles to radians
]]]]);
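
The `5 / 3959` above divides a distance by the Earth's radius in the same unit, which yields radians. A small sketch of that conversion follows; the helper names are hypothetical, and the 6371 km figure is the commonly quoted mean radius rather than something taken from the slides.

```php
<?php
// Earth's radius: 3959 miles is the figure used on the slide; 6371 km is the
// commonly quoted mean radius and is an addition here.
const EARTH_RADIUS_MILES = 3959;
const EARTH_RADIUS_KM = 6371;

// Hypothetical helper: distance in miles to radians.
function milesToRadians($miles)
{
    return $miles / EARTH_RADIUS_MILES;
}

// Hypothetical helper: distance in kilometers to radians.
function kilometersToRadians($km)
{
    return $km / EARTH_RADIUS_KM;
}

printf("5 miles = %.6f radians\n", milesToRadians(5));
printf("5 km    = %.6f radians\n", kilometersToRadians(5));
```

Either result could be passed as the second element of `$centerSphere`, as long as the radius constant matches the unit of the distance.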

Page 23: Raiding the MongoDB Toolbox with Jeremy Mikola

Sorted proximity queries

$point = [
    'type' => 'Point',
    'coordinates' => [-121.976557, 37.404977]
];

// Find locations nearest a point
$collection->find(['loc' => ['$near' => $point]]);

// Find the nearest 50 restaurants within 5km
$collection->find([
    'loc' => ['$near' => $point, '$maxDistance' => 5000],
    'type' => 'restaurant',
])->limit(50);
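
To get a feel for the distances a proximity query sorts by, here is a rough sketch that computes the great-circle distance between the two places indexed earlier using the haversine formula. The function name and the radius constant are assumptions for the example, and MongoDB's own spherical calculation may differ slightly from this approximation.

```php
<?php
const EARTH_RADIUS_METERS = 6371000; // mean radius, an approximation

// Great-circle distance in meters between two [longitude, latitude] pairs
// (GeoJSON coordinate order), using the haversine formula.
function haversineMeters(array $a, array $b)
{
    $lat1 = deg2rad($a[1]);
    $lat2 = deg2rad($b[1]);
    $dLat = $lat2 - $lat1;
    $dLng = deg2rad($b[0] - $a[0]);

    $h = pow(sin($dLat / 2), 2)
       + cos($lat1) * cos($lat2) * pow(sin($dLng / 2), 2);

    // min() guards against rounding pushing sqrt($h) just above 1.
    return 2 * EARTH_RADIUS_METERS * asin(min(1, sqrt($h)));
}

$hyatt  = [-121.976557, 37.404977]; // Hyatt Regency Santa Clara
$burger = [-121.982102, 37.387993]; // In-N-Out Burger

printf("%.0f meters\n", haversineMeters($hyatt, $burger));
```

The two points come out a little under 2 km apart, so the restaurant falls well inside the 5000 meter `$maxDistance` used above.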

Page 25: Raiding the MongoDB Toolbox with Jeremy Mikola

Data Aggregation

Page 27: Raiding the MongoDB Toolbox with Jeremy Mikola

Count

$collection->insert(['code' => 'A123', 'num' => 500 ]);
$collection->insert(['code' => 'A123', 'num' => 250 ]);
$collection->insert(['code' => 'B212', 'num' => 200 ]);
$collection->insert(['code' => 'A123', 'num' => 300 ]);

$collection->count(); // Returns 4

$collection->count(['num' => ['$gte' => 250]]); // Returns 3

Page 28: Raiding the MongoDB Toolbox with Jeremy Mikola

Distinct

$collection->insert(['code' => 'A123', 'num' => 500 ]);
$collection->insert(['code' => 'A123', 'num' => 250 ]);
$collection->insert(['code' => 'B212', 'num' => 200 ]);
$collection->insert(['code' => 'A123', 'num' => 300 ]);

$collection->distinct('code'); // Returns ["A123", "B212"]

Page 29: Raiding the MongoDB Toolbox with Jeremy Mikola

Group

$collection->insert(['code' => 'A123', 'num' => 500 ]);
$collection->insert(['code' => 'A123', 'num' => 250 ]);
$collection->insert(['code' => 'B212', 'num' => 200 ]);
$collection->insert(['code' => 'A123', 'num' => 300 ]);

$result = $collection->group(
    ['code' => 1], // field(s) on which to group
    ['sum' => 0], // initial aggregate value
    new MongoCode('function(cur, agg) { agg.sum += cur.num }')
);

foreach ($result['retval'] as $grouped) {
    printf("%s: %d\n", $grouped['code'], $grouped['sum']);
}

↓

A123: 1050
B212: 200
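
What the group call computes can be mirrored in plain PHP over the same four documents. This is only an in-memory sketch of the fold performed by the reduce function, with a hypothetical `groupSum()` helper; it does not use the driver or talk to a server.

```php
<?php
$documents = [
    ['code' => 'A123', 'num' => 500],
    ['code' => 'A123', 'num' => 250],
    ['code' => 'B212', 'num' => 200],
    ['code' => 'A123', 'num' => 300],
];

// Group by "code", starting each group at the initial value ['sum' => 0]
// and folding every document into its group's running aggregate.
function groupSum(array $documents)
{
    $groups = [];

    foreach ($documents as $cur) {
        $key = $cur['code'];

        if (!isset($groups[$key])) {
            $groups[$key] = ['code' => $key, 'sum' => 0];
        }

        $groups[$key]['sum'] += $cur['num']; // same as: agg.sum += cur.num
    }

    return array_values($groups);
}

foreach (groupSum($documents) as $grouped) {
    printf("%s: %d\n", $grouped['code'], $grouped['sum']);
}
// A123: 1050
// B212: 200
```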

Page 30: Raiding the MongoDB Toolbox with Jeremy Mikola

MapReduce

Extremely versatile, powerful
Intended for complex data analysis
Overkill for simple aggregation tasks (e.g. averages, summation, grouping)
Incremental data processing

Page 31: Raiding the MongoDB Toolbox with Jeremy Mikola

Aggregating query profiler output

{
    "op" : "query",
    "ns" : "db.collection",
    "query" : { "code" : "A123", "num" : { "$gt" : 225 } },
    "ntoreturn" : 0,
    "ntoskip" : 0,
    "nscanned" : 11426,
    "lockStats" : { … },
    "nreturned" : 0,
    "responseLength" : 20,
    "millis" : 12,
    "ts" : ISODate("2013-05-23T21:24:39.327Z")
}

Page 32: Raiding the MongoDB Toolbox with Jeremy Mikola

Constructing a query skeleton

{ "code" : "A123", "num" : { "$gt" : 225 } }

↓

{ "code" : <string>, "num" : { "$gt" : <number> } }

Aggregate stats for similar queries (e.g. execution time, index performance)
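
One way to build such a skeleton is to walk the query recursively and swap each leaf value for a placeholder naming its type. The sketch below is an assumption about how this could be done in plain PHP (the slides do not show the actual implementation); `querySkeleton()` is a hypothetical name.

```php
<?php
// Replace every leaf value with a placeholder naming its type, keeping field
// names and operators (e.g. "$gt") intact.
function querySkeleton($value)
{
    if (is_array($value)) {
        // array_map() preserves keys when given a single array.
        return array_map('querySkeleton', $value);
    }
    if (is_int($value) || is_float($value)) {
        return '<number>';
    }
    if (is_string($value)) {
        return '<string>';
    }
    if (is_bool($value)) {
        return '<boolean>';
    }

    return '<' . gettype($value) . '>'; // catch-all for other types
}

$query = ['code' => 'A123', 'num' => ['$gt' => 225]];
echo json_encode(querySkeleton($query)), "\n";
// {"code":"<string>","num":{"$gt":"<number>"}}
```

Two queries that differ only in their values produce the same skeleton, so the skeleton can serve as a grouping key when aggregating profiler entries.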

Page 34: Raiding the MongoDB Toolbox with Jeremy Mikola

Aggregation framework

Process a stream of documents
    Original input is a collection
    Outputs one or more result documents

Series of operators
    Filter or transform data
    Input/output chain

ps ax | grep mongod | head -n 1

Page 35: Raiding the MongoDB Toolbox with Jeremy Mikola

Executing an aggregation pipeline

$collection->aggregateCursor([
    ['$match' => ['status' => 'A']],
    ['$group' => ['_id' => '$cust_id', 'total' => ['$sum' => '$amount']]]
]);
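
The input/output chaining of stages can be imitated in plain PHP. The sketch below is an in-memory analogy only: the sample documents are invented, the three function names are hypothetical, and nothing here runs on the server the way a real pipeline does.

```php
<?php
// Hypothetical input documents; the field names follow the pipeline above.
$orders = [
    ['cust_id' => 'abc', 'status' => 'A', 'amount' => 50],
    ['cust_id' => 'abc', 'status' => 'A', 'amount' => 25],
    ['cust_id' => 'xyz', 'status' => 'A', 'amount' => 10],
    ['cust_id' => 'xyz', 'status' => 'D', 'amount' => 99],
];

// Stage 1, like $match: keep only documents with status "A".
function matchStatusA(array $docs)
{
    return array_values(array_filter($docs, function ($doc) {
        return $doc['status'] === 'A';
    }));
}

// Stage 2, like $group: one output document per cust_id with summed amounts.
function groupTotalByCustomer(array $docs)
{
    $totals = [];

    foreach ($docs as $doc) {
        $id = $doc['cust_id'];

        if (!isset($totals[$id])) {
            $totals[$id] = ['_id' => $id, 'total' => 0];
        }

        $totals[$id]['total'] += $doc['amount'];
    }

    return array_values($totals);
}

// Run the stages in order, feeding each stage's output to the next.
function runPipeline(array $docs, array $stages)
{
    foreach ($stages as $stage) {
        $docs = call_user_func($stage, $docs);
    }

    return $docs;
}

$result = runPipeline($orders, ['matchStatusA', 'groupTotalByCustomer']);
echo json_encode($result), "\n";
// [{"_id":"abc","total":75},{"_id":"xyz","total":10}]
```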

Page 36: Raiding the MongoDB Toolbox with Jeremy Mikola

Pipeline operators

$match
$geoNear
$project
$group
$unwind

$sort
$limit
$skip
$redact
$out

Page 38: Raiding the MongoDB Toolbox with Jeremy Mikola

Solving symbolic equations and calculus

Page 39: Raiding the MongoDB Toolbox with Jeremy Mikola

Creating a Job Queue

Page 40: Raiding the MongoDB Toolbox with Jeremy Mikola

Things not to do in your controllers

Send email messages
Upload files to S3
Blocking API calls
Heavy data processing
Mining cryptocurrency

Page 41: Raiding the MongoDB Toolbox with Jeremy Mikola

Creating a job

$collection->insert([
    'data' => [ … ],
    'processed' => false,
    'createdAt' => new MongoDate,
]);

$collection->createIndex(
    ['processed' => 1, 'createdAt' => 1]
);

Page 42: Raiding the MongoDB Toolbox with Jeremy Mikola

Selecting a job

$job = $collection->findAndModify(
    ['processed' => false],
    ['$set' => ['processed' => true, 'receivedAt' => new MongoDate]],
    null, // field projection (if any)
    [
        'sort' => ['createdAt' => 1],
        'new' => true,
    ]
);

↓

{
    "_id" : ObjectId("54515e16ba5a4da1b15a1766"),
    "data" : { … },
    "processed" : true,
    "createdAt" : ISODate("2014-10-29T21:37:26.405Z"),
    "receivedAt" : ISODate("2014-10-29T21:37:33.118Z")
}

Page 43: Raiding the MongoDB Toolbox with Jeremy Mikola

Schedule jobs in the future

$collection->insert([
    'data' => [ … ],
    'processed' => false,
    'createdAt' => new MongoDate,
    'scheduledAt' => new MongoDate(strtotime('1 hour')),
]);

↓

$now = new MongoDate;

$job = $collection->findAndModify(
    ['processed' => false, 'scheduledAt' => ['$lt' => $now]],
    ['$set' => ['processed' => true, 'receivedAt' => $now]],
    null,
    [
        'sort' => ['createdAt' => 1],
        'new' => true,
    ]
);

Page 44: Raiding the MongoDB Toolbox with Jeremy Mikola

Prioritize job selection

$collection->insert([
    'data' => [ … ],
    'processed' => false,
    'createdAt' => new MongoDate,
    'priority' => 0,
]);

// Index: { "processed": 1, "priority": -1, "createdAt": 1 }

↓

$now = new MongoDate;

$job = $collection->findAndModify(
    ['processed' => false],
    ['$set' => ['processed' => true, 'receivedAt' => $now]],
    null,
    [
        'sort' => ['priority' => -1, 'createdAt' => 1],
        'new' => true,
    ]
);
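
The selection order implied by that sort (highest priority first, oldest job as the tie-breaker) can be traced with an in-memory sketch. The job data and both function names below are hypothetical, and unlike findAndModify this toy version offers no atomicity between concurrent workers.

```php
<?php
// Hypothetical in-memory jobs; createdAt is a plain Unix timestamp here.
$jobs = [
    ['id' => 1, 'processed' => false, 'priority' => 0, 'createdAt' => 100],
    ['id' => 2, 'processed' => false, 'priority' => 5, 'createdAt' => 300],
    ['id' => 3, 'processed' => false, 'priority' => 5, 'createdAt' => 200],
    ['id' => 4, 'processed' => true,  'priority' => 9, 'createdAt' => 50],
];

// Return the index of the next job to run: unprocessed only, highest
// priority first, oldest createdAt as the tie-breaker. Null if none remain.
function nextJobIndex(array $jobs)
{
    $best = null;

    foreach ($jobs as $i => $job) {
        if ($job['processed']) {
            continue;
        }

        if ($best === null
            || $job['priority'] > $jobs[$best]['priority']
            || ($job['priority'] === $jobs[$best]['priority']
                && $job['createdAt'] < $jobs[$best]['createdAt'])
        ) {
            $best = $i;
        }
    }

    return $best;
}

// Mark the selected job as processed and return its updated form.
function claimNextJob(array &$jobs, $now)
{
    $i = nextJobIndex($jobs);

    if ($i === null) {
        return null;
    }

    $jobs[$i]['processed'] = true;
    $jobs[$i]['receivedAt'] = $now;

    return $jobs[$i];
}

while (($job = claimNextJob($jobs, time())) !== null) {
    printf("Claimed job %d\n", $job['id']);
}
// Claimed job 3
// Claimed job 2
// Claimed job 1
```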

Page 45: Raiding the MongoDB Toolbox with Jeremy Mikola

Gracefully handle failed jobs

$collection->insert([
    'data' => [ … ],
    'processed' => false,
    'createdAt' => new MongoDate,
    'attempts' => 0,
]);

↓

$now = new MongoDate;

$job = $collection->findAndModify(
    ['processed' => false],
    [
        '$set' => ['processed' => true, 'receivedAt' => $now],
        '$inc' => ['attempts' => 1],
    ],
    null,
    [
        'sort' => ['createdAt' => 1],
        'new' => true,
    ]
);

Page 46: Raiding the MongoDB Toolbox with Jeremy Mikola

Tailable Cursors

Page 47: Raiding the MongoDB Toolbox with Jeremy Mikola

Capped collections

$database->createCollection(
    'tailme',
    [
        'capped' => true,
        'size' => 16777216, // 16 MiB
        'max' => 1000,
    ]
);
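
The effect of the `max` option can be pictured with a toy in-memory list that discards its oldest entry once a document limit is exceeded. `CappedList` is a hypothetical class for illustration; a real capped collection is also bounded by `size` in bytes, which this sketch ignores.

```php
<?php
// A toy stand-in for a capped collection limited by document count only.
class CappedList
{
    private $max;
    private $documents = [];

    public function __construct($max)
    {
        $this->max = $max;
    }

    public function insert(array $document)
    {
        $this->documents[] = $document;

        // Once the cap is exceeded, discard the oldest document.
        if (count($this->documents) > $this->max) {
            array_shift($this->documents);
        }
    }

    // Documents in insertion order, oldest first.
    public function all()
    {
        return $this->documents;
    }
}

$capped = new CappedList(3);

for ($i = 1; $i <= 5; $i++) {
    $capped->insert(['x' => $i]);
}

echo json_encode($capped->all()), "\n";
// [{"x":3},{"x":4},{"x":5}]
```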

Page 48: Raiding the MongoDB Toolbox with Jeremy Mikola

Producer

for ($i = 0; ++$i; ) {
    $collection->insert(['x' => $i]);
    printf("Inserted: %d\n", $i);
    sleep(1);
}

↓

Inserted: 1
Inserted: 2
Inserted: 3
Inserted: 4
Inserted: 5
…

Page 49: Raiding the MongoDB Toolbox with Jeremy Mikola

Consumer

$cursor = $collection->find();
$cursor->tailable(true);
$cursor->awaitData(true);

while (true) {
    if ($cursor->dead()) {
        break;
    }

    if ( ! $cursor->hasNext()) {
        continue;
    }

    printf("Consumed: %d\n", $cursor->getNext()['x']);
}

↓

Consumed: 1
Consumed: 2
…

Page 50: Raiding the MongoDB Toolbox with Jeremy Mikola

Replica set oplog

$collection->insert([
    'x' => 1,
]);

↓

{
    "ts" : Timestamp(1414624929, 1),
    "h" : NumberLong("2631382894387434484"),
    "v" : 2,
    "op" : "i",
    "ns" : "test.foo",
    "o" : { "_id" : ObjectId("545176a14ab5c0c999da70f0"), "x" : 1 }
}

Page 51: Raiding the MongoDB Toolbox with Jeremy Mikola

Replica set oplog

$collection->update(
    ['x' => 1],
    ['$inc' => ['x' => 1]]
);

↓

{
    "ts" : Timestamp(1414624962, 1),
    "h" : NumberLong("5079425106850550701"),
    "v" : 2,
    "op" : "u",
    "ns" : "test.foo",
    "o2" : { "_id" : ObjectId("545176a14ab5c0c999da70f0") },
    "o" : { "$set" : { "x" : 2 } }
}
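
A consumer tailing the oplog could replay entries like the two shown above against its own copy of the data. The sketch below is a simplified, in-memory illustration under several assumptions: `applyOplogEntry()` is a hypothetical helper, ObjectIds are written as plain strings, only the `i` and `u` (with `$set`) shapes from the slides are handled, and the oplog format is internal to MongoDB and may vary between versions.

```php
<?php
// Apply one simplified oplog entry to an in-memory copy keyed by _id.
// Only inserts ("i") and updates carrying a $set ("u") are handled.
function applyOplogEntry(array $state, array $entry)
{
    switch ($entry['op']) {
        case 'i':
            $state[$entry['o']['_id']] = $entry['o'];
            break;

        case 'u':
            $id = $entry['o2']['_id'];

            if (isset($state[$id]) && isset($entry['o']['$set'])) {
                foreach ($entry['o']['$set'] as $field => $value) {
                    $state[$id][$field] = $value;
                }
            }
            break;
    }

    return $state;
}

// ObjectIds are written as plain strings to keep the sketch driver-free.
$entries = [
    ['op' => 'i', 'ns' => 'test.foo',
     'o' => ['_id' => '545176a14ab5c0c999da70f0', 'x' => 1]],
    ['op' => 'u', 'ns' => 'test.foo',
     'o2' => ['_id' => '545176a14ab5c0c999da70f0'],
     'o' => ['$set' => ['x' => 2]]],
];

$state = [];

foreach ($entries as $entry) {
    $state = applyOplogEntry($state, $entry);
}

echo json_encode($state), "\n";
// {"545176a14ab5c0c999da70f0":{"_id":"545176a14ab5c0c999da70f0","x":2}}
```

Note that the `$inc` from the update appears in the logged entry as a `$set` of the resulting value, so replaying the same entry twice leaves the document unchanged.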

Page 53: Raiding the MongoDB Toolbox with Jeremy Mikola

THANKS!

Questions?

Page 54: Raiding the MongoDB Toolbox with Jeremy Mikola

Image Credits

Books designed by Catherine Please from the Noun Project
Aggregator designed by stuart mcmorris from the Noun Project
Register designed by Wilson Joseph from the Noun Project
Ouroboros designed by Silas Reeves from the Noun Project

http://mariompittore.com/wp-content/uploads/2013/08/Social-Gnomes1.png