Finding Love With MongoDB { name : "Oliver Dodd", email : "[email protected]", twitter : "01001111" }
Transcript
Page 1: Finding Love with MongoDB

Finding Love With MongoDB

{ name : "Oliver Dodd", email : "[email protected]", twitter : "01001111" }

Page 2: Finding Love with MongoDB

Traditional Search

Unidirectional User Defined Criteria

Page 3: Finding Love with MongoDB

eHarmony Matching

Bidirectional User Defined Criteria

Page 4: Finding Love with MongoDB

Matching Overview

•  Potential Match Finder

•  Machine Learned Matching

•  Match Delivery

Photo credits: Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/; Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/; Mailman: http://www.flickr.com/photos/noizephotography/

Page 5: Finding Love with MongoDB

Potential Match Generator

•  Find candidates that meet user’s preferences.

•  Ensure user doesn’t violate each candidate’s preferences.

•  Discard pairings that violate Compatibility Models.

•  Do this as fast as possible.
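The first two bullets can be sketched as a single query document that checks preferences in both directions. This is only an illustration, not the schema used in the talk: the field names (`gender`, `seeks`, `age`, `minAge`, `maxAge`) and the helper `buildBidirectionalQuery` are hypothetical, and the Compatibility Models step is not covered.

```javascript
// Hypothetical schema: each user document stores its own attributes
// (gender, age) next to its preferences (seeks, minAge, maxAge).
function buildBidirectionalQuery(user) {
  return {
    // Forward direction: the candidate's attributes meet the user's preferences.
    gender: user.seeks,
    age: { $gte: user.minAge, $lte: user.maxAge },
    // Reverse direction: the user's attributes meet the candidate's preferences.
    seeks: user.gender,
    minAge: { $lte: user.age },
    maxAge: { $gte: user.age },
  };
}

const query = buildBidirectionalQuery({
  gender: "f", seeks: "m", age: 30, minAge: 28, maxAge: 35,
});
console.log(JSON.stringify(query, null, 2));
```

In the mongo shell, a document like this would be passed to `find` on whatever collection holds the user profiles.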

Page 6: Finding Love with MongoDB

Legacy “Potential Match Generator”

Page 7: Finding Love with MongoDB

Redesign

Requirements for a new data store

–  Centralized

–  Scalable

–  Automagical

–  Easy to maintain

–  Fast, multi-attribute searches

Page 8: Finding Love with MongoDB

New “Potential Match Generator”

Page 9: Finding Love with MongoDB

Why MongoDB?

•  Scalability

•  Built in sharding and replication

•  Autobalancing

•  Rich, complex queries

Page 10: Finding Love with MongoDB

Why MongoDB?

MongoDB is web scale.

Page 11: Finding Love with MongoDB

Wins

•  Deploy new instances on demand.

–  No need to load a local database.

•  Adding replicas is easy and fast.

•  Fast queries when isolated to a shard.

•  Flexible schema

–  No more reloading for minor data model changes.

•  Built-in iterative fetching.

Page 12: Finding Love with MongoDB

Losses

•  No schema = larger footprint.

•  Traditional DBAs can’t help (without training).

•  Aggregation queries are drastically different.

•  Initial configuration can be a long, manual process.

Page 13: Finding Love with MongoDB

Protips

Page 14: Finding Love with MongoDB

Use Real Queries

photo by Official U.S. Navy Imagery on Flickr

Turn on the fire hose

When testing or even evaluating, use production data and queries.

Page 15: Finding Love with MongoDB

Use Real Queries

Unleash the Chaos Monkey

Kill your own mongod instances to ensure your cluster and applications continue to function normally.

photo by dboy @ http://www.flickr.com/photos/dannyboyster/

Page 16: Finding Love with MongoDB

Minimize

Minify property names.

–  In Java, use Morphia for mapping; in Scala, use Salat (both are also good for queries, but we developed our own generic Query API).

–  Use one or two characters per property name.

Consider retrieving full objects from another collection or data store, storing only what you absolutely need for your queries in the search store.

–  On a related note, cache full objects; cache query results only if your queried attributes are small in number.
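Minifying matters because every stored document carries its own field names. The sketch below shows the idea with a hand-written mapping table; the names in `SHORT_NAMES` and the helper `minify` are hypothetical, and in practice a mapper such as Morphia or Salat would own this translation.

```javascript
// Hypothetical long-name -> short-name table (one or two characters each).
const SHORT_NAMES = { birthYear: "by", gender: "g", hasPhotos: "hp" };

// Rename the keys of a flat document using the table above.
// Keys without a mapping are kept unchanged.
function minify(doc) {
  const out = {};
  for (const key of Object.keys(doc)) {
    const mapped = Object.prototype.hasOwnProperty.call(SHORT_NAMES, key);
    out[mapped ? SHORT_NAMES[key] : key] = doc[key];
  }
  return out;
}

const stored = minify({ birthYear: 1985, gender: "f", hasPhotos: true });
console.log(stored); // { by: 1985, g: 'f', hp: true }
```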

Page 17: Finding Love with MongoDB

Indexes

When performing large, variable, multi-attribute searches, have a decent number of them. Cover the major types of queries and the worst performing outliers.

–  What is present in every query?

–  What are the best performing attributes when present?

–  What should my index look like when no high performing attributes appear in the query?

Page 18: Finding Love with MongoDB

Indexes

Omit ranges unless they are absolutely critical; if needed, put them at the end.

–  Can I replace this with an $in clause?

–  Can this be prioritized in its own index?

–  Should there be versions of this index with and without this particular attribute?

–  Will the appearance of this attribute in the index give me any speed advantage over inspecting the full object?

Page 19: Finding Love with MongoDB

Indexes

Ordering is very, very important.

–  Attributes for which a user can only have a single value should appear towards the top of the index.

–  Attributes that depend on the values of another attribute should appear in immediate succession.

–  Again, put ranges at the bottom. If multiple ranges are necessary, ensure that they appear in order of their ability to reduce the working set.

The order of fields in an index should be: First, fields on which you will query for exact values. Second, fields on which you will sort. Finally, fields on which you will query for a range of values.

Eric@MongoLab - http://blog.mongolab.com/2012/06/cardinal-ins/  
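As a sketch of the quoted ordering rule, the helper below assembles an index specification with equality fields first, sort fields second, and range fields last. The function name `buildIndexSpec` and the field names are made up for illustration.

```javascript
// Build a compound index spec in the order the quoted rule suggests:
// equality fields, then sort fields, then range fields.
function buildIndexSpec(equalityFields, sortFields, rangeFields) {
  const spec = {};
  for (const field of [...equalityFields, ...sortFields, ...rangeFields]) {
    if (!(field in spec)) spec[field] = 1; // 1 = ascending; first position wins
  }
  return spec;
}

const spec = buildIndexSpec(["gender", "region"], ["lastLogin"], ["birthYear"]);
console.log(Object.keys(spec)); // [ 'gender', 'region', 'lastLogin', 'birthYear' ]
```

The resulting object has the shape the shell's `createIndex` expects as its key document.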

Page 20: Finding Love with MongoDB

Indexes

Analyze slow queries to find out what attributes you can capitalize on.

When building a compound index for multi-attribute queries, don’t include fields that appear only inside $or clauses. In the query below, material and price would be left out of the index:

db.toasters.find({
  slots: 4,
  canBagel: true,
  $or: [
    { material: "stainless-steel" },
    { price: { $lte: 50 } }
  ]
})
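One way to read this advice, as a sketch: derive the index fields from the top-level attributes of the query and skip anything that lives only inside `$or`. The helper `indexFieldsFor` is hypothetical and handles only flat queries shaped like the toaster example.

```javascript
// Collect the top-level fields of a query, skipping operators such as $or,
// so fields that appear only inside $or branches are left out of the index.
function indexFieldsFor(query) {
  const spec = {};
  for (const key of Object.keys(query)) {
    if (!key.startsWith("$")) spec[key] = 1;
  }
  return spec;
}

const toasterQuery = {
  slots: 4,
  canBagel: true,
  $or: [{ material: "stainless-steel" }, { price: { $lte: 50 } }],
};
console.log(indexFieldsFor(toasterQuery)); // { slots: 1, canBagel: 1 }
```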

Page 21: Finding Love with MongoDB

Queries – Ranges

Translate "between" queries into $in clauses when dealing with discrete values.

$and: [ { a: { $gte: 0 } }, { a: { $lte: 5 } } ]

becomes

a: { $in: [0, 1, 2, 3, 4, 5] }
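The translation can be automated for integer attributes. The helper below is a minimal sketch under that assumption; `rangeToIn` is a hypothetical name, and it only makes sense when the range is small enough to enumerate.

```javascript
// Expand an inclusive integer range into an $in clause.
// Only sensible for discrete values and reasonably small ranges.
function rangeToIn(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError("expected integers with min <= max");
  }
  const values = [];
  for (let v = min; v <= max; v++) values.push(v);
  return { $in: values };
}

console.log(JSON.stringify({ a: rangeToIn(0, 5) })); // {"a":{"$in":[0,1,2,3,4,5]}}
```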

Page 22: Finding Love with MongoDB

Attributes - Decrease Granularity

birthdate => birthyear

floats => ints

number_of_items => has_items?
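Applied to a single document, the three reductions might look like the sketch below. The profile shape and the helper `coarsen` are hypothetical; the `height` field stands in for any float attribute.

```javascript
// Coarsen a hypothetical profile document before storing it in the search store.
function coarsen(profile) {
  return {
    birthyear: profile.birthdate.getUTCFullYear(), // birthdate => birthyear
    height: Math.round(profile.height),            // float => int
    has_items: profile.number_of_items > 0,        // count => boolean
  };
}

const coarse = coarsen({
  birthdate: new Date(Date.UTC(1985, 5, 15)),
  height: 170.4,
  number_of_items: 3,
});
console.log(coarse); // { birthyear: 1985, height: 170, has_items: true }
```

Fewer distinct values per attribute also makes it easier to swap a range for an `$in` clause.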

Page 23: Finding Love with MongoDB

Sharding

•  Try to isolate queries to a particular shard.

•  Ensure that your data and indexes can fit entirely in memory.

•  If certain attributes ALWAYS appear in the query and, in combination, give you a large number of well distributed data partitions, consider making them the shard key.
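A quick way to audit whether queries stay on one shard is to check that each one pins every shard key field to an exact value. The check below is a deliberately conservative sketch: `isIsolatedToShard` and the shard key `["gender", "region"]` are hypothetical, and real routing by mongos can also target a query in other cases (for example, a prefix of a compound shard key).

```javascript
// Returns true when the query pins every shard key field to an exact value.
// Operator objects ($in, $gte, ...) and missing fields count as "not pinned".
function isIsolatedToShard(query, shardKeyFields) {
  return shardKeyFields.every((field) => {
    const cond = query[field];
    if (cond === undefined) return false;
    return cond === null || typeof cond !== "object";
  });
}

const shardKey = ["gender", "region"];
const pinned = { gender: "f", region: 12, birthyear: { $in: [1984, 1985] } };
const scattered = { gender: "f", birthyear: 1985 };

console.log(isIsolatedToShard(pinned, shardKey));    // true
console.log(isIsolatedToShard(scattered, shardKey)); // false
```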

Page 24: Finding Love with MongoDB

We’re Hiring

http://www.eharmony.com/about/careers