NoSql NOW! 2013 Delivering big content at NBC News with RavenDB

Jan 27, 2015


John Bennett

RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss our experience developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US.

We'll cover:
- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high availability.
- Deploying with no downtime.
- Handling huge traffic spikes.
Transcript
Page 1: Delivering big content at NBC News with RavenDB

NoSql NOW! 2013

Delivering big content at NBC News with RavenDB

Page 2

Page 3: A quick tour

Page 4: Raven server

•  Schema-less document database with a RESTful API.

•  Fully ACID; all writes are saved to disk (ESENT).

•  Indexing and queries executed with Lucene.NET.

•  Easily extended with custom logic using “bundles”.

•  Management UI provided in Silverlight.

•  Host as a Windows service, as an IIS app, or embedded in your app.

Page 5: Raven client

•  .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.

•  Wraps the HTTP API.

•  Provides client-side caching, change notifications, and LINQ querying.

•  Easily extended with many, many hooks into almost all operations.

Page 6: Raven licensing

•  Open source: http://github.com/ravendb/ravendb

•  License is AGPL (free) or commercial (paid).

•  Exception: if your project uses any OSI-approved license, you can still use Raven for free.

•  Commercial licenses are priced by maximum parallelism and RAM.

•  Windows clustering support and storage compression/encryption are available only with the Enterprise license.

Page 7: Demo

Page 8: Why RavenDB?

Page 9: NBC News Digital network

•  Includes nbcnews.com, today.com, and more.

•  1.2 billion pageviews/month.

•  140 million video streams/month.

•  58 million unique users/month.

•  Traffic spikes up to 100x normal when big news events happen.

Page 10: One of the largest US news sites

•  Very fast page loads required.

•  “Instant” publish time required.

•  6 to 8 code deployments each day.

•  High availability: zero* downtime allowed.

Page 11

High availability is when the answer to “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.

Page 12

Credit: Mitch Canter @studionashvegas http://twitpic.com/z13bw

Page 13: Some prerequisites for HA

•  Rolling deployments and rollbacks.

•  Apps and services decoupled physically and temporally.

•  Designed for both auto-failover/recovery and manual reconfiguration by ops.

•  Seamless scale-out by adding instances of any process.

•  And more…

Page 14: HA data: a private data cloud

•  Data schema can evolve rapidly.

•  Apps shouldn’t know where the data is.

•  Apps should talk to the closest data replica.

•  Apps should automatically find a new replica if the closest becomes unavailable.

•  Ops can add/remove replicas quickly and easily, without affecting any running apps.

Page 15: Why we chose RavenDB

•  Schema-less document database allows rapid change.

•  Fully ACID model fit business needs.

•  Strong replication functionality supported HA needs.

•  Easily customizable on both client and server.

•  Easily deployed and managed.

•  First-class .NET client.

Page 16: Current* state of Raven usage

•  Raven used behind:

    •  NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.

    •  A growing number of sections of nbcnews.com and today.com.

•  Raven usage stats:

    •  ~10 million docs, growing by 1000s of new docs/day.

    •  10s of writes/sec.

    •  100s of reads/sec (after 3 layers of caching).

Page 17: The details

Page 18: Client-side caching

•  Each doc is cached as long as memory is available.

•  Requests include an If-Modified-Since header.

•  A 304 Not Modified response saves bandwidth.

•  Aggressive caching avoids the round trip entirely; its duration is tunable by ops at runtime (custom).
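The caching flow on this slide can be sketched as a small in-memory simulation: a conditional request that returns 304 when the client already holds the current version, plus an "aggressive" window in which the client skips the round trip entirely. This is a minimal sketch under stated assumptions, not the RavenDB client's implementation: `FakeServer`, `CachingClient`, `aggressive_seconds`, and the integer etags are all hypothetical, and the conditional request is modeled as a version comparison rather than real HTTP headers.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class FakeServer:
    """Hypothetical in-memory stand-in for a document server (not a RavenDB API)."""
    docs: Dict[str, Tuple[int, Any]] = field(default_factory=dict)  # id -> (etag, body)
    last_etag: int = 0
    requests: int = 0

    def put(self, doc_id: str, body: Any) -> None:
        self.last_etag += 1  # every write gets a new, increasing version
        self.docs[doc_id] = (self.last_etag, body)

    def get(self, doc_id: str, known_etag: Optional[int]) -> Tuple[int, int, Any]:
        """Conditional GET: 304 with no body if the caller already has this version."""
        self.requests += 1
        etag, body = self.docs[doc_id]
        if known_etag == etag:
            return 304, etag, None
        return 200, etag, body


class CachingClient:
    def __init__(self, server: FakeServer, aggressive_seconds: float = 0.0,
                 clock: Callable[[], float] = time.monotonic):
        self.server = server
        self.aggressive_seconds = aggressive_seconds  # hypothetical knob ops could change at runtime
        self.clock = clock
        self.cache: Dict[str, Tuple[int, Any, float]] = {}  # id -> (etag, body, fetched_at)

    def load(self, doc_id: str) -> Any:
        now = self.clock()
        cached = self.cache.get(doc_id)
        if cached and now - cached[2] < self.aggressive_seconds:
            return cached[1]  # aggressive caching: no round trip at all
        status, etag, body = self.server.get(doc_id, cached[0] if cached else None)
        if status == 304:
            body = cached[1]  # not modified: reuse the cached body, nothing re-sent
        self.cache[doc_id] = (etag, body, now)
        return body
```

Setting `aggressive_seconds` to zero gives the plain conditional-request behavior; raising it trades freshness for fewer requests, which is why it is useful to let ops tune it while the app is running.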

Page 19: Raven sharding

•  You define the sharding strategy as a method.

•  Raven manages storing each doc on the correct instance, fanning out queries, and merging their results.

•  No auto-rebalancing of shards if you change the number of instances.
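The division of labor on this slide (you supply the strategy; the store routes writes and fans queries out) can be sketched in a few lines. This is an illustration under assumptions, not RavenDB's sharding API: the shard names, `shard_for`, `ShardedStore`, and the hash-modulo strategy are hypothetical choices.

```python
import hashlib
from heapq import merge
from typing import Callable, Dict, List, Optional

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical instance names


def shard_for(doc_id: str, shards: List[str]) -> str:
    """Hypothetical sharding strategy: a stable hash of the id, modulo the shard count."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]


class ShardedStore:
    def __init__(self, shards: List[str]):
        self.shards = list(shards)
        self.data: Dict[str, Dict[str, dict]] = {name: {} for name in self.shards}

    def store(self, doc_id: str, doc: dict) -> None:
        self.data[shard_for(doc_id, self.shards)][doc_id] = doc  # route the write

    def load(self, doc_id: str) -> Optional[dict]:
        return self.data[shard_for(doc_id, self.shards)].get(doc_id)  # single-shard read

    def query(self, predicate: Callable[[dict], bool],
              sort_key: Callable[[dict], object]) -> List[dict]:
        """Fan the query out to every shard, then merge the already-sorted partial results."""
        partials = [sorted((d for d in docs.values() if predicate(d)), key=sort_key)
                    for docs in self.data.values()]
        return list(merge(*partials, key=sort_key))


def moved_fraction(ids: List[str], old: List[str], new: List[str]) -> float:
    """Share of ids whose target shard changes when the shard list changes."""
    return sum(shard_for(i, old) != shard_for(i, new) for i in ids) / len(ids)
```

`moved_fraction(ids, SHARDS, SHARDS + ["shard-d"])` shows why changing the number of instances is not free: with a modulo strategy, a large share of existing ids now map to a different shard, and since nothing rebalances automatically, those documents would have to be migrated by hand.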

Page 20: Raven indexing and querying

•  All queries are performed against indexes.

•  Indexes can be predefined or auto-created.

•  Indexing and queries are executed in Lucene.NET.

•  Fielded.

•  Full text with built-in or custom analyzers.

•  Geo-spatial.

•  Map-reduce.

•  Result transformers can load other docs.

•  Query with LINQ or Lucene syntax.

•  Indexes may be stale. You can force a wait for non-stale results. (Danger! Primarily for unit tests.)

•  Projections occur on the server, reducing data on the wire.

•  Super-cool stuff: eval patching, index scripts.
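To make the map-reduce index type concrete, here is the map/reduce idea in plain Python, counting published articles per section. It is a sketch only: the article schema and function names are hypothetical, and RavenDB's own index definitions are written as LINQ expressions, not Python.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List


def map_articles(docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Map: emit {section, count: 1} for each published article (hypothetical schema)."""
    for doc in docs:
        if doc.get("published"):
            yield {"section": doc["section"], "count": 1}


def reduce_by_section(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce: group by section and sum the counts. The output has the same shape as
    the map output, so reduce results can be fed back into reduce."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry["section"]] += entry["count"]
    return [{"section": s, "count": c} for s, c in sorted(totals.items())]
```

Keeping the reduce output in the same shape as the map output is the design choice that matters here: partial results for different batches of documents can be reduced again later, so an index can be updated incrementally instead of being recomputed from scratch.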

Page 21: Indexing catch-22

•  Indexes need to be up to date before a client is allowed to talk to a replica.

•  But indexes are created by the client app:

    •  Static: CreateIndexes() at startup scans assemblies for index classes.

    •  Dynamic: created when the client issues a query.
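One way to work around the catch-22 is to treat a replica as usable only once every index the app needs exists there and is no longer stale. A minimal sketch, assuming a hypothetical `get_stats` callable that returns the replica's index names and stale index names (for example, parsed from a stats endpoint); the dictionary keys `indexes` and `stale_indexes` are assumptions, not RavenDB's actual response format.

```python
import time
from typing import Callable, Dict, List, Set


def wait_until_ready(get_stats: Callable[[], Dict[str, List[str]]], required: Set[str],
                     timeout_s: float = 60.0, poll_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll a replica's (hypothetical) stats until all required indexes exist and none
    is stale. Returns False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        stats = get_stats()
        missing = required - set(stats["indexes"])
        stale = required & set(stats["stale_indexes"])
        if not missing and not stale:
            return True  # safe to add this replica to the list clients may use
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A deployment step could create the indexes on the new replica, call something like this, and only then add the replica to the configuration that apps read from.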

Page 22: Updating a static index – a pain

•  Define the new index, with no code using it yet.

•  Deploy, and allow the new index to build.

•  Redeploy with code that uses the new index.

•  Redeploy after deleting the old index definition.

•  Delete the old index on each replica.

Page 23: Consistency

•  Operations by Id are consistent (within a single Raven server):

    •  Load()

    •  Store()

    •  Delete()

•  Queries are only eventually consistent (“eventually” is measured in milliseconds).
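The difference between by-Id operations and queries can be shown with a toy model in which documents are written synchronously but the query index is updated by a separate indexing step. This is a hypothetical model for illustration (`MiniStore`, `run_indexing`, and `wait_for_non_stale` are made-up names), not how RavenDB is implemented; in particular, "waiting" here just runs the indexer inline, where a real server would block until its background indexer catches up.

```python
from typing import Dict, List, Set, Tuple


class MiniStore:
    """Toy model: writes go straight to the document table; the index lags behind."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}   # by-Id reads and writes hit this directly
        self.etag = 0                     # incremented on every write
        self.indexed_etag = 0             # last write the index has processed
        self.pending: List[Tuple[int, str, dict]] = []  # writes not yet indexed
        self.index: Dict[str, Set[str]] = {}            # section -> doc ids

    def store(self, doc_id: str, doc: dict) -> None:
        self.etag += 1
        self.docs[doc_id] = doc
        self.pending.append((self.etag, doc_id, doc))

    def load(self, doc_id: str):
        return self.docs.get(doc_id)  # consistent: sees the write immediately

    def run_indexing(self) -> None:
        """Stand-in for the background indexer: apply pending writes to the index."""
        for etag, doc_id, doc in self.pending:
            for ids in self.index.values():
                ids.discard(doc_id)  # drop the old entry for a re-stored doc
            self.index.setdefault(doc["section"], set()).add(doc_id)
            self.indexed_etag = etag
        self.pending.clear()

    def query(self, section: str, wait_for_non_stale: bool = False) -> Tuple[List[str], bool]:
        if wait_for_non_stale:
            self.run_indexing()
        stale = self.indexed_etag < self.etag
        return sorted(self.index.get(section, set())), stale
```

Right after a `store`, `load` returns the new document while `query` can still return old results flagged as stale; this is the window the slide describes as “eventually”.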

Page 24: Raven replication

•  Eventual consistency: replication is asynchronous, in the background.

•  All replication is one-way and managed by the source.

•  Transitive replication can be enabled, which is useful for new instances.

•  Set a W value to wait until a write has replicated to a minimum number of instances (v2.5), or until a timeout expires.

•  The client will auto-failover to replication destinations; this is configurable to reads only, or to reads and writes.
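The client failover behavior described on this slide (reads may fail over to destinations, writes only if configured) can be sketched as follows. Everything here is hypothetical: the replica names, the `send` transport callable, `ReplicaUnavailable`, and the `allow_writes_to_secondaries` flag stand in for whatever the real client and its configuration provide.

```python
from typing import Callable, List, Sequence, Tuple


class ReplicaUnavailable(Exception):
    pass


class FailoverClient:
    """Sketch of client-side failover across an ordered list of replicas."""

    def __init__(self, replicas: List[str], send: Callable[[str, str, str], object],
                 allow_writes_to_secondaries: bool = False):
        self.replicas = replicas  # ordered closest-first; replicas[0] is the primary
        self.send = send          # send(replica, op, doc_id) -> result, or raises ReplicaUnavailable
        self.allow_writes_to_secondaries = allow_writes_to_secondaries

    def _try(self, candidates: Sequence[str], op: str, doc_id: str) -> Tuple[str, object]:
        last_error = None
        for replica in candidates:
            try:
                return replica, self.send(replica, op, doc_id)
            except ReplicaUnavailable as err:
                last_error = err  # fall through to the next replica in order
        raise last_error or ReplicaUnavailable("no replicas configured")

    def read(self, doc_id: str) -> Tuple[str, object]:
        return self._try(self.replicas, "GET", doc_id)  # reads may go to any replica

    def write(self, doc_id: str) -> Tuple[str, object]:
        # Writes fail over only if configured; otherwise only the primary is tried.
        candidates = self.replicas if self.allow_writes_to_secondaries else self.replicas[:1]
        return self._try(candidates, "PUT", doc_id)
```

Restricting failover to reads keeps all writes on one instance, which avoids conflicts at the cost of write availability while the primary is down; allowing writes to fail over favors availability and makes conflict handling necessary.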

Page 25: Etags

•  Sequential GUIDs.

•  Unique for every write to a database.

•  Used for client caching, concurrency control, and replication.

Page 26: The replication conversation

Source: What’s the last etag I replicated to you?

Destination: 42

Source: I’m up to 49, so here’s a POST with some docs in it.

Destination: Got ’em.

Source: What’s the last etag I replicated to you?

Destination: 49
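The conversation above boils down to incremental replication keyed on etags: the destination remembers the last etag it received from each source, and the source sends only newer writes. A minimal sketch under simplifying assumptions: integers stand in for sequential GUIDs, `Database` and `replicate` are hypothetical names, and replicated docs are not re-logged at the destination, so this toy does not model transitive replication.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Database:
    name: str
    etag: int = 0  # last etag assigned by this database (an int instead of a GUID)
    log: List[Tuple[int, str, dict]] = field(default_factory=list)  # (etag, id, doc) in etag order
    docs: Dict[str, dict] = field(default_factory=dict)
    last_seen: Dict[str, int] = field(default_factory=dict)  # source name -> last etag received

    def write(self, doc_id: str, doc: dict) -> None:
        self.etag += 1
        self.docs[doc_id] = doc
        self.log.append((self.etag, doc_id, doc))

    def receive(self, source_name: str, batch: List[Tuple[int, str, dict]]) -> None:
        for etag, doc_id, doc in batch:
            self.docs[doc_id] = doc
            self.last_seen[source_name] = etag


def replicate(source: Database, dest: Database, batch_size: int = 100) -> int:
    """One round of the conversation; returns the destination's new last etag."""
    last = dest.last_seen.get(source.name, 0)  # "What's the last etag I replicated to you?"
    batch = [entry for entry in source.log if entry[0] > last][:batch_size]
    if batch:
        dest.receive(source.name, batch)       # "Here's a POST with some docs in it."
    return dest.last_seen.get(source.name, 0)
```

Because the source only ever asks "where did we leave off?", a destination that was offline simply catches up on the next round, and a batch limit keeps each POST bounded.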

Page 27: Multi-master replication

•  Replication from each instance to all other instances.

•  Any instance could receive writes.

•  Reduce replication conflicts by forcing writes to a single “master”.

•  Handle conflicts in your app or with a custom server bundle; in our case, a “last in wins” bundle.
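A “last in wins” policy can be expressed as a pure function over the conflicting versions of a document. This is a hypothetical sketch of the policy, not the custom bundle itself: `Version`, its fields, and the tie-break on source name are assumptions chosen so that every replica resolving the same conflict picks the same winner.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Version:
    source: str              # instance that accepted the write (hypothetical field)
    last_modified: datetime  # server timestamp of the write
    body: dict


def last_in_wins(conflicts: List[Version]) -> Version:
    """Pick the most recently modified version; break exact ties by source name
    so the result does not depend on the order in which conflicts arrive."""
    return max(conflicts, key=lambda v: (v.last_modified, v.source))
```

The policy is only as trustworthy as the servers' clocks: with clock skew, a write that actually happened later can lose, which is one reason to reduce conflicts by sending writes to a single master in the first place.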

Page 28: Id generation

•  Id is null and a tag (collection name) can be extracted: the client generates the Id with Hi-Lo.

•  Null Id received at the server: the server generates a GUID.

•  Id ending in “/” received at the server: the server appends an auto-incremented integer.

•  Otherwise: the Id value in the object is used as-is.

•  A server prefix protects against edge-case failures.
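The Hi-Lo rule and the server-side rules on this slide can be sketched together. Hi-Lo lets a client reserve a block of ids in one round trip and then hand them out locally. This is a sketch under assumptions: `reserve_hi` and `next_int` are hypothetical callbacks standing in for atomic server-side counters, and the real client keeps its counters on the server and differs in detail.

```python
import threading
import uuid
from typing import Callable, Optional


class HiLoGenerator:
    """Client-side Hi-Lo: reserve `capacity` ids at a time from a shared counter."""

    def __init__(self, tag: str, reserve_hi: Callable[[str], int], capacity: int = 32):
        self.tag = tag
        self.reserve_hi = reserve_hi  # hypothetical: atomically increments and returns hi for tag
        self.capacity = capacity
        self.current = 0
        self.max = 0                  # forces a reservation on first use
        self.lock = threading.Lock()

    def next_id(self) -> str:
        with self.lock:
            if self.current >= self.max:
                hi = self.reserve_hi(self.tag)  # one round trip per block of ids
                self.current = (hi - 1) * self.capacity
                self.max = hi * self.capacity
            self.current += 1
            return f"{self.tag}/{self.current}"


def server_assign_id(doc_id: Optional[str], next_int: Callable[[str], int]) -> str:
    """Server-side rules from the slide; `next_int` is a hypothetical per-prefix counter."""
    if doc_id is None:
        return str(uuid.uuid4())              # null Id at the server: a GUID
    if doc_id.endswith("/"):
        return f"{doc_id}{next_int(doc_id)}"  # trailing "/": append an auto-increment integer
    return doc_id                             # otherwise: use the value as given
```

Because each `hi` value is handed out once, two clients sharing the same counter get disjoint blocks (for example `articles/1` to `articles/3` and `articles/4` to `articles/6` with a capacity of 3), so neither needs to coordinate with the other per document.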

Page 29: Raven operations tasks

•  Control where reads and writes go (implemented in a custom DocumentStore wrapper).

•  Control aggressive caching time.

•  Deploy new instances with replication.

•  Back up, but probably never restore in production.

•  Copy indexes.

•  Monitor with stats endpoints.

Page 30: Keep in mind…

•  Modeling/versioning

•  Replication

•  Client failover

•  Consistency

•  Concurrency control

•  Indexing and updates

•  Id generation

•  Caching

Page 31: More info on Raven

•  http://ravendb.net

•  GitHub: http://github.com/ravendb

•  Ayende’s blog: http://ayende.com

•  RavenDB Google group

•  @RavenDB on Twitter

•  Me: @jtbennett on Twitter

Page 32: Questions?

Page 33

Many thanks to:

You.

NoSql NOW!

Huge.

Rhinos: @ayende, @synhershko.

Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.

Page 34

hugeinc.com
45 Main St. #220, Brooklyn, NY 11201
+1 718 625 4843