Inside Wordnik's Architecture

Tony Tam @fehguy

Who is Wordnik?

•Founded in 2008 by Erin McKean

•"Understand meaning of words automatically"

•Patented "Free-Range Definition" technology

•Constructed largest (known) English Word Graph

We do Discovery

It's all about Data!

Data?

•Word Graph is built by data

•Runtime answers needed fast

50M+ Nodes!

80 ms reads!

80M+ Edges!

What we do with Data

•Update the Graph constantly

•Augment our NLP pipeline

•"Reality-based Annotation" with current, real-world data

Language is NOT static

Twitter?

Tumblr?

WordPress?

Next???

Is a 20 year-old corpus good enough?

How we do it

•Amazon EC2-based deployment

•Efficiency through constraint-based architecture

• Small is Big!

•Horizontal scaling by adding servers!

• Yea, we can always go vertical

•Blah, blah, more details!

Micro Services

•Services are stand-alone building blocks

•Increase capacity through a "more like this" button

Micro Services

•Big application => micro services

Monolithic application

"Isn't this just SOA?"

Not PO-SOA

•This is different

• No proprietary message bus

• Decoupled objects

• Dedicated storage***

•Speak REST

• Develop your services in…

• Java

• Scala

• Ruby

• PHP

All valid!

Speak REST?

•Sounds good but…

• REST semantics vary wildly

• HATEOAS vs. practical REST?

/api/pet.json/1?delete (GET)

/api/pet.json/1 (DELETE)

/api/pet.json/1 (POST empty)

All valid!

So…

•API Styleguide!

•Peer Review!

•Better Docs!

•API Council!
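
All three delete calls above can be made to work, which is why a style guide and peer review matter. As a minimal sketch, a review script could flag delete calls that stray from one agreed convention. The two rules and the name `check_delete_style` are illustrative assumptions, not Wordnik's actual style guide.

```python
from urllib.parse import urlsplit

# The three ways of deleting pet 1 shown on the slide.
CANDIDATES = [
    ("GET", "/api/pet.json/1?delete"),
    ("DELETE", "/api/pet.json/1"),
    ("POST", "/api/pet.json/1"),
]


def check_delete_style(method: str, url: str) -> list[str]:
    """Return style-guide violations for a delete call (hypothetical rules)."""
    problems = []
    parts = urlsplit(url)
    if method != "DELETE":
        problems.append(f"use the DELETE verb, not {method}")
    if parts.query:
        problems.append("do not encode the action in the query string")
    return problems


for method, url in CANDIDATES:
    issues = check_delete_style(method, url)
    print(method, url, "->", issues or "ok")
```

Under these assumed rules only the second call passes; a different council could just as reasonably pick another convention, as long as every service follows the same one.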

mSOA makes new Challenges

•It's communication (not easy)

•Need a consumer & provider contract

•Driving force to create Swagger

What is Swagger?

•Swagger is…

• Spec for declaring and documenting an API

• A framework for auto-generating the spec

• A library for client library generation

• A JSON-based test framework

•It's open source!

• http://swagger.wordnik.com

How?

•Swagger Codegen

• Creates a client based on your Swagger Spec

scala src/main/scala/Codegen.scala \
  ${swagger-spec-url}

Scala

Ruby
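
To illustrate the idea of generating a client from a declared API, here is a toy sketch. The declaration layout, the field names (`basePath`, `apis`, `nickname`) as used here, and the base URL are simplified assumptions for illustration; this is neither the exact Swagger schema nor how Swagger Codegen is implemented.

```python
# Hypothetical, simplified API declaration; NOT the exact Swagger schema.
SPEC = {
    "basePath": "http://localhost:8080/api",  # assumed URL
    "apis": [
        {"path": "/pet.json/{petId}", "method": "GET", "nickname": "getPetById"},
        {"path": "/pet.json/{petId}", "method": "DELETE", "nickname": "deletePet"},
    ],
}


def make_client(spec: dict) -> dict:
    """Build one request-describing function per declared operation."""
    client = {}
    for api in spec["apis"]:
        def call(_api=api, **params):
            # Fill path parameters such as {petId}; no network call is made.
            url = spec["basePath"] + _api["path"].format(**params)
            return _api["method"], url
        client[api["nickname"]] = call
    return client


client = make_client(SPEC)
print(client["deletePet"](petId=1))
```

The point of the sketch is that the declaration is the contract: provider and consumer both derive their code from it, so they cannot drift apart silently.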

In the Wordnik Workflow

•Jenkins will…

• Build a service library

• Build a stand-alone application distro

• Build an installable image (RPM)

• Build a compatible client library

•Consumers will…

• Declare dependency on a service version

• Use a client for that version

• Be given a list of compatible services, by cluster, version
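
The consumer side of this workflow can be sketched as a lookup against a service registry. Everything below is hypothetical: the registry contents, the service name, and the compatibility rule (same major version, at least the requested version) are assumptions made for illustration, not Wordnik's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    version: str  # "major.minor.patch"
    cluster: str
    host: str


# Hypothetical registry contents.
REGISTRY = [
    ServiceInstance("word-api", "2.1.0", "prod-east", "10.0.0.11"),
    ServiceInstance("word-api", "2.3.4", "prod-east", "10.0.0.12"),
    ServiceInstance("word-api", "3.0.0", "prod-east", "10.0.0.13"),
    ServiceInstance("word-api", "2.3.4", "prod-west", "10.1.0.12"),
]


def parse(version: str) -> tuple:
    """Turn "2.3.4" into (2, 3, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def compatible(registry, name, wanted, cluster):
    """Instances in `cluster` with the same major version and version >= `wanted`."""
    want = parse(wanted)
    return [
        s for s in registry
        if s.name == name
        and s.cluster == cluster
        and parse(s.version)[0] == want[0]
        and parse(s.version) >= want
    ]


for instance in compatible(REGISTRY, "word-api", "2.2.0", "prod-east"):
    print(instance.host, instance.version)
```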

Back to Data

•Micro services have small(ish) databases

• Share nothing across services

• YES to replica sets

•Deployed to ephemeral storage

• (more in a bit)

• Small by design

•How to keep them small?

Keeping Databases Small

•Some easy tricks

• Schema-less => "schema per document"

• Keep field names short!

db.foo.save({user_name:"Tony"})

db.foo.save({un:"Tony"})

•Indexes

• They can get *huge*

• Make _id matter!

Repeat 10e9 times!
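
Field names are stored inside every BSON document, so shorter names mean smaller documents. The sketch below hand-encodes the two single-field documents from the slide following the BSON layout for a string element, to show the difference in bytes. It ignores the `_id` field MongoDB adds, as well as any storage-engine padding or compression, so treat the numbers as a lower-level illustration rather than on-disk sizes.

```python
import struct


def bson_string_doc(field: str, value: str) -> bytes:
    """Encode {field: value} as BSON by hand (one UTF-8 string element)."""
    name = field.encode("utf-8") + b"\x00"  # element name as a C string
    data = value.encode("utf-8") + b"\x00"  # string payload, NUL-terminated
    element = b"\x02" + name + struct.pack("<i", len(data)) + data
    total = 4 + len(element) + 1  # int32 size prefix + element + trailing NUL
    return struct.pack("<i", total) + element + b"\x00"


long_doc = bson_string_doc("user_name", "Tony")
short_doc = bson_string_doc("un", "Tony")
saved = len(long_doc) - len(short_doc)
print(len(long_doc), len(short_doc), saved)  # prints: 25 18 7
```

Seven bytes per document looks trivial, but over a billion documents it adds up to roughly 7 GB for this one field alone.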

Keeping Databases Small

•Don't make _id just an "auto increment"

You're stuck with it! Be smart

• User collection? Try _id: username

• Email collection? Try _id: email

• Date-driven collection? How about _id: "20120502"

• db.logins.find({_id:/^201205/})

Be lazy until you can't anymore!
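
A date-shaped `_id` pays off because an anchored, case-sensitive prefix pattern such as `/^201205/` can be answered as a range over the sorted `_id` index rather than a scan of every document. The sketch below models that with a sorted list and binary search. The sample keys are made up, and this is a model of the idea, not MongoDB's internals.

```python
import bisect
import re

# Hypothetical logins collection keyed by day, kept sorted like an index.
ids = sorted(["20120429", "20120430", "20120501", "20120502", "20120531", "20120601"])


def prefix_regex(keys, prefix):
    """Full scan: test every key against an anchored regex."""
    pattern = re.compile("^" + re.escape(prefix))
    return [k for k in keys if pattern.search(k)]


def prefix_range(sorted_keys, prefix):
    """Index-style lookup: binary-search both ends of the prefix range."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


print(prefix_range(ids, "201205"))
```

Both functions return the same keys; the range version only touches the matching slice, which is what makes "all logins in May 2012" cheap when the date is the `_id`.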

Keeping Databases Small

•DAO or die!

• Fancy index scheme => control access to collections

NO!!!!

Yes
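
One way to read "DAO or die": once field names are shortened and `_id` carries meaning, only a data-access object should know about it. The class below is a hypothetical sketch with an in-memory dict standing in for a collection; the class name, the field mapping, and the method names are assumptions for illustration.

```python
class LoginDao:
    """Hypothetical DAO: callers see readable names, storage keeps short ones."""

    # Assumed mapping from readable field names to short stored names.
    FIELDS = {"login_count": "lc", "last_seen": "ls"}

    def __init__(self):
        self._docs = {}  # in-memory stand-in for a collection, keyed by _id

    def save(self, user_name, login_count, last_seen):
        doc = {"_id": user_name}  # meaningful _id instead of an auto increment
        doc[self.FIELDS["login_count"]] = login_count
        doc[self.FIELDS["last_seen"]] = last_seen
        self._docs[user_name] = doc

    def find(self, user_name):
        doc = self._docs.get(user_name)
        if doc is None:
            return None
        result = {"user_name": doc["_id"]}
        for readable, short in self.FIELDS.items():
            result[readable] = doc[short]
        return result


dao = LoginDao()
dao.save("tony", 3, "20120502")
print(dao.find("tony"))
```

Callers never see `lc` or `ls`, so the storage scheme can change without touching any client code.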

Keeping Databases Small

•If/when you need to shard…

Don't make your clients do this!
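
Keeping shard selection out of client code might look like the sketch below, where the data-access layer picks the shard and callers only see `put` and `get`. The class, the hash-based routing, and the dict-backed shards are all assumptions for illustration; this is not how MongoDB's own sharding works and not necessarily what Wordnik built.

```python
import hashlib


class ShardedStore:
    """Hypothetical data-access layer that hides shard selection from callers."""

    def __init__(self, shard_count: int):
        # Each dict is a stand-in for one small database.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key: str, doc: dict) -> None:
        self._shard_for(key)[key] = doc

    def get(self, key: str):
        return self._shard_for(key).get(key)


store = ShardedStore(shard_count=3)
for word in ["apple", "banana", "cherry", "date"]:
    store.put(word, {"len": len(word)})
print(store.get("banana"))
```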

Keeping Databases Small

•Again, why keep them small?

•Starting a new replica

• Initial sync

• Index rebuilding

•Backups

•Index Compaction

•Speed

•TCO

Everything is easier

This can take DAYS

Ephemeral Storage?

•Every EC2 instance type has some (except micro)

•Only available via EC2 API

•Less prone to issues than EBS

•Faster ***

•Included in cost of server

But dies on host reboot!

Keeping Data Safe

Which Zone? Which Region?

Arbiter handles external connectivity issue detection
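
The role of the arbiter can be sketched with a small model of a replica set spread across zones. The configuration below uses the standard replica set member fields (`_id`, `host`, `arbiterOnly`), but the set name, the hostnames, and the two-data-nodes-plus-arbiter layout are assumptions; the slide's actual topology is not recoverable from the transcript. The function is a simplified model of majority voting, not MongoDB's election protocol.

```python
# Hypothetical layout: two data nodes plus an arbiter, each in its own zone.
REPLICA_SET = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db-a.zone1.example:27017"},
        {"_id": 1, "host": "db-b.zone2.example:27017"},
        {"_id": 2, "host": "arb.zone3.example:27017", "arbiterOnly": True},
    ],
}


def can_elect_primary(config: dict, reachable_ids: set) -> bool:
    """True if reachable members form a voting majority and include a data node."""
    members = config["members"]
    reachable = [m for m in members if m["_id"] in reachable_ids]
    has_majority = len(reachable) > len(members) // 2
    has_data_node = any(not m.get("arbiterOnly", False) for m in reachable)
    return has_majority and has_data_node


print(can_elect_primary(REPLICA_SET, {0, 2}))  # one data node + arbiter
print(can_elect_primary(REPLICA_SET, {0}))     # isolated data node
```

With the arbiter as tie-breaker, losing any single zone still leaves a majority; a data node that is cut off from both other members cannot stay primary on its own.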

How does this really stack up?

•Tuned indexes & access, split with services

• Was: 3 DAS Devices w/18 TB disk

• Now: 21 M1.large + M1.xlarge instances

• 3 Zones, 2 regions

•The Gory Details

blog.wordnik.com/with-software-small-is-the-new-big

As for Services

•~1,000 requests/sec via Swagger-enabled micro services

•Direct to Consumer via SwaggerSocket

What's Next

•Migrating all services to SwaggerSocket

• OSS WebSocket subprotocol

https://github.com/wordnik/swaggersocket

• 25%-100% speed increase (sync & async)

•Discovery via Wordnik

If you're Interested…

See more:

developer.wordnik.com

swagger.wordnik.com

github.com/wordnik

Questions?
