An introduction to NoSQL Radu Potop
Dec 02, 2014
NoSQL
● umbrella term
● non-relational data storage
● no fixed table schemas
● a fresh take on database technology
Relational databases struggle to handle very large volumes of data
Some companies and their databases:
● Digg.com - 3 TB for green badges
● Facebook - 50 TB for inbox search
● eBay - 2 PB (petabytes) in total
Issues
● horizontal scalability
● server performance
● rigid schemas
● distribution across servers
Characteristics of NoSQL
● no ACID guarantees (Atomicity, Consistency, Isolation, Durability)
● highly distributed
● scalable
● better performance - they don't have to handle relations
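The "highly distributed" and "scalable" points can be illustrated with a minimal sketch of hash-based partitioning, one simple way to spread keys across servers. The server names and the `server_for` and `distribute` helpers are hypothetical; production systems typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["db-0", "db-1", "db-2"]

def server_for(key: str, servers=SERVERS) -> str:
    """Pick a server for a key using a stable hash modulo the server count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

def distribute(keys, servers=SERVERS):
    """Group keys by the server that would store them."""
    placement = {name: [] for name in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement

if __name__ == "__main__":
    print(distribute(["user:1", "user:2", "user:3", "user:4"]))
```

Because the hash is stable, any client can work out where a key lives without asking a central coordinator. The weakness of plain modulo placement is that changing the number of servers moves most keys.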
NoSQL databases examples:
● Google Bigtable (used intensively by almost everything made by Google)
● Amazon Dynamo (used by Amazon S3)
● Facebook Cassandra
● Apache HBase
● LinkedIn Voldemort
Some types of databases:
● Document-oriented databases
  ● JSON format, XML databases
  ● examples: CouchDB, BaseX
● Key-value databases
  ● values can be more than strings (e.g. sets of strings)
  ● examples: Redis, Cassandra
CouchDB
● an Apache Software Foundation project
● written in Erlang
● open source
● document-oriented database
● stores data as a collection of JSON documents
● queried via REST API
● JavaScript is the default language
● also supported: PHP, Ruby, Python and Erlang
● built-in replication features
● used by Ubuntu One
JSON document
{
  "_id" : "fc5e038d38a570",
  "_rev" : "D546012",
  "to" : "email@example",
  "subject" : "helloWorld",
  "body" : "some text"
}
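A minimal Python sketch of what "no fixed table schemas" means for documents like the one above: two documents in the same collection can carry different fields. Only the first document's fields come from the slide; the `draft` document and the `inbox` list are made up for illustration.

```python
import json

# First document: fields taken from the slide.
message = {
    "_id": "fc5e038d38a570",
    "_rev": "D546012",
    "to": "email@example",
    "subject": "helloWorld",
    "body": "some text",
}

# Second document: hypothetical, with an extra field and a missing one.
# No schema change is needed to store it next to the first.
draft = {
    "_id": "a-made-up-id",
    "subject": "draft",
    "tags": ["todo"],
}

inbox = [message, draft]

# Serialize to JSON text and back; the structure survives the round trip.
encoded = json.dumps(inbox)
decoded = json.loads(encoded)

for doc in decoded:
    # .get() tolerates fields that a given document does not have.
    print(doc["_id"], doc.get("to", "(no recipient)"))
```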
Operations with these documents
● HTTP requests:
  ● GET (select), POST (create), PUT (update), DELETE (delete)
● HTTP AUTH
● Applications: curl, Futon
● JavaScript
● any application that can make HTTP requests
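As a rough sketch, the four verbs map onto requests like the ones below, built with Python's standard `urllib.request`. The base URL `http://localhost:5984`, the database name `mail`, and the `build_request` helper are assumptions for illustration. The requests are only constructed here, not sent, so the example can be read without a running server.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumed local server address
DB = "mail"                     # hypothetical database name
DOC_ID = "fc5e038d38a570"       # document id from the slide
REV = "D546012"                 # revision from the slide

def build_request(method, path, doc=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(doc).encode("utf-8") if doc is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

doc = {"to": "email@example", "subject": "helloWorld", "body": "some text"}

reqs = [
    build_request("POST", f"/{DB}", doc),                         # create
    build_request("GET", f"/{DB}/{DOC_ID}"),                      # select
    build_request("PUT", f"/{DB}/{DOC_ID}", {**doc, "_rev": REV}),  # update
    build_request("DELETE", f"/{DB}/{DOC_ID}?rev={REV}"),         # delete
]

for req in reqs:
    print(req.get_method(), req.full_url)

# Sending a request needs a running server, for example:
# urllib.request.urlopen(reqs[1])
```

Updates and deletes carry the document's current revision, which is why `_rev` appears in the PUT body and `rev` in the DELETE query string.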
Futon interface
Redis
● key-value database
● written in C
● open source
● networked
● in-memory
● persistent database
● similar to memcached
● data is non-volatile
● atomic operations
● very high performance: ~100,000 operations/second with 50 parallel clients
● all data is kept in memory - blazing fast
● periodic synchronization to hard drive
● powerful replication
● bindings for a lot of languages: PHP, Ruby, Python, C, Java, etc.
SET foo bar
GET foo => bar

SET - insert
GET - select
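To make the model concrete, here is a toy in-memory key-value store in Python with `set`/`get` and a snapshot to disk. It is a sketch of the idea only, not how Redis is implemented; the `ToyStore` class and the snapshot file name are hypothetical.

```python
import json
import os
import tempfile

class ToyStore:
    """A toy key-value store: data lives in a dict, snapshots go to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Reload the last snapshot, if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        # Returns None for a key that was never set.
        return self.data.get(key)

    def save(self):
        """Write the whole dataset to disk; a real store would do this periodically."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)  # swap the new snapshot in place of the old one

path = os.path.join(tempfile.mkdtemp(), "toy-snapshot.json")

store = ToyStore(path)
store.set("foo", "bar")
print(store.get("foo"))
store.save()

# A new instance sees the data written by the snapshot.
reopened = ToyStore(path)
print(reopened.get("foo"))
```

Reads and writes touch only the in-memory dict, which is where the speed comes from; anything written after the last `save()` would be lost on a crash, which is the trade-off behind periodic synchronization.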
Key-value databases have become very popular lately
Other key-value databases:
● Facebook's Cassandra (now also used by Digg)
● GT.M
● MemcacheDB (a persistence-enabled variant of memcached)
● LinkedIn Voldemort
Conclusion
● relational databases are not the holy grail of data storage
● scalability issues pushed large corporations to look for other solutions
● don't believe the FUD and give them a try
Thank you