A Comparison of SQL and NoSQL Databases
Keith W. Hare
JCC Consulting, Inc.
Convenor, ISO/IEC JTC1 SC32 WG3
13 May 2011 Metadata Open Forum
ISO/IEC JTC1/SC32/WG2 N1537
NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing. In fact, one website lists over a hundred different NoSQL databases.
This presentation reviews the features common to the NoSQL databases and compares those features to the features and capabilities of SQL databases.
Who Am I?
Muskingum College, 1980, BS in Biology and Computer Science
Senior Consultant with JCC Consulting, Inc. since 1985 – high performance database systems
Ohio State – Masters in Computer & Information Science, 1985
SQL Standards committees since 1988
Vice Chair, INCITS H2 since 2003
Explicit transaction control
Data Definition Language
Schema defined at the start
Create Table (Column1 Datatype1, Column2 Datatype2, …)
Constraints to define and enforce relationships
Triggers to respond to Insert, Update, & Delete
Stored Modules
Alter …
Drop …
Security and Access Control
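The DDL items above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table, column, and trigger names are invented, and SQLite is used because it ships with Python. SQLite has no stored modules and no GRANT-style access control, so those two items are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Schema defined at the start, with constraints that define relationships.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
CREATE TABLE audit_log (message TEXT);

-- Trigger that responds to an Insert.
CREATE TRIGGER employee_insert AFTER INSERT ON employee
BEGIN
    INSERT INTO audit_log VALUES ('added ' || NEW.name);
END;

-- Alter changes the schema after the fact.
ALTER TABLE employee ADD COLUMN salary INTEGER;
""")

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Ada', 1)")

# The foreign key constraint rejects a row that points at a missing department.
try:
    conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (2, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Rows written by the trigger.
audit = [row[0] for row in conn.execute("SELECT message FROM audit_log").fetchall()]

# Drop removes objects; the trigger goes first because it writes to audit_log.
conn.execute("DROP TRIGGER employee_insert")
conn.execute("DROP TABLE audit_log")
```

The point of the sketch is that the rules live in the schema: the database, not the application, rejects the invalid row and records the insert.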
Atomic – All of the work in a transaction completes (commit) or none of it completes
Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
Durable – The results of a committed transaction survive failures
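Atomicity and constraint-based consistency can be illustrated with a small transfer between two accounts. This is a sketch under stated assumptions: the account table, the balance check, and the transfer helper are made up for the example, and an in-memory SQLite database stands in for a full SQL system (so durability is not demonstrated).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account"))


first_ok = transfer(conn, 1, 2, 30)     # fits within the balance
after_first = balances(conn)
second_ok = transfer(conn, 1, 2, 500)   # would overdraw account 1
after_second = balances(conn)
```

In the second call the credit to account 2 is applied first and then undone when the debit violates the constraint, so the balances are the same before and after the failed transfer.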
SQL Database Examples
Commercial
IBM DB2
Oracle RDBMS
Microsoft SQL Server
Sybase SQL Anywhere
Open Source (with commercial options)
MySQL
Ingres
Significant portions of the world’s economy use SQL databases!
NoSQL Definition
From www.nosql-database.org:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.
Scalable replication and distribution
Potentially thousands of machines
Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous Inserts & Updates
Schema-less
ACID transaction properties are not needed – BASE
CAP Theorem
Open source development
BASE Transactions
Acronym contrived to be the opposite of ACID
Basically Available,
Soft state,
Eventually Consistent
Characteristics
Weak consistency – stale data OK
Availability first
Best effort
Approximate answers OK
Aggressive (optimistic)
Simpler and faster
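The "stale data OK" and "eventually consistent" characteristics above can be illustrated with a toy in-process store. This is a hypothetical sketch, not the design of any particular NoSQL product: the class name, the single primary, and the explicit propagate step are all assumptions made to keep the example small.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: writes are acknowledged at once and replicated later."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to the replicas

    def write(self, key, value):
        """Availability first: record the write and return without waiting."""
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        """Read from a replica; the answer may be stale."""
        return self.replicas[replica].get(key)

    def propagate(self):
        """Apply queued updates in order; afterwards all copies agree."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = EventuallyConsistentStore()
store.write("cart", ["book"])
stale = store.read("cart")   # replica has not seen the write yet
store.propagate()
fresh = store.read("cart")   # replica has caught up
```

Between the write and the propagation step a reader sees the old state (here, no value at all); once propagation has run, every replica returns the new value.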
Brewer’s CAP Theorem
A distributed system can support only two of the following characteristics:
Consistency
Availability
Partition tolerance
The slides from Brewer’s July 2000 talk do not define these characteristics.
Consistency
all nodes see the same data at the same time – Wikipedia
client perceives that a set of operations has occurred all at once – Pritchett
More like Atomic in ACID transaction properties
Availability
node failures do not prevent survivors from continuing to operate – Wikipedia
Every operation must terminate in an intended response – Pritchett
Partition Tolerance
the system continues to operate despite arbitrary message loss – Wikipedia
Operations will complete, even if individual components are unavailable – Pritchett
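The trade-off among the three characteristics can be made concrete with a deliberately simplified two-node model. Everything here is an assumption for illustration (the class, the "CP" and "AP" mode labels, the partitioned flag); it shows the choice a system faces during a partition and is not a proof of the theorem.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to stay consistent."""


class TwoNodeSystem:
    """Two replicas that either refuse writes (CP) or diverge (AP) when split."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False  # True while the nodes cannot exchange messages

    def write(self, node, key, value):
        if not self.partitioned:
            for data in self.nodes:          # replicate synchronously
                data[key] = value
        elif self.mode == "AP":
            self.nodes[node][key] = value    # stay available, copies diverge
        else:
            raise Unavailable("cannot reach the other node")

    def read(self, node, key):
        return self.nodes[node].get(key)


# Availability chosen: the write succeeds, but the nodes now disagree.
ap = TwoNodeSystem("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_views = (ap.read(0, "x"), ap.read(1, "x"))

# Consistency chosen: the write is refused, so both nodes still agree.
cp = TwoNodeSystem("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_refused = False
except Unavailable:
    cp_refused = True
cp_views = (cp.read(0, "x"), cp.read(1, "x"))
```

With the partition in place, the AP system answers every request but its two nodes return different values, while the CP system keeps one agreed value at the cost of rejecting the write.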
NoSQL Database Types
Discussing NoSQL databases is complicated because there are a variety of types:
Column Store – Each storage block contains data from only one column
Document Store – stores documents made up of tagged elements
Key-Value Store – Hash table of keys
A column store is more efficient than a row (or document) store if:
Multiple rows/records/documents are inserted at the same time, so updates of column blocks can be aggregated
Retrievals access only some of the columns in a row/record/document
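The second condition above can be illustrated by counting how many stored values each layout has to touch to total a single column. The sample records and the "values read" counter are invented for the example; a real column store works with disk blocks, not Python lists.

```python
rows = [
    {"id": 1, "name": "Ada", "city": "London", "salary": 120},
    {"id": 2, "name": "Bob", "city": "Paris", "salary": 95},
    {"id": 3, "name": "Cy", "city": "Oslo", "salary": 101},
]


def to_columns(rows):
    """Pivot row records into one list (one "storage block") per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_salary_row_store(rows):
    """Sum salaries from a row layout; every value of each row is fetched."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)
        total += row["salary"]
    return total, values_read


def total_salary_column_store(columns):
    """Sum salaries from a column layout; only the salary block is fetched."""
    block = columns["salary"]
    return sum(block), len(block)


columns = to_columns(rows)
row_total, row_reads = total_salary_row_store(rows)
col_total, col_reads = total_salary_column_store(columns)
```

Both layouts give the same total, but the row layout reads all four values of every record while the column layout reads only the three salary values.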
NoSQL Example: Document Store
Technique for indexing and searching large data volumes
Two Phases, Map and Reduce
Map
Extract sets of Key-Value pairs from underlying data
Potentially in Parallel on multiple machines
Reduce
Merge and sort sets of Key-Value pairs
Results may be useful for other searches
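The two phases above can be sketched as a word count. This is a minimal illustration, not any product's MapReduce API: the function names are hypothetical, and a thread pool on one machine stands in for the "multiple machines" of a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """Map: extract (key, value) pairs from one piece of underlying data."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(pairs):
    """Reduce: merge pairs that share a key, then sort by key."""
    merged = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return dict(sorted(merged.items()))


def map_reduce(documents):
    """Run the map step on each document in parallel, then reduce once."""
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(map_phase, documents)
        pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_phase(pairs)


documents = ["SQL and NoSQL", "NoSQL and BASE", "SQL and ACID"]
counts = map_reduce(documents)
```

The map step never looks at more than one document, which is what allows it to be spread across workers; only the reduce step needs to see all the intermediate pairs.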
Map Reduce
Map Reduce techniques differ across products
Implemented by application developers, not by underlying software
Map Reduce Patent
Google granted US Patent 7,650,331, January 2010
System and method for efficient large-scale data processing
A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.
Storing and Modifying Data
Syntax varies
HTML
JavaScript
Etc.
Asynchronous – Inserts and updates do not wait for confirmation
Versioned
Optimistic Concurrency
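Versioning and optimistic concurrency, as listed above, can be sketched with a store that accepts an update only if the writer saw the latest version. The class and method names are hypothetical and the store lives in memory; real products differ in how they expose versions.

```python
class VersionConflict(Exception):
    """Raised when a writer's copy is older than the stored version."""


class VersionedStore:
    """Key-value store where every successful write bumps a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); version 0 means the key does not exist."""
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if nobody else has written since expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.put("doc", "draft", expected_version=0)

# Two clients read the same version, with no locks taken.
version_a, _ = store.get("doc")
version_b, _ = store.get("doc")

store.put("doc", "edit A", expected_version=version_a)  # first writer wins
try:
    store.put("doc", "edit B", expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True  # second writer must re-read and retry

final = store.get("doc")
```

No client ever waits on a lock; the cost is that the losing writer has to detect the conflict and retry with fresh data.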
Retrieving Data
Syntax Varies
No set-based query language
Procedural program languages such as Java, C, etc.
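The difference between a set-based query and procedural retrieval can be shown side by side. The sample data is invented, SQLite stands in for a SQL database, and a plain Python dictionary stands in for a key-value store; neither is meant to represent a specific product.

```python
import sqlite3

people = {
    "u1": {"name": "Ada", "city": "London"},
    "u2": {"name": "Bob", "city": "Paris"},
    "u3": {"name": "Cy", "city": "London"},
}

# Set-based: one declarative statement describes the wanted result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(key, p["name"], p["city"]) for key, p in people.items()],
)
sql_result = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM person WHERE city = ? ORDER BY name", ("London",)
    )
]

# Procedural: the application walks the key-value pairs and filters them itself.
procedural_result = []
for key, person in people.items():
    if person["city"] == "London":
        procedural_result.append(person["name"])
procedural_result.sort()
```

Both approaches return the same names; in the procedural version the filtering and ordering logic lives in application code, which is the situation the slide describes.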
PODC Keynote, July 19, 2000. Towards Robust Distributed Systems. Dr. Eric A. Brewer. Professor, UC Berkeley. Co-Founder & Chief Scientist, Inktomi. www.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
“Brewer's CAP Theorem” posted by Julian Browne, January 11, 2009. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
“How to write a CV” Geek & Poke Cartoon http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
Web References
“Exploring CouchDB: A document-oriented database for Web applications”, Joe Lennon, Software developer, Core International. http://www.ibm.com/developerworks/opensource/library/os-couchdb/index.html
“Graph Databases, NOSQL and Neo4j” Posted by Peter Neubauer on May 12, 2010 at: http://www.infoq.com/articles/graph-nosql-neo4j
“Distinguishing Two Major Types of Column-Stores” Posted by Daniel Abadi on March 29, 2010 http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
“MapReduce: Simplified Data Processing on Large Clusters”, Jeffrey Dean and Sanjay Ghemawat, December 2004. http://labs.google.com/papers/mapreduce.html
“Scalable SQL”, ACM Queue, Michael Rys, April 19, 2011 http://queue.acm.org/detail.cfm?id=1971597
“a practical guide to noSQL”, Posted by Denise Miura on March 17, 2011 at http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/
“CouchDB The Definitive Guide”, J. Chris Anderson, Jan Lehnardt and Noah Slater. O’Reilly Media Inc., Sebastopol, CA, USA. 2010
“Hadoop The Definitive Guide”, Tom White. O’Reilly Media Inc., Sebastopol, CA, USA. 2011
“MongoDB The Definitive Guide”, Kristina Chodorow and Michael Dirolf. O’Reilly Media Inc., Sebastopol, CA, USA. 2010