Brisk hadoop june2011_sfjava

Brisk: Truly peer-to-peer Hadoop

srisatish.ambati AT gmail.com Apache Cassandra/OpenJDK @srisatish

Brisk: Hive + Hadoop + Cassandra

Map Reduce

Use it when you have large sets of data and can work on small pieces in parallel.
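
As a rough illustration of working on small pieces in parallel, here is a minimal word-count sketch in plain Java. It is not Hadoop code: the class name `WordCountSketch` and the sample input are made up for this example, and a parallel stream stands in for the cluster. Each line is mapped to words independently, then counts for the same word are reduced into one total.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of Hadoop or Brisk.
class WordCountSketch {

    // Map phase: every line (a "small piece") is split into words on its own.
    // Reduce phase: occurrences of the same word are grouped and counted.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("move computation not data", "move data rarely");
        System.out.println(wordCount(lines));
    }
}
```

The same shape (independent map step, grouped reduce step) is what lets a framework spread the work across many machines.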

Map Reduce

Multicore map reduce framework, Kunle, et al

Parallel Execution View

JobTracker

NameNode

HDFS

Write once, read many!

A file once created, written & closed need not change.

Move computation, not data

DataNodes: Read, Write Blocks

NameNode:

● Single master node

● Single machine address space

● Single point of failure

Enter the Cassandra: High Scale

Peer-to-peer

When “it” does not fit in a single node!… Enter the distributed dragon!

NameNode

DataNodes

One kind of node!

Cassandra: High Scale

Peer-to-peer

Portfolio Demo

Low latency: live tick prices for stocks.

Batch analytics: historical EOD prices, Value at Risk.

http://www.datastax.com/docs/0.8/brisk/brisk_demo

Demo URLs (good for this demo only)

http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html

http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201105310219_0008&refresh=30

http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/

Bigtable, 2006

Dynamo, 2007

OSS, 2008

Incubator, 2009

TLP, 2010

[Figure: Cassandra ring with nodes labeled A, F, L, P, T, U, W, Y and key “C”]
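
The ring on this slide can be read as a key-to-node lookup. The sketch below is one way to picture it, under stated assumptions: node tokens are the single letters from the figure, keys are compared as plain strings (as an order-preserving partitioner would), a key belongs to the first node whose token is at or after it, and further replicas are the next nodes clockwise. The class `RingSketch`, the method `replicasFor`, and the replica count of 3 are hypothetical names and values for illustration, not Cassandra's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Cassandra's replication code.
class RingSketch {

    // Walk clockwise from the first token >= key, wrapping past the last token.
    static List<String> replicasFor(TreeSet<String> tokens, String key, int replicas) {
        List<String> owners = new ArrayList<>();
        if (tokens.isEmpty()) {
            return owners;
        }
        String token = tokens.ceiling(key);
        if (token == null) {
            token = tokens.first(); // wrap around the ring
        }
        int wanted = Math.min(replicas, tokens.size());
        while (owners.size() < wanted) {
            owners.add(token);
            token = tokens.higher(token);
            if (token == null) {
                token = tokens.first(); // wrap around the ring
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Assumed tokens, taken from the letters visible in the figure.
        TreeSet<String> tokens = new TreeSet<>(Arrays.asList("A", "F", "L", "P", "T", "U", "W", "Y"));
        System.out.println(replicasFor(tokens, "C", 3));
    }
}
```

Because every node can answer this lookup for itself, no single master has to hold the placement map.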

Cassandra: High Scale

Peer-to-peer

No SPOF

“dynamic” column families

[Figure: community handles: jbellis, driftx, thobbs, zznate, mdennis, pcmanus, xedin]

Brisk

Brisk: HowStuffWorks version

● YDH security edition (soon to be Apache)

● Apache Hive – SQL-like access

● Cassandra 0.8

● CQL interface

● Apache Thrift

Use ColumnFamilies: inode, sblock

import org.apache.cassandra.thrift.CfDef;

String keyspace = "cfs";

// Column family holding file metadata (inodes)
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…

// Column family holding the file blocks
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000); // 1M keys
cf.setComment("Stores blocks of information associated with an inode");
cf.setKeyspace(keyspace);

Consistency: R + W > N

"brisk.consistencylevel.read", "QUORUM";

"brisk.consistencylevel.write", "QUORUM";
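
To see why QUORUM on both reads and writes satisfies R + W > N, the arithmetic can be sketched as below. This is a minimal sketch, not Brisk code: `QuorumSketch`, `quorum`, and `overlaps` are hypothetical names, and a quorum is taken to be a strict majority of the N replicas.

```java
// Hypothetical illustration only; not part of Brisk or Cassandra.
class QuorumSketch {

    // Quorum for n replicas: a strict majority.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when any read set of size r must share a replica with any write set of size w.
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N=" + n + " QUORUM=" + q + " overlaps=" + overlaps(q, q, n));
        System.out.println("ONE/ONE overlaps=" + overlaps(1, 1, n));
    }
}
```

With N = 3, QUORUM is 2, so 2 + 2 > 3 and a read always touches at least one replica that saw the latest write. Reading and writing at ONE gives 1 + 1 = 2, which does not exceed 3, so that guarantee is lost.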

Hadoop: job tracker, task tracker

BriskSnitch: brisk nodes, cassandra nodes

BriskSimpleSnitch.java

if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}

Hive: SQL-like access

● cli, hwi, jdbc, metastore

● Pushdown predicates (v beta2)

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='20080815');

hive> SELECT count(*), ds FROM invites GROUP BY ds;

http://www.datastax.com/docs/0.8/brisk/about_hive

ETL

Realtime

Cassandra CFs

DataCenters

Scale

No me in team!

● Ben Coverston

● Ben Werther

● Brandon Williams

● Cathy Daw

● Daria Hutchinson

● Eric Gilmore

● Jackson Chung

● Jake Luciani

● Joaquin Casares

● Jonathan Ellis

● Michael Allen

● Mike Bulman

● Nate McCall

● Nick M Bailey

● Patricio Echague

● Tyler Hobbs

● SriSatish Ambati

● Yewei Zhang

100-node Brisk Cluster on OpsCenter

[Figure: Cassandra timeline (Bigtable, 2006; Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010), combined into Brisk]

Git started:

git clone [email protected]:riptano/brisk.git

http://www.datastax.com/product/brisk

Getting Started via Brisk AMI.

Thank You.

References

● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf

● Multicore MapReduce, Kunle, et al. http://bit.ly/iRJd1n
