 Brisk: Truly peer-to-peer Hadoop      srisatish.ambati AT gmail.com   Apache Cassandra/OpenJDK   @srisatish
Brisk hadoop june2011_sfjava

Dec 05, 2014

Brisk: Truly peer-to-peer hadoop
Talk at SFJava

http://bit.ly/jqClhK
Transcript
Page 1: Brisk hadoop june2011_sfjava

   

Brisk: Truly peer-to-peer Hadoop

  srisatish.ambati AT gmail.com  Apache Cassandra/OpenJDK  @srisatish

Page 2: Brisk hadoop june2011_sfjava

   

Brisk: Hive + Hadoop + Cassandra

Page 3: Brisk hadoop june2011_sfjava

   

Map Reduce

Page 4: Brisk hadoop june2011_sfjava

   

Have large sets of data & you can work on small pieces in parallel. 
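The idea on this slide can be sketched in a few lines of plain Java. This is only a single-JVM illustration using parallel streams, not the Hadoop API: the class name `WordCountSketch`, the method `wordCount`, and the sample input are hypothetical. Each chunk of text is mapped to words independently, and the reduce step sums the counts per word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; Hadoop distributes the same two phases across machines.
class WordCountSketch {

    static Map<String, Long> wordCount(List<String> chunks) {
        return chunks.parallelStream()
                // "Map": each chunk is split into words independently of the others.
                .flatMap(chunk -> Arrays.stream(chunk.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Reduce": all occurrences of the same word are grouped and counted.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList(
                "brisk hadoop",
                "hadoop cassandra",
                "cassandra hive hadoop");
        // Prints {brisk=1, cassandra=2, hadoop=3, hive=1}
        System.out.println(wordCount(chunks));
    }
}
```

Because the chunks do not depend on each other, the map step can run on as many cores (or, in Hadoop, as many nodes) as are available.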

Page 5: Brisk hadoop june2011_sfjava

Map Reduce

Page 6: Brisk hadoop june2011_sfjava

   

Multi-core map reduce framework, Kunle, et al

Page 7: Brisk hadoop june2011_sfjava

   

Parallel Execution View

Page 8: Brisk hadoop june2011_sfjava

   

Page 9: Brisk hadoop june2011_sfjava

   

Page 10: Brisk hadoop june2011_sfjava

   

JobTracker

NameNode

HDFS

Page 11: Brisk hadoop june2011_sfjava

   

Write-once-read-many!

A file, once created, written & closed, need not change

Page 12: Brisk hadoop june2011_sfjava

   

Move computation, not data

Page 13: Brisk hadoop june2011_sfjava

   

Page 14: Brisk hadoop june2011_sfjava

   

DataNodes: Read, Write Blocks

Page 15: Brisk hadoop june2011_sfjava

   

NameNode:

Single Master node

Single Machine Address space

Single Point of failure

Page 16: Brisk hadoop june2011_sfjava

   

Enter the Cassandra: High Scale

Peer-to-peer

When “it” does not fit in a single node!… Enter the distributed dragon!

Page 17: Brisk hadoop june2011_sfjava

   

NameNode

DataNodes

Page 18: Brisk hadoop june2011_sfjava

   

One-kind-of-node!

Page 19: Brisk hadoop june2011_sfjava

   

Cassandra: High Scale

Peer-to-peer

Page 20: Brisk hadoop june2011_sfjava

   

Portfolio Demo

Low latency: Live tick prices for stocks.

Batch Analytics: Historical EOD prices. Value at Risk.

http://www.datastax.com/docs/0.8/brisk/brisk_demo

Page 21: Brisk hadoop june2011_sfjava

   

http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html

http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201105310219_0008&refresh=30

http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/

Demo URLs (good for this demo only)

Page 22: Brisk hadoop june2011_sfjava

Bigtable, 2006

Dynamo, 2007

OSS, 2008

Incubator, 2009

TLP, 2010

Page 23: Brisk hadoop june2011_sfjava

   

[Figure: cluster diagram with labels A, L, T, W, F, P, Y, U and Key “C”]

Cassandra: High Scale

Peer-to-peer

No SPOF
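The letters and Key “C” on this slide read like positions on a Cassandra token ring. A minimal sketch of how a key could be placed on such a ring follows, under loud assumptions: the letters are treated as node tokens, keys are compared as plain strings (as an order-preserving partitioner would do, whereas Cassandra's RandomPartitioner hashes keys with MD5 first), and the class `TokenRingSketch` and its methods are hypothetical. A key goes to the first node whose token is greater than or equal to the key's token, wrapping around at the end of the ring.

```java
import java.util.TreeMap;

// Hypothetical sketch of key placement on a token ring; not Cassandra's own code.
class TokenRingSketch {

    // token -> node name; the sorted map keeps the tokens in ring order
    private final TreeMap<String, String> ring = new TreeMap<>();

    void addNode(String token) {
        ring.put(token, "node-" + token);
    }

    // Walk clockwise from the key's token to the first node token at or after it.
    String nodeFor(String keyToken) {
        String token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // past the last token: wrap around to the start
        }
        return ring.get(token);
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        for (String token : new String[] {"A", "F", "L", "P", "T", "U", "W", "Y"}) {
            ring.addNode(token);
        }
        System.out.println(ring.nodeFor("C")); // node-F
        System.out.println(ring.nodeFor("Z")); // node-A (wraps around)
    }
}
```

Every node runs the same lookup, so no single master has to be asked where a key lives.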

Page 24: Brisk hadoop june2011_sfjava

“dynamic” column families

[Figure: example with labels zznate, driftx, thobbs, jbellis; “Following”; and entries driftx:, thobbs:, mdennis:, zznate:, pcmanus:, xedin:]

Page 25: Brisk hadoop june2011_sfjava
Page 26: Brisk hadoop june2011_sfjava

   

Page 27: Brisk hadoop june2011_sfjava

   

Page 28: Brisk hadoop june2011_sfjava

   

Brisk

Page 29: Brisk hadoop june2011_sfjava

   

Brisk

HowStuffWorks version

Page 30: Brisk hadoop june2011_sfjava

   

YDH security edition (soon to be Apache)

Apache Hive – SQL-like access

Cassandra 0.8

CQL Interface

Apache Thrift

Page 31: Brisk hadoop june2011_sfjava

   

Use ColumnFamilies

inode

sblock

Page 32: Brisk hadoop june2011_sfjava

   

String keyspace = "cfs";

// Column family for inodes
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…

// Column family for the blocks that belong to an inode
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000);  // written as "1M" on the slide
cf.setComment("Stores blocks of information associated with an inode");
cf.setKeyspace(keyspace);

Page 33: Brisk hadoop june2011_sfjava
Page 34: Brisk hadoop june2011_sfjava

   

Consistency: R + W > N

"brisk.consistencylevel.read", "QUORUM";

"brisk.consistencylevel.write", "QUORUM";
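The rule R + W > N can be checked with a little arithmetic. The sketch below is illustrative only: the class `QuorumSketch` and its methods are hypothetical, and N stands for the replication factor, R for the number of replicas a read waits for, and W for the number a write waits for. A quorum is a majority of the N replicas, so quorum reads plus quorum writes always overlap in at least one replica.

```java
// Hypothetical helper for the R + W > N rule; not part of Brisk or Cassandra.
class QuorumSketch {

    // A quorum is a majority of the N replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // If R + W > N, every read set shares at least one replica with every write set.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println(q);                             // 2
        System.out.println(readsSeeLatestWrite(q, q, n));  // true:  2 + 2 > 3
        System.out.println(readsSeeLatestWrite(1, 1, n));  // false: 1 + 1 <= 3
    }
}
```

With both settings at QUORUM, as on this slide, the condition holds for any replication factor, because two majorities of the same N replicas must intersect.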

Page 35: Brisk hadoop june2011_sfjava

   

Hadoop: job tracker, task tracker

Page 36: Brisk hadoop june2011_sfjava

   

BriskSnitch: brisk nodes, cassandra nodes

Page 37: Brisk hadoop june2011_sfjava

   

BriskSimpleSnitch.java

if (TrackerInitializer.isTrackerNode)
{
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
}
else
{
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}

Page 38: Brisk hadoop june2011_sfjava

   

Hive: SQL-like access

cli, hwi, jdbc, metastore

Pushdown predicates (v beta2)

Page 39: Brisk hadoop june2011_sfjava

   

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

hive> SELECT count(*), ds FROM invites GROUP BY ds;

http://www.datastax.com/docs/0.8/brisk/about_hive

Page 40: Brisk hadoop june2011_sfjava

   

ETL

Real-time

Cassandra CFs

DataCenters

Scale

Page 41: Brisk hadoop june2011_sfjava

   

Page 42: Brisk hadoop june2011_sfjava

   

No me in team!

● Ben Coverston

● Ben Werther

● Brandon Williams

● Cathy Daw

● Daria Hutchinson

● Eric Gilmore

● Jackson Chung

● Jake Luciani

● Joaquin Casares

● Jonathan Ellis

● Michael Allen

● Mike Bulman

● Nate McCall

● Nick M Bailey

● Patricio Echague

● Tyler Hobbs

● SriSatish Ambati

● Yewei Zhang

Page 43: Brisk hadoop june2011_sfjava

   

100-node Brisk Cluster on Opscenter

Page 44: Brisk hadoop june2011_sfjava

   

[Figure: timeline (Bigtable, 2006; Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010) with Cassandra and other components joined by “+” into Brisk]

Page 45: Brisk hadoop june2011_sfjava

   

Git started:

git clone [email protected]:riptano/brisk.git

http://www.datastax.com/product/brisk

Getting Started via Brisk AMI.

Thank You.

Page 46: Brisk hadoop june2011_sfjava

   

References

● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf

● Multi-core MapReduce, Kunle, et al. http://bit.ly/iRJd1n
