Big Data Overview Concepts (part 1) Break Assignment Conclusion References
Table of contents I
1 Big Data
2 Overview
3 Concepts (part 1)
4 Break
5 Assignment
6 Conclusion
7 References
What is it?
And, why is it interesting?
Big data has emerged as a technology term and trend that is complementary to and considered to be equally as transformational as the cloud computing model. . . . represented as an “old” or “new” capability depending on the perspective of those defining it, . . .
Lee Badger [10]
Big Data can be characterized by the three V’s: volume (large amounts of data), variety (includes different types of data), and velocity (constantly accumulating new data).
Jules J. Berman [3]
Notional definition
We’ll be covering virtually “bleeding edge” stuff.
Data too big for a single machine.
Processing too long for a single machine.
Question/analysis is parallelizable.
Where does it come from?
Lots of places, lots of it, and fast.
230,000,000 tweets per day [8]
2,700,000,000 Facebook likes per day [2]
100 hours of YouTube video every minute [14]
Clickstream left on servers
What does data look like?
Data characteristics
Formatted/unformatted
Bits, bytes, tagged, freeform
Clean, messy
Complete, fragmented
What does data look like?
Torrents of data
Primary usage
Secondary usage
“Exhaust”
Storage
1 Accessibility
2 Longevity
3 Privacy
What does data look like?
Big data players
Brokers
Scientists
Visionaries
Break time.
Take about 10 minutes.
A “Hello World” level problem.
With a little license.
A simply stated problem: Count the number of unique words in Shakespeare’s Macbeth.
A few Java classes
A Hadoop environment
Process strings from a file
Summarize the results
Grad students have a little more to do.
A “Hello World” level problem.
The Hadoop “cookbook”: simple things (on the surface).
Partition (parallelize) the source data
Create key/value pairs
1 Receive line of text
2 Parse the text in some way
3 Create key/value pairs
Behind the scenes, key/value pairs are combined
Reduce key and multiple values
Produce something useful
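The recipe above can be sketched without a cluster. The following is a minimal plain-Python simulation of the partition, map, combine, and reduce steps, not Hadoop code and not the Java classes the assignment asks for. All function names and the two sample lines are made up for illustration.

```python
from collections import defaultdict

def partition(lines, n_parts):
    """Split the source lines into n_parts roughly equal chunks."""
    return [lines[i::n_parts] for i in range(n_parts)]

def map_line(line):
    """Map step: receive a line, parse it, emit (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Behind-the-scenes step: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_key(key, values):
    """Reduce step: collapse one key and its many values to one result."""
    return key, sum(values)

def word_count(lines, n_parts=2):
    """Run the whole recipe in one process (a real cluster runs chunks in parallel)."""
    pairs = []
    for chunk in partition(lines, n_parts):
        for line in chunk:
            pairs.extend(map_line(line))
    return dict(reduce_key(k, v) for k, v in shuffle(pairs).items())

# Hypothetical two-line input standing in for the play's text.
sample = ["Tomorrow and tomorrow and tomorrow", "creeps in this petty pace"]
counts = word_count(sample)
```

The number of unique words is then `len(counts)`. In Hadoop the framework performs the partitioning and grouping itself; only the map and reduce logic is written by hand.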
Undergraduate level problem
Shakespeare’s Macbeth (mechanics)
Things that need to get done:
1 Get a copy of the play
2 Get it onto the Hadoop Distributed File System (HDFS)
3 Write and compile a Mapper class
4 Write and compile a Reducer class
5 Write and compile a main class
6 Run it on the ODU CS Hadoop farm
Undergraduate level problem
Undergrad results: a simple textual listing
Some words are more important than others (stop words)
Only base words (stems == stem)
Words sorted alphabetically
Number of occurrences per word
Words don’t have punctuation
Case insensitive words
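One way to read these requirements is as a small text-normalization pipeline. The sketch below is plain Python rather than the Java Mapper the assignment calls for, and it assumes stop words are dropped from the listing. The stop-word set and the suffix-stripping stemmer are deliberately tiny, hypothetical stand-ins; a real solution would likely use a fuller stop-word list and a proper stemming algorithm.

```python
import string
from collections import Counter

# Illustrative stop-word list only; not the list the assignment specifies.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is"}

def naive_stem(word):
    """Very rough stemmer (an assumption): strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(line):
    """Lowercase, strip ASCII punctuation, drop stop words, stem."""
    table = str.maketrans("", "", string.punctuation)
    words = line.lower().translate(table).split()
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

def listing(lines):
    """Return (word, occurrences) pairs sorted alphabetically."""
    counts = Counter(w for line in lines for w in normalize(line))
    return sorted(counts.items())

# Two lines in the style of the play, used as sample input.
lines = ["Double, double toil and trouble;", "Fire burn, and cauldron bubble."]
result = listing(lines)
```

Note that `string.punctuation` covers ASCII punctuation only, so a text with typographic quotes or apostrophes would need extra handling.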
Graduate level problem
Graduate challenges: undergrads worked with one file, graduates with two
How do the vocabularies of Romeo and Juliet and Macbeth compare?
Slightly more work to do with data:
Work with two files
Compare first 50 words of both plays
1 Order
2 Usage (relative, not absolute)
Interested in how similar the vocabularies are across the two plays.
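A possible shape for the comparison, sketched in plain Python. It assumes "first 50 words" means the 50 most frequently used words of each play, and that relative usage means a word's share of all words in its play. Both are interpretations rather than something the slides state, and the function names and tiny word lists are hypothetical.

```python
from collections import Counter

def relative_usage(words):
    """Return each word's share of the total word count."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def top_words(usage, n=50):
    """Top n words by relative usage, ties broken alphabetically."""
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]

def compare(words_a, words_b, n=50):
    """For words in both top-n lists: (word, rank_a, rank_b, usage_a, usage_b)."""
    usage_a, usage_b = relative_usage(words_a), relative_usage(words_b)
    top_a, top_b = top_words(usage_a, n), top_words(usage_b, n)
    shared = [w for w in top_a if w in top_b]
    return [
        (w, top_a.index(w) + 1, top_b.index(w) + 1, usage_a[w], usage_b[w])
        for w in shared
    ]

# Made-up word lists standing in for the two normalized plays.
play_a = "love love night death love night".split()
play_b = "blood night blood death night blood".split()
rows = compare(play_a, play_b, n=3)
```

Using shares instead of raw counts keeps the comparison fair when one play is longer than the other, which is presumably why the slide asks for relative rather than absolute usage.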
What have we covered?
Big Data Vs
Big Data sources
Problems associated with Big Data
Assignment #1
Next time: Big Data processing concepts (part deux)
References I
[1] Divyakant Agrawal, Philip Bernstein, Elisa Bertino, Susan Davidson, Umeshwar Dayal, and Michael Franklin, Challenges and Opportunities with Big Data, Purdue e-Pubs (2011).
[2] Anson Alexander, Facebook User Statistics 2012 [Infographic], ansonAlex.com (2012).
[3] Jules J Berman, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information, Newnes, 2013.
[4] Pinal Dave, Big Data Beginning Big Data Day 2 of 21, http://blog.sqlauthority.com/2013/10/02/, 2013.
References II
[5] Mike Ferguson, Architecting A Big Data Platform for Analytics, A Whitepaper Prepared for IBM (2012).
[6] Christian Hagen, Khalid Khan, Marco Ciobo, and Jason Miller, Big Data and the Creative Destruction of Today’s Business Models, http://www.atkearney.com/strategic-it/ideas-insights/article/-/asset publisher/LCcgOeS4t85g/content/big-data-and-the-creative-destruction-of-today-s-business-models/10192, 2013.
References III
[8] Joab Jackson, The Big Promise of Big Data, Business Software (2012).
[9] Doug Laney, 3d data management: Controlling data volume, velocity and variety, META Group Research Note 6 (2001).
[10] Lee Badger, David Bernstein, and Robert Bohn, US Government Cloud Computing Technology Roadmap Volume I, Tech. report, National Institute of Standards and Technology, 2014.
[11] John DC Little, A Proof for the Queuing Formula: L = λW, Operations Research 9 (1961), no. 3, 383–387.
References IV
[12] Patrick Meier, Using big data to inform poverty reduction strategies, http://irevolution.net/2013/06/19/pulse-of-egypt-to-inform-poverty-reduction/, 2013.
[13] Philip Russom, Big Data Analytics, TDWI Best Practices Report, Fourth Quarter (2011).