Lynn Langit, Practitioner, Author, Instructor. BI/Big Data Futures – the Cloud? “psst…it’s about Data Mining” Jan 2012 - for SoCalCodeCamp
May 28, 2015
BI = ‘Current State’ Questions
• What did we sell?
• When did we sell it?
• Where did we sell it?
• What did we sell with it?
Collecting transactional data
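The four 'current state' questions above map onto simple group-and-count operations over transactional rows. The following is a minimal sketch in plain Python, assuming a made-up row layout of (order_id, date, region, product); the sample rows and the helper names `count_by` and `sold_together` are illustrative only and do not come from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional rows: (order_id, date, region, product)
SALES = [
    (1, "2012-01-05", "West", "laptop"),
    (1, "2012-01-05", "West", "mouse"),
    (2, "2012-01-06", "East", "laptop"),
    (2, "2012-01-06", "East", "mouse"),
    (3, "2012-01-06", "West", "tablet"),
]


def count_by(rows, column):
    """Count rows grouped by one column index (what / when / where)."""
    return Counter(row[column] for row in rows)


def sold_together(rows):
    """Count product pairs that share an order (what did we sell with it?)."""
    baskets = {}
    for order_id, _date, _region, product in rows:
        baskets.setdefault(order_id, set()).add(product)
    pairs = Counter()
    for products in baskets.values():
        # Sort so each pair is counted under one canonical ordering.
        pairs.update(combinations(sorted(products), 2))
    return pairs


what = count_by(SALES, 3)      # What did we sell?
when = count_by(SALES, 1)      # When did we sell it?
where = count_by(SALES, 2)     # Where did we sell it?
with_it = sold_together(SALES) # What did we sell with it?
```

The last question is already a small step toward data mining: counting co-occurring products is the starting point of market-basket analysis.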
Current State
• I define my OLAP
• Maybe it’s a read-only copy of my OLTP -OR-
• Maybe it’s a cube
• Maybe it uses some data mining too
Let’s all OLAP
• I’ll keep it on premises
• I’ll secure it, tune queries, back it up, etc.
It’s my data
• Too difficult, expensive, proprietary….
Data Mining really?
Do you use Data Mining?
Current State Questions
Why did this happen?
When did this happen?
Where did this happen?
Who is responsible?
What might happen to this one value in the future?
Can you write me a report for…?
BI Data Landscape
Storage
Processing
Query
Presentation
Mix-in #1 -- the Cloud and…
• Host Data in the Cloud
• Process & Query Data in the Cloud
  – Click to query and (data) mine
  – Return the data locally
  – Use Self-service BI visualizers
• Mash-up Cloud data
  – Combine with local data
NoSQL and Cloud-based BI
• The Elephant in the room…Hadoop
• More than 120 types of NoSQL databases
  – http://nosql-database.org/
Oracle Loader for Hadoop
SQL Server Connector for Hadoop
Hadoop on Azure
Comparing RDBMS and MapReduce
              Traditional RDBMS          MapReduce
Data Size     Gigabytes (Terabytes)      Petabytes (Exabytes)
Access        Interactive and batch      Batch
Updates       Read / write many times    Write once, read many times
Structure     Static schema              Dynamic schema
Integrity     High (ACID)                Low
Scaling       Nonlinear                  Linear
DBA Ratio     1:40                       1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
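To make the MapReduce side of the comparison concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word count as the job. It only illustrates the programming model, assuming everything fits in memory; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for each word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the list of values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over the input records as one batch."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data in the cloud", "data mining in the cloud"])
```

The input is read once and never updated, and the whole job runs as a batch, which matches the 'Batch' and 'Write once, read many times' rows of the comparison. In a real cluster the map and reduce calls would run on many machines in parallel.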
Microsoft Cloud Data 1
Microsoft Cloud Data 2 - DataMarket
Amazon AWS
Google App Engine Data
Google – MySQL & Cloud Storage
BTW…NoSQL is 50x CHEAPER
BigData = ‘Next State’ Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
Collecting behavioral data
Splunk
Mining Log Files
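As a rough idea of what mining behavioral data out of log files can look like, the sketch below parses a small in-memory log and counts actions and errors per user. The log format (timestamp, level, user, action), the sample lines, and the function names are assumptions made for illustration; this is plain Python, not Splunk's query language.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <LEVEL> <user> <action>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<user>\S+) (?P<action>\S+)$"
)

SAMPLE_LOG = """\
2012-01-28T10:00:01 INFO alice search
2012-01-28T10:00:05 INFO alice add_to_cart
2012-01-28T10:00:09 ERROR bob checkout
malformed line here
2012-01-28T10:01:12 INFO bob search
"""


def parse(lines):
    """Yield one dict per well-formed log line; skip lines that do not match."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield match.groupdict()


def summarize(text):
    """Count actions overall and ERROR events per user."""
    events = list(parse(text.splitlines()))
    return {
        "actions": Counter(e["action"] for e in events),
        "errors_by_user": Counter(
            e["user"] for e in events if e["level"] == "ERROR"
        ),
    }


summary = summarize(SAMPLE_LOG)
```

Skipping malformed lines instead of failing is a deliberate choice here, since real logs are rarely clean.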
Presenting the results
Mix-in #2 - Data Scientists
• Who prepares (processes and cleans) the data?
• Who asks the ‘right’ questions now?
• Who understands the languages?
• Who can understand the results?
Is Data Science your next Career?
Becoming a Data Scientist
• Conferences
  – Strata
  – Data Scientist Summit
  – CloudCamps
• Practice
  – here
Hadoop -- HortonWorks, Cloudera…
Google – Freebase & Refine
Microsoft – Data Explorer
Mix-in #3 - Presentation
• New Devices – iPad, Kindle Fire
• New User Experiences – touch, Kinect
• EVERYTHING on the phone
Some BI Query Languages
Microsoft
• MDX, DMX, T-SQL, DAX, XMLA -- data
• Infer.NET -- programmer
Open Source
• R, Hive (SQL-like) -- data
• HQL, GQL, MQL -- specialized data
• MapReduce (Java) -- programmer
R-Language
Karmasphere Studio for Amazon Elastic MapReduce
Excel PowerPivot
Power Pivot in action
More PowerPivot
Hadoop Connector to Excel
Self-Service Data Mining Predixion
QlikView
QlikView on iPad
BI vNext for Microsoft
SQL Server 2012 - New BI tools and semantic model
• Data Quality Services
• Master Data Services
• Semantic Search
• PowerView
SQL Azure vNext - Federations, features, max. size increase
Connectors for Hadoop
• SQL Server
• Excel
• Power Pivot
Full BI IN the cloud (SSAS, SSIS, SSRS)
• Data Explorer – ETL for all
BI > BigData ‘To Do’ List
Store some (more) data on the cloud
• Relational and non-relational
Process some data in the cloud
• Try data mining
• Learn about Data Science
Update your client tools
• New UI (touch, gestures)
• Click to Query
• New form factors (phone, tablet)
Is Data Science your next Career?
Deeper Comparison Chart
www.TeachingKidsProgramming.org
• Do a Recipe, Teach a Kid (ages 10+)
• Microsoft Small Basic free courseware (recipes)
Keep up with Big Data
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI