Data Analytics using the Cloud - Challenges and Opportunities for India
Jan 27, 2015
Introduction
AJAY OHRI Author 1,2,3 Thinker 1,2
Founder, DECISIONSTATS
http://linkedin.com/in/ajayohri
What comes next?
Data Analytics - Older Paradigms
Thoughts on Stats and Computer Science
Overview - Data Storage, Cloud Computing
Data Analytics
older paradigms - SAS and SPSS languages, ETL and data warehouses (DWs)
newer paradigms - R and Python, Scala and Hadoop
More machine learning, less classical stats
Is statistics lagging behind computer science?
Classical statistics - designed for scarce data
Big Data era - the cost of throwing data away is more than the cost of storing it
Machine learning seems to be the flavor of the day
Data Storage
older paradigms - RDBMS and spreadsheets: structure and interactivity
new paradigms - NoSQL, Hadoop, cloud-enabled spreadsheets (?)
volume, variety and velocity of Big Data
Cloud Computing - as defined by NIST
http://www.nist.gov/itl/csd/cloud-102511.cfm
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
or
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Service Models for Cloud Computing
SaaS - Software as a service
IaaS - Infrastructure as a service
PaaS - Platform as a service
Service Models for Cloud Computing
IaaS - Infrastructure as a service
http://media.amazonwebservices.com/IDC_Business_Value_of_AWS_Accelerates_Over_time.pdf
http://www.gartner.com/technology/reprints.do?id=1-1IMDMZ5&ct=130819&st=sb
Service Models for Cloud Computing
PaaS - Platform as a service
http://www.gartner.com/technology/research/cloud-computing/report/paas-cloud.jsp
http://www.forrester.com/search?N=20033+10001&sort=3&everything=true&source=browse&
Service Models for Cloud Computing
SaaS - Software as a service
http://www.forrester.com/Software--as--a--Service-%28SaaS%29
http://www.gartner.com/newsroom/id/1963815
http://www.forbes.com/sites/louiscolumbus/2013/02/19/gartner-predicts-infrastructure-services-will-accelerate-cloud-computing-growth/
http://my.gartner.com/portal/server.pt?open=512&objID=202&&PageID=5553&mode=2&in_hi_userid=2&cached=true&resId=2332215&ref=AnalystProfile
http://www.gartner.com/it-glossary/software-as-a-service-saas/
Deployment Models for Cloud Computing
Private
Community
Public
Hybrid
Data Analytics (traditional) - Porter’s Model
Threat of Mobility - Low (lock-in)
Industry Rivalry - Medium (many players)
Supplier Power - High (software, hardware)
Buyer Power - Medium
Substitutes - Low (not many alternatives to SAS, SPSS)
Data Analytics (cloud based) - Porter’s Model
Threat of Mobility - High (easy to switch, as data and analytics are cloud based)
Industry Rivalry - High (global providers)
Supplier Power - Low (open source, free, GPL)
Buyer Power - High (lots of options: outsource, insource, crowdsource)
Substitutes - High (lots of options: Python, R, Julia etc.)
Data Analytics in India - Porter’s Diamond Model
Chance - Favorable supply of engineers, mature outsourcing and service industry, rapid domestic growth
Factor Conditions - Good service industry
Firm Strategy - relative lack of an ecosystem hampers analytics entrepreneurs
Demand Conditions - High
Government - Little or no interference
India in Traditional Data Analytics - SWOT
Strengths
reliable pool of experienced engineering talent
Weaknesses
inability or unwillingness to invest huge upfront capex in hardware and software for analytics
Opportunities / Threats
ability to move upstream, but on cost-based arbitrage rather than skill-based value addition, and thus vulnerable to competition
India in Cloud Based Data Analytics - SWOT
Strengths
experienced service industry with a huge pool of trained engineering and analytical talent
Weaknesses
lack of deep domain expertise
relative lack of an ecosystem for cutting-edge analytics entrepreneurship
slow to embrace open source
Opportunities
no more capital expenditure needed on software and hardware
virtualization offers secure delivery from any location
Threats
risk management needs to mature
lack of data privacy regulations
Biggest Challenge to using Cloud
Google, Amazon, Oracle Cloud, Salesforce, Zoho and Microsoft Azure are some well-known cloud vendors
Most of the cloud infrastructure is based in the United States of America
Biggest Challenge to using Cloud == NSA?
Unfortunately, the US Government taps this information for both security and economic advantage
Unfortunately, American companies seek and obtain economic advantages in return for such cooperation
Unfortunately, in an age of cyber war, with its biggest proponent across the border, we have no critical infrastructure as a service for economic players
In the future, you won't need the United Nations to sanction countries. You will just switch off their internet and their economy will shut down.
Could foreign digital infrastructure be used to infiltrate Stuxnet-like viruses into the domestic supply chain?
India may be self-reliant in agriculture and semi-reliant in manufacturing arms, but we are totally dependent on others for next-generation and even current-generation computing
Biggest Opportunities in using Cloud
Build our critical digital grid using local companies - POSSIBLE
Build our next generation of cyber warriors and cyber farmers - VERY POSSIBLE
Teach more distributed computing earlier ;)
Regulation like the EU's to ensure Indian citizens' data stays within the Indian State’s administrative boundaries and within reach of the Indian legal system
Compare the Aadhaar card with information in emails, social networks and on personal computers??
Better regulation - POSSIBLE OR NOT POSSIBLE? DEPENDS ON ELECTIONS
Moving on to Cloud-Based Data Analytics
Open source analytics like Python and R
Support distributed computing
Memory is much less of a problem on the cloud (especially for R, which holds its data in RAM), since larger machines can be provisioned on demand
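Distributed computing in the Hadoop/Spark style typically follows a map-then-reduce pattern. Each node processes its own partition of the data, and the partial results are then merged. The Python sketch below is only a single-machine illustration and assumes a simple word-count task. Threads stand in for cluster nodes. The function names (map_partition, reduce_counts, distributed_word_count) are hypothetical and are not part of any Hadoop or Spark API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, n_parts):
    """Split the input into n_parts interleaved chunks, one per worker."""
    return [data[i::n_parts] for i in range(n_parts)]


def map_partition(lines):
    """Map step: count words within one partition (one hypothetical 'node')."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial word counts into one."""
    return a + b


def distributed_word_count(lines, n_workers=4):
    """Toy map-reduce: threads stand in for cluster nodes.

    On a real Hadoop or Spark cluster each partition would be processed
    on a different machine; here everything runs in one process.
    """
    parts = partition(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = [
        "R and Python on the cloud",
        "Hadoop and Spark on the cloud",
        "Python for data analytics",
    ]
    print(distributed_word_count(docs, n_workers=2).most_common(3))
```

The design point is that the map step needs no shared state, so adding nodes scales it. Only the much smaller partial results have to be combined.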
Existing Data Analytics in India
Lots of analytics outsourcing
Both SAS and SPSS are present
Open source analytics on the rise, but still a palpable lack of awareness
Data - ETL - Data Warehouse - SQL Query - Stats Software MINDSET
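The traditional Data - ETL - Data Warehouse - SQL Query - Stats Software chain can be sketched end to end with only the Python standard library. This is a toy illustration, not a production pipeline. The CSV extract, column names and figures are invented, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw extract (invented data); in practice this comes from source systems.
RAW_CSV = """region,month,sales
North,2014-11, 120.5
South,2014-11,98.0
North,2014-12,135.0
South,2014-12,
North,2015-01,142.25
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing sales and cast types."""
    clean = []
    for row in rows:
        value = row["sales"].strip()
        if not value:
            continue  # skip incomplete records
        clean.append((row["region"].strip(), row["month"].strip(), float(value)))
    return clean


def load(rows):
    """Load: write cleaned rows into a toy in-memory 'warehouse' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def query_region_totals(conn):
    """SQL query step: aggregate sales by region."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


def describe(values):
    """Stats step: simple descriptive statistics on the warehouse data."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


if __name__ == "__main__":
    conn = load(transform(extract(RAW_CSV)))
    print(query_region_totals(conn))
    amounts = [amount for (amount,) in conn.execute("SELECT amount FROM sales")]
    print(describe(amounts))
```

Each stage is a separate hand-off, which is the point of the slide. Every step is usually a different tool and often a different team.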
Existing Data Analytics in India
Cloud computing relies heavily on Linux for efficiency
Your Windows CERTIFICATIONS can hinder your IT department’s thinking about the cloud
Data Science requires cross functional learning
Developments in Stats Software
A New Hope - Julia, Pandas http://julialang.org/
http://pandas.pydata.org/
The Empire Strikes Back - SAS http://www.sas.com/en_us/software/cloud.html
https://www.sas.com/en_us/software/sas-hadoop.html
Return of the Jedi - R http://www.r-bloggers.com/
A few developments in Analytics
Revolution R on the cloud (AWS) www.revolutionanalytics.com/RRE-AWS
SAS on the cloud http://blogs.sas.com/content/sascom/2013/04/29/start-planning-now-for-sas-9-4/
http://www.allanalytics.com/author.asp?section_id=1411&doc_id=262924
Apache Spark and R http://amplab-extras.github.io/SparkR-pkg/
A few developments on the Cloud
Amazon http://aws.amazon.com/
Google https://cloud.google.com/products/
IBM http://www.ibm.com/cloud-computing/in/en/
Oracle https://cloud.oracle.com/java
A few developments in R
RHadoop Project https://github.com/RevolutionAnalytics/RHadoop/wiki
OpenCPU Project https://www.opencpu.org/
rOpenSci Project http://blog.programmableweb.com/2013/03/20/pw-interview-karthik-ram-ropensci-wrapping-all-science-apis/
The future of Open Cloud
R + Python on OpenStack?
There is a fair chance that Apache Hadoop-related projects like Shark / Spark will be part of it, and we need Hadoop-based data warehouse solutions (?)
We need to hedge against US policy interference
Education and developer ecosystems have to keep pace
Thank You