Microsoft Technologies for Data Science Mark Tabladillo, Ph.D. Solution Architect (Data Scientist) Microsoft August 2016: SQL Saturday Columbus GA

Microsoft Data Science Technologies 201608

Apr 21, 2017


Mark Tabladillo
Transcript
Page 1: Microsoft Data Science Technologies 201608

Microsoft Technologies for Data Science

Mark Tabladillo, Ph.D.

Solution Architect (Data Scientist)

Microsoft

August 2016: SQL Saturday Columbus GA

Page 2

Networking

Interactive

Page 5

Terms and Definition

Terms: Data Science, Machine Learning, Data Mining, Applied Statistics

Definition: the automated or semi-automated process of discovering patterns in data

Applied scientific method
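To make the definition concrete, here is a minimal sketch of one automated pattern-discovery technique, k-means clustering, in Python. The deck does not prescribe any particular algorithm; k-means is chosen only as an illustration, and the function name `kmeans_1d` and the sample values are hypothetical. It works on one-dimensional numbers, alternately assigning each value to its nearest centroid and moving each centroid to the mean of its cluster.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Hypothetical helper: group numbers into k clusters and return sorted centroids."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(values, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins the cluster of its nearest centroid.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return sorted(centroids)
```

For example, `kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5], 2)` separates the two groups, with centroids at about 1 and 10. Nobody told the function where the groups were; that is the "automated discovery" in the definition.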

Page 6

http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html

http://products.office.com/en-us/excel

http://www.microsoft.com/en-us/server-cloud/products/sql-server/

http://pytools.codeplex.com/

http://azure.microsoft.com/en-us/services/hdinsight/

http://www.revolutionanalytics.com/

Page 10

Technology Choices

• SQL Server Analysis Services: Enterprise, Business Intelligence

• Excel Add-in for SSAS: Office 365; Office 2013 or higher (x64)

• Semantic Search: Enterprise, Business Intelligence, Standard, Web, Express with Advanced Services

• Microsoft Azure ML: Free (size limited); Paid (web service): Experiment + Query

• F#: Open source

• SQL Server R Services: SQL Server 2016 or higher
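The "Paid (web service): Experiment + Query" option means publishing an Azure ML experiment as a web service and then querying it over HTTP. The Python sketch below builds such a query using only the standard library. It assumes the JSON request-response layout used by classic Azure ML Studio web services: an `Inputs` object keyed by input name, holding `ColumnNames` and `Values`, plus a `GlobalParameters` object, all sent with a bearer API key. The URL, key, input name `input1`, and column names are placeholders. Check the API help page generated for your own service before relying on any of them.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Hypothetical helper: build (but do not send) a POST request to a scoring web service."""
    # Assumed classic Azure ML Studio request-response body layout.
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # API key issued when the service is published
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_scoring_request(
    "https://example.invalid/score",  # placeholder endpoint
    "API_KEY",                        # placeholder key
    ["age", "income"],                # hypothetical input columns
    [["35", "52000"]],
)
```

Passing `req` to `urllib.request.urlopen(req)` would send the query. The sketch stops short of that so it runs without a live endpoint.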

Page 12

http://download.microsoft.com/download/F/C/2/FC21C981-4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf

Page 13

[Diagram: SSAS, SQL, NoSQL]

Page 15

Data mining add-in for business analysts

• Ease of use

• Rich data mining

• Scalable

Page 19

[Diagram: varchar, nvarchar, Office, and PDF documents as inputs; rowset output with scores]
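Semantic search returns its results as rowsets with scores. For key phrases, SQL Server exposes this through the `SEMANTICKEYPHRASETABLE` function, whose rows carry a document key, a key phrase, and a score. The deck does not describe SQL Server's statistical model, so the Python sketch below substitutes plain TF-IDF over single words, purely to show the shape of such a rowset. The function name `keyphrase_rowset` and its `top_n` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def keyphrase_rowset(docs, top_n=3):
    """Hypothetical helper: return (document_key, keyphrase, score) rows, best-scoring first per document."""
    tokenized = {key: re.findall(r"[a-z]+", text.lower()) for key, text in docs.items()}
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))
    n_docs = len(docs)
    rows = []
    for key, words in tokenized.items():
        counts = Counter(words)
        # TF-IDF: frequent in this document, rare across the collection.
        scored = [
            (word, (count / len(words)) * math.log(n_docs / doc_freq[word]))
            for word, count in counts.items()
        ]
        # Highest score first; ties broken alphabetically for a stable order.
        scored.sort(key=lambda item: (-item[1], item[0]))
        rows.extend((key, word, round(score, 4)) for word, score in scored[:top_n])
    return rows
```

A word that appears in every document gets a score of zero. This mirrors the intent of a key phrase index: surface terms that distinguish a document, not terms common to all of them.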

Page 20

[Diagram: Documents pass through iFilters into the Full-Text Keyword Index “FTI”. Together with the Semantic Database, they also populate the Semantic Key Phrase Index, or Tag Index “TI”, and the Semantic Document Similarity Index “DSI”.]
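The Semantic Document Similarity Index lets SQL Server rank documents by how similar they are to a given document, queried through the `SEMANTICSIMILARITYTABLE` function. The deck does not describe the underlying model. The Python sketch below uses a common stand-in, cosine similarity between word-count vectors, to show what a similarity rowset looks like. `term_vector`, `cosine`, and `similarity_rowset` are hypothetical names, and this is not SQL Server's algorithm.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Word counts for one document (lower-cased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_rowset(docs, source_key):
    """Rows of (matched_document_key, score) for every other document, most similar first."""
    source = term_vector(docs[source_key])
    rows = [
        (key, round(cosine(source, term_vector(text)), 4))
        for key, text in docs.items()
        if key != source_key
    ]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

For documents 1 = "sql server semantic search", 2 = "semantic search in sql server", and 3 = "azure machine learning", calling `similarity_rowset(docs, 1)` ranks document 2 first (four shared words) and document 3 last with a score of 0.0.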

Page 21

Simplified Chinese

British English

Portuguese

Chinese (Hong Kong SAR, PRC)

Spanish

Chinese (Singapore)

Chinese (Macau SAR)

Page 22

Figure: Time in Seconds vs. Number of Documents (K. Mukerjee, T. Porter, S. Gherman, Microsoft, 2011)

http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf

Page 27

Features compared: Microsoft R Open (R distribution, free), Microsoft R Client (free), Microsoft R Server (commercial)

Big Data
• Microsoft R Open: In-memory bound; can only process datasets that fit into the available memory
• Microsoft R Client: In-memory bound; can process datasets that fit into the available memory; operates on large volumes when connected to R Server
• Microsoft R Server: Disk scalability; operates on bigger volumes and factors

Speed of Analysis
• Microsoft R Open: Multi-threaded for non-ScaleR functions when MKL is installed
• Microsoft R Client: Multi-threaded with MKL for non-ScaleR functions; up to 2 threads for ScaleR functions with a local compute context
• Microsoft R Server: Full parallel threading and processing

Enterprise Readiness
• Microsoft R Open: Community support
• Microsoft R Client: Community support
• Microsoft R Server: Commercial support

Analytic Breadth & Depth
• Microsoft R Open: 8000+ open source packages
• Microsoft R Client: Leverage and optimize open source R packages plus 'Big Data'-ready ScaleR packages
• Microsoft R Server: Leverage and optimize open source R packages plus 'Big Data'-ready and multithreaded ScaleR packages

Commercial Viability
• Microsoft R Open: Risk of deploying open source
• Microsoft R Client: Free for everyone
• Microsoft R Server: Commercial licenses

DeployR Enterprise
• Microsoft R Open: Not available
• Microsoft R Client: Not available
• Microsoft R Server: Included
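The "In-memory bound" versus "Disk scalability" distinction comes down to whether an algorithm needs all rows in memory at once. ScaleR functions such as `rxSummary` are R. Purely to illustrate the idea in Python, the sketch below reads a CSV column in fixed-size chunks and updates a running mean and variance with Welford's method, so memory use does not grow with the number of rows. The function name, column name, and chunk size are hypothetical, and this is not presented as ScaleR's implementation.

```python
import csv
from itertools import islice


def chunked_summary(lines, column, chunk_size=1000):
    """Hypothetical helper: count, mean, and sample variance of one CSV column, read chunk by chunk."""
    reader = csv.DictReader(lines)
    count, mean, m2 = 0, 0.0, 0.0
    while True:
        # Pull at most chunk_size rows into memory at a time.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        for row in chunk:
            x = float(row[column])
            count += 1
            delta = x - mean
            mean += delta / count
            # Welford's update: only count, mean, and m2 are carried between rows.
            m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance
```

With a real file, this might be called as `with open("sales.csv", newline="") as f: chunked_summary(f, "amount")`, where the file and column names are placeholders. Any iterable of CSV lines works, so `chunked_summary(["amount", "1", "2", "3", "4"], "amount", chunk_size=2)` gives a count of 4, a mean of 2.5, and a sample variance of 5/3.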

Page 28

Microsoft R Server editions (on the slide, each edition links to "Doc" pages for Install and for ScaleR Get Started)

• R Server for Hadoop: Scale your analysis transparently by distributing work across nodes without complex programming

• R Server for Teradata DB: Run advanced analytics in-database for seamless data analysis

• R Server for Linux: Bring predictive and prescriptive analytics power to your Linux environments
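R Server for Hadoop "distributes work across nodes". A statistic can be computed that way when each node summarizes its own partition and the small summaries are merged afterward. The Python sketch below illustrates that pattern for mean and variance using the pairwise merge formula of Chan, Golub, and LeVeque, with a thread pool standing in for cluster nodes. All names are hypothetical. On a Hadoop cluster, R Server handles the distribution itself, not code like this.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_summary(values):
    """Per-partition (count, mean, M2), as one worker node might compute it."""
    count = len(values)
    mean = sum(values) / count if count else 0.0
    m2 = sum((x - mean) ** 2 for x in values)
    return count, mean, m2


def merge(a, b):
    """Combine two partial summaries without revisiting the raw data (Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def distributed_variance(partitions):
    """Summarize partitions in parallel, then merge into (count, mean, sample variance)."""
    with ThreadPoolExecutor() as pool:  # threads stand in for cluster nodes
        partials = list(pool.map(partial_summary, partitions))
    total = (0, 0.0, 0.0)
    for p in partials:
        total = merge(total, p)
    n, mean, m2 = total
    return n, mean, (m2 / (n - 1) if n > 1 else 0.0)
```

Only three numbers per partition cross the "network" here, which is why this split-then-merge design scales: `distributed_variance([[1, 2], [3, 4], [5]])` returns a count of 5, a mean of 3.0, and a sample variance of 2.5, the same as computing over all five values at once.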

Page 29

http://datacamp.com

Page 33

Mutable vs. Immutable

• Classic open source: Java (mutable), Scala (immutable)

• .NET, now open source: C#, C++, VB.NET (mutable), F# (immutable)
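The Mutable/Immutable split refers to whether values can be changed in place after they are created. F# makes values immutable by default, and Scala's default collections are immutable. To keep all examples in one language, the sketch below imitates the immutable style in Python with a frozen dataclass: an update returns a new object, and the original cannot be modified. The `Observation` class and `rescale` function are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Observation:
    """Hypothetical record type; frozen=True makes instances immutable."""
    name: str
    value: float


def rescale(obs, factor):
    """Return a new Observation; the original is left untouched (immutable style)."""
    return replace(obs, value=obs.value * factor)


original = Observation("temperature", 20.0)
scaled = rescale(original, 1.5)

mutation_blocked = False
try:
    original.value = 99.0  # frozen dataclasses reject in-place assignment
except FrozenInstanceError:
    mutation_blocked = True
```

After this runs, `original.value` is still 20.0 and `scaled.value` is 30.0. Immutability like this makes it safe to share data between parallel tasks, because no task can change a value another task is reading.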

Page 38

https://www.microsoft.com/en-us/cloud-platform/what-is-cortana-intelligence-suite

Page 39

Capabilities and Products

• Preconfigured solutions: business scenarios. Products: forecasting, churn, etc.

• Intelligence: integration with Cortana, bot services, cognitive services. Products: Cortana, Bot Framework, Cognitive Services

• Dashboards and visualizations. Products: Power BI

• Machine learning and advanced analytics: machine learning, Hadoop, distributed analytics, complex event processing. Products: Machine Learning, HDInsight (Data Lake service), Data Lake Analytics, Stream Analytics

• Big data stores: big data repository, elastic data warehouse. Products: Data Lake Store, Blobs, SQL Data Warehouse

• Information management: data orchestration, data catalog, event ingestion. Products: Data Factory, Data Catalog, Event Hubs

Page 40

https://github.com/jakevdp/sklearn_pycon2015

Page 42

http://www.bing.com/explore/predicts

Page 43

https://techcrunch.com/2016/07/07/microsoft-now-helps-businesses-use-the-data-that-powers-bing-predicts/

Page 45

https://academy.microsoft.com/en-US/professional-degree/data-science/

https://borntolearn.mslearn.net/b/weblog/posts/announcing-the-microsoft-professional-degree-mpd-program

Page 46

http://www.kdnuggets.com/2015/09/free-data-science-books.html

Page 47

https://channel9.msdn.com/Blogs/Windows-Azure

https://mva.microsoft.com/

Page 48

http://blogs.technet.com/b/machinelearning/

http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning

http://sqlserverdatamining.com

http://marktab.net

http://curah.microsoft.com/342704/azure-machine-learning-videos-february-2015

Page 49

http://datascience.sqlpass.org/

https://www.youtube.com/channel/UCqB3xWdwjA9soFV6EOu7qfg
