Microsoft SQL Server 2016 R Services


Sep 15, 2020


dariahiddleston
Page 1

Microsoft

SQL Server 2016 R Services

Page 2

Consistent experience from on-premises to cloud

Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230

In-memory across all workloads

Built-in

at massive scale

[Chart: SQL Server, Oracle, MySQL, and SAP HANA compared; x-axis 1 to 6, y-axis 0 to 80. Individual data values are not recoverable from the transcript.]

TPC-H: SQL Server is #1, #2, and #3; Oracle is #5

SQL Server 2016: Everything built-in


Page 3

From data to decisions and actions

Value

Data

$1.6 trillion

Action

Decision

Page 4

Microsoft advanced analytics products

Cortana Analytics Suite

SQL Server 2016

Page 5

The typical advanced analytics lifecycle

Ingest, Transform, Explore, Model, Deploy, Score, Visualize, Measure

Phases: Preparation, Modeling, Production

Page 6

Data scientists should be focused on building and testing models

Data scientist

Ingest, Transform, Explore, Model, Deploy, Score, Visualize, Measure

Phases: Preparation, Modeling, Production

Page 7

But the reality is...

Data scientist focus time

Ingest, Transform, Explore, Model, Deploy, Score, Visualize, Measure

Phases: Preparation, Modeling, Production

Focus-time figures shown on the slide: 80%, 5%, 15%

Page 8

Advanced analytics is a team sport

Preparation

Model

Production

Decision

Page 9

What is R?

The open source "lingua franca"

Analytics, computing, modeling

Global community

Millions of users

7,000+ packages

Big data

Ecosystem

Scalability

Page 10

CRAN: The Comprehensive R Archive Network

Open Source “lingua franca”

Analytics, Computing, Modeling

In addition to CRAN, Bioconductor, GitHub, and others distribute R packages

Page 11

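Why R?

A large pool of talent knows how to use it

Scales with the data being computed on

Easier to protect important data

Roles, usage, creation, efficiency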

Page 12

Challenges of open source R

Uncertain total cost of ownership and return on investment

Integrating R with existing and ever-changing data infrastructures

Scale and Performance

Data movement restricts access for efficient data modeling

Page 13

Benefits of Microsoft R

Area | Open source R | Microsoft R | Benefit
Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
Speed of Analysis | Single threaded | Parallel threading and processing | Shrinks analysis time
Enterprise Readiness | Community support | Commercial support | Delivers full service production support
Analytic Breadth & Depth | 7,000+ innovative analytic packages | Leverage and optimize open source packages plus Big Data ready packages | Supercharges R
Commercial Viability | Risk of deployment of open source | Commercial licenses | Eliminates risk with open source

Page 14

Faster And More Scalable

Page 15

Parallelized, Remote Executing Algorithms

Custom parallelization
PEMA-R API
rxDataStep
rxExec

Data step
Data import: delimited, fixed, SAS, SPSS, ODBC
Variable creation & transformation
Recode variables
Factor variables
Missing value handling
Sort, merge, split
Aggregate by category (means, sums)

Descriptive statistics
Min/max, mean, median (approx.)
Quantiles (approx.)
Standard deviation
Variance
Correlation
Covariance
Sum of squares (cross-product matrix for set variables)
Pairwise cross tabs
Risk ratio & odds ratio
Cross-tabulation of data (standard tables & long form)
Marginal summaries of cross tabulations

Statistical tests
Chi Square Test
Kendall Rank Correlation
Fisher’s Exact Test
Student’s t-Test

Sampling
Subsample (observations & variables)
Random sampling

Predictive models
Sum of squares (cross-product matrix for set variables)
Multiple linear regression
Generalized linear models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
Covariance & correlation matrices
Logistic regression
Classification & regression trees
Predictions/scoring for models
Residuals for all models

Simulation
Simulation (e.g., Monte Carlo)
Parallel random number generation

Cluster analysis
K-Means

Classification
Decision trees
Decision forests
Gradient-boosted decision trees
Naïve Bayes
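The descriptive statistics on this slide (mean, variance, standard deviation) can be computed without holding the whole data set in memory, which is the general idea behind chunked or external-memory algorithms. The following is a minimal Python sketch of that idea, not the ScaleR or PEMA-R implementation: each chunk is summarized independently and the partial summaries are merged pairwise. The names `Moments`, `summarize_chunk`, `merge`, and `chunked_moments` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Moments:
    """Partial summary of a block of values (hypothetical helper type)."""
    n: int = 0          # number of observations seen
    mean: float = 0.0   # running mean
    m2: float = 0.0     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_chunk(chunk: Sequence[float]) -> Moments:
    """Summarize one chunk on its own; chunks could be handled by separate workers."""
    n = len(chunk)
    if n == 0:
        return Moments()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial summaries using the standard pairwise update."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def chunked_moments(chunks: Iterable[Sequence[float]]) -> Moments:
    """Fold chunk summaries together; only one chunk is in memory at a time."""
    total = Moments()
    for chunk in chunks:
        total = merge(total, summarize_chunk(chunk))
    return total
```

Because `merge` only needs the two partial summaries, the chunks can be read from disk one at a time or summarized on different threads and combined afterwards, which is what makes this style of computation suitable for data larger than memory.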

Page 16

In-database advanced analytics

Data Scientist: Interacts directly with data

SQL Developer/DBA: Manage data and analytics together

Extensibility

R integration

Example solutions: Sales forecasting, Warehouse efficiency, Predictive maintenance, Credit risk protection

Built into SQL Server 2016: Relational data, Analytics library, T-SQL interface

Real-time operational analytics without moving data

R with in-memory scalability
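The phrase "without moving data" refers to running the computation where the data lives instead of pulling rows out to a client first. A minimal Python sketch of the contrast follows. It uses the standard library's `sqlite3` module as a stand-in for a database server, so it does not show SQL Server R Services or its T-SQL interface; the `sales` table and the function names are hypothetical.

```python
import sqlite3
from typing import Tuple


def build_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)",
        ((float(i),) for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def average_external(conn: sqlite3.Connection) -> Tuple[float, int]:
    """External access: fetch every row, then compute on the client side."""
    rows = conn.execute("SELECT amount FROM sales").fetchall()
    amounts = [row[0] for row in rows]
    return sum(amounts) / len(amounts), len(rows)


def average_in_database(conn: sqlite3.Connection) -> Tuple[float, int]:
    """In-database: the engine aggregates; only the result row comes back."""
    rows = conn.execute("SELECT AVG(amount) FROM sales").fetchall()
    return rows[0][0], len(rows)
```

Both functions return the same average together with the number of rows that crossed the database boundary. With the default demo table, the external version moves 1,000 rows and the in-database version moves one.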

Page 17

[Chart: External Access compared with In Database; axis labels: rows, minutes. Data values are not recoverable from the transcript.]

Page 18

Flexibility & Agility

Write once, deploy anywhere
No model re-writes across platforms
No re-writes from modeling to scoring

Hybrid modeling & scoring
Model on premises, score on premises
Model on premises, score in the cloud
Model on cloud, score on premises

Prepare, Model, Score

SQL Server

Parallelized Models
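"No re-writes from modeling to scoring" amounts to fitting a model in one environment, serializing it, and scoring with the same artifact somewhere else. The sketch below shows that pattern in plain Python with a one-variable least-squares fit and JSON as the exchange format. It is an illustration under those assumptions, not the mechanism Microsoft R or SQL Server uses; `fit_line`, `export_model`, and `score` are hypothetical names.

```python
import json
from typing import Dict, List, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Fit y = intercept + slope * x by ordinary least squares (modeling side)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"intercept": intercept, "slope": slope}


def export_model(model: Dict[str, float]) -> str:
    """Serialize the fitted coefficients so another environment can load them."""
    return json.dumps(model)


def score(serialized_model: str, xs: Sequence[float]) -> List[float]:
    """Scoring side: load the serialized model and predict, with no refitting."""
    model = json.loads(serialized_model)
    return [model["intercept"] + model["slope"] * x for x in xs]
```

The scoring function depends only on the serialized string, so the modeling and scoring steps can run on different machines (for example, one on premises and one in the cloud) as long as both agree on the format.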

Page 19

Some Microsoft R customers

Financial Services

Digital Media & Retail

Healthcare & Pharma

Government & Academia

Analytics Service Providers

Manufacturing & High Tech

Page 20

SQL Server 2016 R Services (In-database)

In-DB analytics

Parallel threading and processing

Easy to operationalize

Developers, DBAs and Data Scientists can use their preferred tools

Model on-premises, score in cloud—or vice versa

Easy way to overcome memory limitations, enabling work with larger data sets

Included in SQL Server 2016

Reuse and optimization of existing R code

Reduced recoding and training costs


Page 21