Analytics in a day: Cloud analytics in the age of self-service and data science
Who are Methods Analytics?
• Data Science, AI & Machine Learning
• Digital / Data Portal Development
• Population Health Management
• Business Intelligence as a Service
Beyond data: Deriving actionable intelligence
Our goal
Measurably improve society by helping people make better decisions with data.
Approach
We enable our customers to use data to do good things and solve difficult problems.
We combine passionate people, sector-specific insight and technical excellence to provide an end-to-end data service.
Our approach is collaborative, creative and user-centric. Our outputs are transparent, robust and transformative.
Part of the Methods Group of Services
Our accreditations
• ISO 27001 Information Security Management (Cert No. 901)
• ISO 9001 Quality Management System (Cert No. 901)
• ISO 14001 Environmental Management (Cert No. 901)
Session 2: Azure Synapse Analytics
Limitless analytics service with unmatched time to insight
Introducing Azure Synapse Analytics
A limitless analytics service with unmatched time to insight, that delivers insights from all your data, across data warehouses and big data analytics systems, with blazing speed.
Simply put, Azure Synapse is Azure SQL Data Warehouse evolved. We have taken the same industry-leading data warehouse and elevated it to a whole new level of performance and capabilities.
At the core of all use cases is Azure Synapse Analytics
• Modern data warehousing: "We want to analyze data coming from multiple sources and in varied formats"
• Advanced analytics: "We want to leverage the analytics platform for advanced fraud detection"
• Real-time analytics: "We're trying to get insights from our devices in real-time"
Analytics in Azure Today
Cloud-scale analytics
[Diagram: data sources (cloud data, SaaS data, on-premises data, devices data) flow through the stages Ingest, Store, Prep, and Model & Serve. Services shown: Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics and Power BI.]
Simplify Analytics with Azure Synapse
Limitless analytics service with unmatched time to insight
[Diagram components: data sources (cloud data, SaaS data, on-premises data); Azure Synapse Studio as the unified experience; Integration, Management, Monitoring and Security; the analytics runtimes (SQL); Azure Data Lake Storage; Power BI; Azure Machine Learning. Components carry PREVIEW or GA status labels.]
Innovations in Azure Synapse
• Multiple clusters over shared data
• Online scaling
• Workload aware query scheduling
• Single Access and Security for Data Warehouse and Data Lake workloads
• Spark + SQL integrated runtime
• Cluster + Serverless
Each innovation carries a PREVIEW or GA status label on the slide. For the runtimes, Spark: PREVIEW and SQL: GA.
Customer Journey
No migration necessary. Seamless journey.
• Current "SQL DW" customers (now Azure Synapse Analytics customers as of Nov. 4th)
• New adopters of Azure Synapse Analytics for enterprise data warehousing (starting Nov. 4th and beyond)
Continue to build a modern data warehouse with Azure, using Azure Synapse for your enterprise data warehouse.
Automatically enjoy newly announced Azure Synapse preview features when they become generally available.
Azure Synapse Analytics
Unmatched Security, Unified Experience, Limitless Scale, Powerful Insights
• Provisioned Data Warehouse: GENERALLY AVAILABLE
• On-demand Query as a Service: PREVIEW
[Diagram components: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage, Power BI, Business services.]
Best-in-class price per performance
Leader in price per performance
[Chart, TPC-H: price-performance is calculated by GigaOm as the TPC-H metric of cost of ownership divided by composite query. Results based on GigaOm's TPC-H results, published in January 2019. Bar values as extracted: $40, $33, $47, $54, $48, $51, $564. The vendor labels for the bars did not survive extraction.]
[Chart, TPC-DS: price-performance is calculated by GigaOm as the TPC-DS metric of cost of ownership divided by composite query. Results based on GigaOm's TPC-DS results, published in April 2019. Bar values as extracted: $103, $110, $152. The vendor labels for the bars did not survive extraction.]
Most secure data warehouse in the cloud
Multiple levels of security between the user and the data warehouse, at no additional cost:
Threat Protection, Network Security, Authentication, Access Control, Data Protection

Category           Feature                                                        Synapse Analytics
Data Protection    Data in transit                                                Yes
                   Data encryption at rest (Service & User Managed Keys)          Yes
                   Data Discovery and Classification                              Yes
Access Control     Native Row Level Security                                      Yes
                   Table and View Security (GRANT / DENY)                         Yes
                   Column Level Security                                          Yes
                   Dynamic Data Masking                                           Yes
Authentication     SQL Authentication                                             Yes
                   Native Azure Active Directory                                  Yes
                   Integrated Security                                            Yes
                   Multi-Factor Authentication                                    Yes
Network Security   Virtual Network (VNET)                                         Yes
                   SQL Firewall (server)                                          Yes
                   Integration with ExpressRoute                                  Yes
Threat Protection  SQL Threat Detection                                           Yes
                   SQL Auditing                                                   Yes
                   Vulnerability Assessment                                       Yes
Access control for complete security
Prioritize your workloads using workload management
What if you want to prioritize the workloads that get access to resources?
By default, workloads are run on a first-in, first-out basis.
[Diagram: scheduler without importance, showing numbered requests as Running or Queued, with requests labelled CEO in the queue.]
With workload importance, prioritized workloads take precedence.
[Diagram: scheduler with importance turned on, showing numbered requests as Running or Queued, requests labelled CEO, and importance levels Low, Normal, Normal and High.]
CREATE WORKLOAD CLASSIFIER classifier_name
WITH (
    WORKLOAD_GROUP = 'name',
    MEMBERNAME = 'security_account'
    [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL (default) | ABOVE_NORMAL | HIGH } ]
)
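The difference between the two schedulers can be sketched with a small simulation. This is only a toy model, not how Azure Synapse implements scheduling: the request names, the `run_order` function and the numeric ranks are all hypothetical, and only the five importance levels come from the classifier syntax.

```python
import heapq
from itertools import count

# Importance levels from the classifier syntax, highest first.
# The numeric ranks are an assumption made for this sketch.
IMPORTANCE_RANK = {
    "HIGH": 0,
    "ABOVE_NORMAL": 1,
    "NORMAL": 2,
    "BELOW_NORMAL": 3,
    "LOW": 4,
}


def run_order(requests, use_importance):
    """Return request names in the order a toy scheduler would start them.

    requests: list of (name, importance) tuples in arrival order.
    """
    arrival = count()
    queue = []
    for name, importance in requests:
        seq = next(arrival)
        # Without importance every request gets the same rank, so the
        # arrival sequence alone decides the order (first in, first out).
        rank = IMPORTANCE_RANK[importance] if use_importance else 0
        heapq.heappush(queue, (rank, seq, name))
    return [heapq.heappop(queue)[2] for _ in range(len(queue))]


# Hypothetical queued requests, listed in arrival order.
queued = [
    ("analyst_report", "NORMAL"),
    ("nightly_cleanup", "LOW"),
    ("ceo_dashboard", "HIGH"),
    ("sales_extract", "NORMAL"),
]

fifo_order = run_order(queued, use_importance=False)
importance_order = run_order(queued, use_importance=True)
print(fifo_order)
print(importance_order)
```

With importance turned off, `ceo_dashboard` waits behind the two requests that arrived before it. With importance turned on it starts first, and requests of equal importance keep their arrival order.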
Workload isolation takes management to the next level
Intra Cluster Workload Isolation (Scale In)
[Diagram: Marketing and Sales workloads, with resource shares of 60% and 40%.]
See more by scaling to petabytes
CREATE MATERIALIZED VIEW vw_ProductSales
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales
FROM FactSales fs
INNER JOIN DimProduct dp ON fs.prodkey = dp.prodkey
GROUP BY ProductName, ProductKey

ProductName   ProductKey   TotalSales
Product A     5453         784,943.00
Product B     763          48,723.00
…             …            …

Source tables: FactSales (10B records), DimProduct (1,000 records)
[Diagram: tables FactInventory, FactSales and DimProduct, alongside the materialized view mvw_ProductSales (1,000 records).]
SELECT ProductName, ProductKey, SUM(Amount) AS TotalSales
FROM FactSales fs
INNER JOIN DimProduct dp ON fs.prodkey = dp.prodkey
GROUP BY ProductName, ProductKey
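The idea behind the materialized view, storing a small pre-aggregated result so the large fact table need not be scanned again, can be sketched locally. This is a rough illustration under stated assumptions. It uses SQLite through Python's `sqlite3` module, which has no materialized views, so an ordinary table created with `CREATE TABLE ... AS SELECT` stands in for one. Unlike a real materialized view, that table is not refreshed when the source data changes. The sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimProduct (prodkey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE FactSales (prodkey INTEGER, Amount REAL);
    -- Invented sample data, far smaller than the tables on the slide.
    INSERT INTO DimProduct VALUES (1, 'Product A'), (2, 'Product B');
    INSERT INTO FactSales VALUES (1, 100.0), (1, 50.0), (2, 25.0);
    """
)

AGGREGATE_QUERY = """
    SELECT dp.ProductName, dp.prodkey AS ProductKey, SUM(fs.Amount) AS TotalSales
    FROM FactSales fs
    INNER JOIN DimProduct dp ON fs.prodkey = dp.prodkey
    GROUP BY dp.ProductName, dp.prodkey
    ORDER BY dp.prodkey
"""

# Run the full join and aggregation once against the base tables.
direct_rows = conn.execute(AGGREGATE_QUERY).fetchall()

# Store the aggregated result in a plain table. This is the stand-in for
# the materialized view: one row per product instead of one per sale.
conn.execute("CREATE TABLE mvw_ProductSales AS " + AGGREGATE_QUERY)
cached_rows = conn.execute(
    "SELECT ProductName, ProductKey, TotalSales "
    "FROM mvw_ProductSales ORDER BY ProductKey"
).fetchall()

print(direct_rows)
print(cached_rows)
```

Both queries return the same rows, but the second reads only the small summary table. In the slide's scenario that is the difference between scanning 10B fact records and reading roughly 1,000 pre-aggregated ones.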
Build confidence in your data with result set cache
• Execution 1: cache miss, regular execution
• Execution 2: cache hit, ~0.2 seconds
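The cache miss and cache hit behaviour can be sketched as a lookup keyed on the query text. This is a minimal illustration and not the service's implementation: the `ResultSetCache` class, the `slow_execute` function and the sample rows are hypothetical. A real result set cache must also invalidate entries when the underlying data changes, which this sketch ignores.

```python
import time


class ResultSetCache:
    """Toy cache keyed on the exact query text (hypothetical helper)."""

    def __init__(self, execute):
        self._execute = execute
        self._results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        # Cache hit: return the stored rows without executing anything.
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        # Cache miss: run the regular execution and remember the result.
        self.misses += 1
        result = self._execute(sql)
        self._results[sql] = result
        return result


def slow_execute(sql):
    """Hypothetical stand-in for a regular warehouse execution."""
    time.sleep(0.05)  # pretend the query takes a while
    return [("Product A", 784943.00), ("Product B", 48723.00)]


cache = ResultSetCache(slow_execute)
sql = "SELECT ProductName, TotalSales FROM vw_ProductSales"
first = cache.query(sql)   # execution 1: cache miss, regular execution
second = cache.query(sql)  # execution 2: cache hit, served from memory
print(cache.misses, cache.hits)
```

Keying on the exact query text keeps the sketch simple. It also means that two queries differing only in whitespace would not share an entry here.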
Empower more users per data warehouse
Use preferred tools for Azure Synapse Analytics development
In preview (publicly available soon)
Synapse Studio, serverless Query-as-a-Service, big data processing, & more
Azure Synapse Studio
A single place for Data Engineers, Data Scientists, and IT Pros to collaborate on enterprise analytics
Azure Synapse Studio is divided into Activity hubs. These organize the tasks needed for building your end-to-end analytics solution.
• Overview: Quick access to common gestures, most-recently used items, and links to tutorials and documentation
• Data: Explore structured and unstructured data
• Develop: Write code and define the business logic of the pipeline via notebooks, SQL scripts, Data flows, etc.
• Orchestrate: Design pipelines that move and transform data
• Monitor: Centralized view of all resource usage and activities in the workspace
• Manage: Configure the workspace, pools, access to artifacts, etc.
Ingest & transform data from the same service you use for data warehousing, big data processing, and Power BI reports
Quickly explore data with serverless query over the data lake
Azure Synapse Analytics > SQL On Demand
Visualize your serverless query with the click of a button
Perform big data processing with embedded Apache Spark
All with your choice of language
DVLA MIBI
• The DVLA are in the process of modernizing and migrating all of their existing systems, comprising mainframes, bespoke applications and other technology, to a modern cloud-based platform: a real-time strategic data platform.
• Timely access to data, consistency of reporting and removal of redundant and siloed data sources were all cited as drivers for this change.
• We were asked to work alongside the DVLA's existing MIBI team to collaboratively develop functionality through a series of Proofs of Concept (PoCs) that would enable the DVLA to build a Strategic MIBI Data Platform.
The conceptual design involved using components within the Microsoft Azure Cortana/Synapse analytics suite, including:
• Event Hub to ingest data from a variety of sources (JSON, CSV, RDBMS, etc.)
• Stream Analytics to process and store data in Data Factory and Azure SQL
• Analysis Services for ultimate display in Power BI.