Laboratory for Advanced Collaboration LAC Microsoft Azure Stream Analytics Marcos Roriz and Markus Endler Laboratory for Advanced Collaboration (LAC) Departamento de Informática (DI) Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio)

Data Distribution Service - DDS - PUC-Rioendler/courses/RT-Analytics/transp/AzureRori… · Microsoft Azure Stream Analytics Stream Analytics only process data in the cloud. Thus,

Jun 27, 2020


dariahiddleston
Transcript
Page 1

Laboratory for Advanced Collaboration

LAC

Microsoft Azure Stream Analytics

Marcos Roriz and Markus Endler

Laboratory for Advanced Collaboration (LAC)

Departamento de Informática (DI)

Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio)

Page 2

Topics

Azure Overview

Stream Analytics Programming Model

Step-by-step example

Page 3

Microsoft Azure

Microsoft’s cloud computing solution

Several Open Source Components

Windows and Linux VMs

Page 4

Cloud Computing Overview

Page 5

Microsoft Azure Cloud Solutions

Page 6

Microsoft Azure Stream Analytics

Middleware for data stream processing

Offered as a service (Platform as a Service, PaaS)

Provides a SQL-like continuous query language

Developers write “Stream Analytics” jobs instead of imperative code

Can integrate with visualization frameworks

Page 7

Microsoft Azure Stream Analytics

Stream Analytics only processes data in the cloud.

Thus, how can we send data to Stream Analytics jobs (cloud)?

Through Event Hubs (streams) or BLOB storage (static data).

Event Hubs is a publish/subscribe middleware.

Bindings for multiple languages: .NET, Java, Python, etc.

Page 8

Stream Analytics Architecture

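[Figure: architectural overview. Incoming data enters through Event Hubs or BLOB storage, is processed by Stream Analytics, and the results are written to Event Hubs or BLOB storage.]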

Page 9

Sending Data - Event Hubs

Each Event Hub represents one kind of event.

Events need to be in JSON, CSV, or Avro format.

Example: Temperature data sent to an event hub.

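// JSON payload for one temperature reading
String msg = "{\"id\": 123456,\"reading\": 28}";
byte[] payloadBytes = msg.getBytes();
EventData sendEvent = new EventData(payloadBytes);

// Connect with the Event Hub connection string
EventHubClient ehClient = EventHubClient.createFromConnectionString("eventConnectionKey");

// Send the event and wait for completion
ehClient.sendSync(sendEvent);

[Figure: incoming data is sent to Event Hubs through the Event Hub middleware (API).]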

Page 10

Microsoft Azure Stream Analytics

SQL-like language (similar to StreamInsight)

Input: Event Hubs and/or BLOBs

Output: Event Hubs and/or BLOBs

Example: (Temperature Stream)

SELECT id, reading
INTO HighTemperatureStream
FROM TemperatureStream
WHERE reading > 20

Page 11

Stream Analytics Query Language

Event time is either implicit (assigned on arrival) or external (taken from a timestamp column)

Tumbling, Hopping, Sliding Time Windows
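
To make the difference between tumbling and hopping windows concrete, here is a minimal Java sketch of the window arithmetic. It is not Stream Analytics code: the class and method names are hypothetical, time is expressed in plain seconds, and windows are treated as half-open intervals [start, start + size), an assumption made for simplicity that may not match the service's exact boundary rule. Sliding windows are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative window arithmetic only; not part of any Azure SDK. */
class WindowSketch {

    /** Start of the single tumbling window [start, start + size) that contains t. */
    static long tumblingStart(long t, long size) {
        return Math.floorDiv(t, size) * size;
    }

    /**
     * Starts of all hopping windows [start, start + size) that contain t,
     * assuming a new window begins every `hop` seconds (newest window first).
     */
    static List<Long> hoppingStarts(long t, long size, long hop) {
        List<Long> starts = new ArrayList<>();
        long latest = Math.floorDiv(t, hop) * hop; // latest window start <= t
        for (long s = latest; s > t - size; s -= hop) {
            starts.add(s);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(tumblingStart(7, 5));
        System.out.println(hoppingStarts(7, 10, 5));
    }
}
```

With a window size of 10 s and a hop of 5 s, an event at t = 7 belongs to two overlapping windows, whereas a 5 s tumbling window assigns it to exactly one.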

Page 12

Stream Analytics Query Language

Standard SQL aggregate functions

AVG, SUM, COUNT, MIN, MAX

Example: tumbling window to compute the AVG reading per sensor

SELECT id, AVG(reading)
INTO AVGTemperatureStream
FROM TemperatureStream
GROUP BY id, TUMBLINGWINDOW(s, 5)
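
What a per-sensor average over a 5-second tumbling window computes can be sketched in plain Java over an in-memory list. This is only an illustration of the grouping logic, not how the service is implemented: the `Reading` class, the "id@windowStart" key format, and the half-open window boundaries are all assumptions introduced here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: mimics grouping by sensor id and tumbling window. */
class TumblingAverage {

    /** One temperature event: sensor id, event time in seconds, reading. */
    static final class Reading {
        final int id;
        final long timeSec;
        final double reading;

        Reading(int id, long timeSec, double reading) {
            this.id = id;
            this.timeSec = timeSec;
            this.reading = reading;
        }
    }

    /** Returns the average reading for each "id@windowStart" group. */
    static Map<String, Double> averagePerSensor(List<Reading> events, long windowSec) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading e : events) {
            long windowStart = Math.floorDiv(e.timeSec, windowSec) * windowSec;
            double[] acc = sumAndCount.computeIfAbsent(e.id + "@" + windowStart, k -> new double[2]);
            acc[0] += e.reading; // running sum
            acc[1] += 1;         // running count
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> entry : sumAndCount.entrySet()) {
            averages.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Reading> events = Arrays.asList(
            new Reading(1, 0, 20.0), new Reading(1, 3, 30.0),
            new Reading(1, 6, 40.0), new Reading(2, 1, 10.0));
        System.out.println(averagePerSensor(events, 5));
    }
}
```

The batch loop is the main simplification: a streaming engine emits each group's result when its window closes instead of waiting for the whole input.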

Page 13

Expanding Azure Stream Analytics

Multiple event hub consumers

Each consumer has its own reader (similar to Apache Kafka)

Page 14

Expanding Azure Stream Analytics

Each Stream Analytics Job implements a single continuous query

Network of Event Hubs and Stream Analytics Jobs

Page 15

Complete Picture

Page 16

Example step-by-step

Telecommunications and SIM fraud detection in real-time

Large volume of Call Detail Records (CDR)

Jobs:

Pare this data down to a manageable amount and obtain insights about customer usage over time and geographical regions.

Detect SIM fraud (multiple calls coming from the same identity around the same time but in geographically different locations) in real-time

We will use an existing simulator to generate the input data stream.

Page 17

Step 1: Create an Event Hub

We will create an event hub to receive the input stream.

In the Azure Portal go to:

Page 18

Step 2: Create a Consumer Group (Event Hub)

Create a consumer group to consume data from this hub.

In the Azure Portal go to:

Event hub created

Consumer group

Create new consumer group (bottom of the page)


Page 20

Step 3: Grant access to consume/send events

Create an access policy for the Event Hub.

In the Azure Portal go to:

Event hub created

Configure Tab

Create a policy with management permissions

Save

Page 21

Step 4: Generate Input Data Stream

Simulator (uses the Event Hub middleware to send messages)

Other applications need to use the middleware API to send/receive data.

Data sent:

Page 22

Step 4: Generate Input Data Stream

Program: Download Link ( )

The Event Hub key is needed to connect to the Azure server

Get the connection info in the Event Hub panel (at the bottom)

Page 23

Step 4: Generate Input Data Stream

Edit the configuration with this info (remove the entity part of the connection string).

Use the connection info and the event hub name (CallEventHub)

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- Service Bus specific app settings for messaging connections -->
    <add key="EventHubName" value="calleventhub"/>
    <add key="Microsoft.ServiceBus.ConnectionString" value="Endpoint=sb://rorizhelloworldhub-ns.servicebus.windows.net/;SharedAccessKeyName=managepolicy;SharedAccessKey=doggvCnbnq56nwNrdeEGaPsGAOfpTpsZV6mcCmghVqo="/>
  </appSettings>
  ...

Page 24

Step 4: Generate Input Data Stream

Generate the data stream

telcodatagen.exe [#NumCDRsPerHour] [SIM Card Fraud Probability] [#DurationHours]

Page 25

Step 5: Create a Stream Analytics Job

In the Azure Portal go to:

Page 26

Step 6: Link Event Hub to Stream Analytics Job

Click on the Stream Analytics job

Go to Input tab

Add Input

Page 27

Step 6: Link Event Hub to Stream Analytics Job

Options:

Data Stream

Event Hub

Config

JSON

Page 28

Step 7: Get Sample Data

Before settling on the final query, it is recommended to test it with sample data

Use an external input or sample the data stream

Go to Stream Analytics Input Tab

Then choose Sample Data

Page 29

Step 7: Get Sample Data

Specify the sample data length.

Download the data.

Page 30

Step 8: Create Continuous Query

Click on the Query tab:

Write down the query.

Use sample data to see the output.

Page 31

Step 8: Create Continuous Query

Click on the Test button

Choose the sample data for the test query.

Page 32

Step 8: Create Continuous Query

The interface now presents the test query output over the sample data.

Page 33

Step 9: Refine the Query

Number of incoming calls per region in the last two hours.

Page 34

Step 9: Refine the Query

Number of fraudulent calls (same identity, different countries, less than 5 seconds apart):
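
The fraud rule itself (same identity, different countries, less than 5 seconds apart) can be sketched as a pairwise check in plain Java. This is an illustration of the rule only, not the query shown on the slide: the `Call` class and its field names (`imsi`, `country`, `timeSec`) are hypothetical and do not come from the simulator's record format, and the quadratic loop stands in for what a streaming engine would express as a time-bounded self-join.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: counts suspicious call pairs in an in-memory list. */
class SimFraudSketch {

    /** One call record with hypothetical fields. */
    static final class Call {
        final String imsi;     // caller identity
        final String country;  // where the call originated
        final long timeSec;    // call time in seconds

        Call(String imsi, String country, long timeSec) {
            this.imsi = imsi;
            this.country = country;
            this.timeSec = timeSec;
        }
    }

    /** Counts pairs with the same identity, different countries, and a gap below maxGapSec. */
    static int countSuspiciousPairs(List<Call> calls, long maxGapSec) {
        int pairs = 0;
        for (int i = 0; i < calls.size(); i++) {
            for (int j = i + 1; j < calls.size(); j++) {
                Call a = calls.get(i);
                Call b = calls.get(j);
                boolean sameIdentity = a.imsi.equals(b.imsi);
                boolean differentCountry = !a.country.equals(b.country);
                boolean closeInTime = Math.abs(a.timeSec - b.timeSec) < maxGapSec;
                if (sameIdentity && differentCountry && closeInTime) {
                    pairs++;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Call> calls = Arrays.asList(
            new Call("A", "BR", 0), new Call("A", "US", 3), new Call("A", "US", 10));
        System.out.println(countSuspiciousPairs(calls, 5));
    }
}
```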

Page 35

Step 9: Refine the Query

Save Query

Page 36

Step 10: Create Output

Remember, one continuous query per Stream Analytics job

Create BLOB Storage. In Azure portal go to:

Page 37

Step 10: Create Output

Create a Container


Page 39

Step 11: Link Output to Stream Analytics Job

Go to Output Tab in Stream Analytics Job

Add Output (BLOB, Event Hub, etc.)

Choose Output Name (we will use this in the query)

Pick the desired output (Event Hub/BLOB)

Page 40

Step 12: Change query to refer to Output

Go to Query Tab in Stream Analytics Job

Add INTO StreamOutput in the query

Save

Page 41

Step 13: Start the Stream Analytics Job

Go to Stream Analytics Job Menu

Click Start

Page 42

Step 14: View Output

Azure Storage Explorer

Page 43

Conclusion, Pros and Cons

Pros:

Fast and easy to deploy

SQL-like declarative language

Processing units can be scaled

Cons:

You cannot dynamically change/alter a Stream Analytics Job

• Complex task due to state transfer, losing events, etc.

• However, you can create new Jobs with the new queries

You need several “Event Hubs” to make an event processing network

Page 44

Pricing

Stream Analytics is priced on two variables:

Volume of data processed

Streaming units required to process the data stream

* A streaming unit is a unit of compute capacity with a maximum throughput of 1 MB/s

Page 45

Example Pricing

Daily Azure Stream Analytics cost for 1 MB/sec of average processing

Volume of Data Processed Cost:

$0.001/GB * 84.375 GB = $0.08 per day, streaming max 1 MB/s non-stop

Streaming Unit Cost:

$0.031/hr * 24 hrs = $0.74 per day, for 1 MB/sec max. throughput

Total cost:

$0.74 + $0.08 = $0.82 per day, or ~$24.60 per month
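
The daily cost arithmetic can be checked with a small Java sketch. The class and method names are hypothetical and the rates are passed in as parameters, since prices change over time. The usage below takes $0.031 per streaming-unit hour from the slide and assumes $0.001 per GB, which is the rate that reproduces the slide's $0.08 per day volume figure. One GB is taken as 1024 MB, matching the 84.375 GB per day on the slide.

```java
/** Illustrative cost arithmetic only; rates are inputs, not current prices. */
class PricingSketch {

    /** Data volume in GB for one day at a constant rate in MB/s (1 GB = 1024 MB). */
    static double gbPerDay(double mbPerSec) {
        return mbPerSec * 86_400 / 1024.0;
    }

    /** Daily cost = volume charge + streaming-unit charge for 24 hours. */
    static double dailyCost(double mbPerSec, double usdPerGb, double usdPerUnitHour, int units) {
        double volumeCost = gbPerDay(mbPerSec) * usdPerGb;
        double unitCost = usdPerUnitHour * 24 * units;
        return volumeCost + unitCost;
    }

    public static void main(String[] args) {
        // Assumed rates: $0.001 per GB, $0.031 per streaming-unit hour, one unit.
        double daily = dailyCost(1.0, 0.001, 0.031, 1);
        System.out.println(String.format("%.2f USD per day", daily));
    }
}
```

Without intermediate rounding the daily total is about $0.83; the slide rounds each component first ($0.74 and $0.08), which gives $0.82.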