Big Data and HPC for AWS Brasil Summit

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Big Data and High Performance Computing Solutions in the AWS Cloud

Michel Pereira, Enterprise Solutions Architect

May 27, 2014

Big Data HPC

Customer Success Story

Getting Started on AWS

What we’ll cover today…

Big Data HPC




Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Generation




GB TB PB

95% of the 1.2 zettabytes of data in the digital universe is unstructured

70% of this is user-generated content

Unstructured data growth is explosive, with estimates of compound annual growth rate (CAGR) at 62% from 2008 to 2012. Source: IDC
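
To make the 62% figure concrete, the compounding can be worked through in a few lines. This is only a sketch: the function name `growth_factor` is hypothetical, and treating 2008 to 2012 as four annual compounding periods is an assumption, not something the IDC source states.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total multiplier after compounding `cagr` once per year for `years` years."""
    return (1.0 + cagr) ** years


# Assumption: 2008 -> 2012 is counted as four annual compounding periods.
factor = growth_factor(0.62, 4)
print(f"Data volume multiplier over 4 years at 62% CAGR: {factor:.2f}x")
```

Under that assumption the volume grows to roughly seven times its starting size over the period.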

ZB

EB

Big Data: Unconstrained data growth

Lower cost, higher throughput Generation




Customer segmentation

Marketing spend optimization

Financial modeling & forecasting

Ad targeting & real time bidding

Clickstream analysis

Fraud detection

Use Cases

Visits, views, clicks, purchases

Source, device, location, time

Latency, throughput, uptime

Likes, shares, friends, follows

Price, frequency

Metrics

Relational

NoSQL

Web servers

Mobile phones

Tablets

3rd party feeds

Sources

Structured

Unstructured

Text

Binary

Near Real-time

Batched

Formats

Reporting

Dashboards

Sentiment

Clustering

Machine Learning

Optimization

Analysis

Lower cost, higher throughput

Highly constrained

Generation




Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Generated data

Available for analysis

Data volume


Elastic and highly scalable +

No upfront capital expense +

Only pay for what you use +

Available on-demand

= Remove constraints

Accelerated

Generation




Technologies and techniques for working productively with data, at any scale.

Big Data

Big data and AWS cloud computing

Big data: Variety, volume, and velocity requiring new tools
Cloud computing: Variety of compute, storage, and networking options

Big data: Potentially massive datasets
Cloud computing: Massive, virtually unlimited capacity

Big data: Iterative, experimental style of data manipulation and analysis
Cloud computing: Iterative, experimental style of infrastructure deployment/usage

Big data: Frequently not a steady-state workload; peaks and valleys
Cloud computing: At its most efficient with highly variable workloads

Big data: Absolute performance not as critical as “time to results”; shared resources are a bottleneck
Cloud computing: Parallel compute projects allow each workgroup to have more autonomy, get faster results
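
The "peaks and valleys" point can be illustrated with a small cost comparison. This is a sketch under stated assumptions: the hourly demand profile and the price of 1.0 per node-hour are invented for illustration, and both function names are hypothetical.

```python
def fixed_capacity_cost(demand, price_per_node_hour):
    """Cost when capacity is sized for the peak hour and kept running every hour."""
    return max(demand) * len(demand) * price_per_node_hour


def pay_per_use_cost(demand, price_per_node_hour):
    """Cost when only the nodes actually needed in each hour are paid for."""
    return sum(demand) * price_per_node_hour


# Hypothetical workload: mostly idle, with a two-hour analytics burst.
hourly_nodes = [2, 2, 2, 40, 40, 2, 2, 2]
price = 1.0  # assumed price per node-hour, for illustration only

print("Sized for peak:", fixed_capacity_cost(hourly_nodes, price))
print("Pay per use:   ", pay_per_use_cost(hourly_nodes, price))
```

The spikier the demand profile, the wider the gap between the two numbers, which is the sense in which cloud capacity is most efficient for highly variable workloads.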

Lower costs

no capital investment

pay as you go

no subscriptions

only pay for what you use

Ease of use

programmable

zero admin

easy to configure

integrate with existing tools


One tool to rule them all

Use the right tools

Amazon S3

Amazon Kinesis

Amazon DynamoDB

Amazon Redshift

Amazon Elastic MapReduce

Store anything

Object storage

Scalable

99.999999999% durability

Amazon S3
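
One way to read the eleven-nines durability figure is as an expected loss rate. The sketch below assumes the figure can be treated as an independent annual loss probability of 1 in 10^11 per object, which is a simplification; the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: 99.999999999% durability is read as an annual loss
# probability of 1e-11 per stored object, independent across objects.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


print(years_per_expected_loss(10_000))
```

Exact fractions are used instead of floats because subtracting 0.99999999999 from 1 in floating point introduces rounding noise at this scale.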

Real-time processing

High throughput; elastic

Easy to use

Integrations with EMR, S3, Redshift, and DynamoDB

Amazon Kinesis

NoSQL Database

Seamless scalability

Zero admin

Single digit millisecond latency

Amazon DynamoDB

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon Redshift
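
The quoted price lends itself to a back-of-the-envelope estimate. The sketch below treats $1,000/TB/year as a flat rate, which is an assumption (the slide gives only the headline number); the function name `warehouse_cost` is hypothetical.

```python
PRICE_PER_TB_YEAR = 1_000  # USD per TB per year, the figure quoted on the slide


def warehouse_cost(terabytes: float, months: int = 12) -> float:
    """Estimated cost in USD of keeping `terabytes` of data for `months` months."""
    return terabytes * PRICE_PER_TB_YEAR * months / 12


print(f"50 TB for one year:  ${warehouse_cost(50):,.0f}")
print(f"12 TB for one month: ${warehouse_cost(12, months=1):,.0f}")
```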

Hadoop/HDFS clusters

Hive, Pig, Impala, HBase

Easy to use; fully managed

On-demand and spot pricing

Tight integration with S3, DynamoDB, and Kinesis

Amazon Elastic MapReduce

HDFS

Analytics languages

Data management

Amazon Redshift

Amazon EMR

Amazon RDS

Amazon S3

Amazon DynamoDB

Amazon Kinesis

Data Sources

AWS Data Pipeline

Generation




Generation




Amazon Glacier

S3

Amazon DynamoDB

Amazon RDS

Amazon Redshift

AWS Direct Connect

AWS Storage Gateway

AWS Import/Export

Amazon Kinesis

Amazon EMR

Generation




Amazon EC2

Amazon EMR

Amazon Kinesis

Generation



Collaboration & sharing

Amazon CloudFront

AWS CloudFormation

S3

Amazon DynamoDB

Amazon RDS

Amazon Redshift

Amazon EC2

Amazon EMR

AWS Data Pipeline

The right tools. At the right scale. At the right time.

Big Data HPC





AWS Customer Success Story

Victor Oliveira, Director of Engineering, Concrete Solutions

Marcos Prete, Partnerships Manager, SAS

The Power to Know

The Company - Worldwide

•  World leader in analytics
   -  Turning data into strategic information
   -  Faster decisions
   -  Anticipating opportunities
•  Founded in 1976
•  Headquartered in Cary, North Carolina
•  14,000 employees worldwide
•  134 countries, 400 offices
•  Great Place to Work
•  1st place in the 2010, 2011, and 2012 rankings

Products are offered under a license model, but there is latent demand for software delivered as a service (SaaS)

The Company - Brazil

•  Operating since 1996
•  180+ clients
•  Offices in SP, RJ, and DF
•  140+ employees
•  Top Employers certification in 2012 and 2013

The SAS Challenge

•  Reduce operating costs for its clients
•  Acquire and manage physical servers
•  Simplify sales (from license to SaaS)
•  Offer a complete solution
•  Reduce entry costs for its clients
•  Big Data
•  The product already exists!
•  Business evolution
•  Value proposition
•  Leverage AWS IaaS
•  An intelligent partnership
•  Concrete Solutions and SAS

Approach

•  A first for SaaS in Brazil.
•  The tool benefits departments that need to:
   -  Make fast decisions based on a large volume and variety of data (Big Data)
   -  Simplify the analysis of their business indicators
•  Easy and fast delivery, at a lower cost than the traditional model.
•  The client will not need to manage multiple providers or maintain an internal team to support the application.

The Product – Visual Analytics

Dashboards and Scorecards

Corporate Reports

Dynamic and Ad Hoc Analyses

Advanced Analyses and Data Mining

Mobile Apps, Information Distribution, and Alerts

•  Ad Hoc Analysis
•  Predictive Analysis
•  Data Mining

•  Visual Exploration
•  Slice & Dice Investigative Analysis
•  Root Cause Determination

•  Page-perfect Operational Reporting
•  Pixel-perfect Business Reporting
•  Print-perfect Statements & Invoices

•  Dynamic Dashboards
•  Operational Scorecards
•  Metrics Management

•  Mobile Applications
•  Massive Information Distribution
•  iPad, iPhone, email
•  Exception-based Alerts

Introduction to Visual Analytics

AWS and Benefits


•  Capacity flexibility

•  Cash-flow planning

•  Scalability and agility at low cost

•  Payment flexibility

•  Fewer staff needed to manage the application

•  Improved cash flow

Services

•  Installation
•  Support
•  Training
•  Data loading

Software

•  SAS Visual Analytics

Managed Infrastructure

•  AWS and Concrete

Complete Solution

Traditional BI vs. Data Exploration Environment

Thank you!

More information: we are at the Concrete booth!

Marcos Prete, Alliances Manager, SAS Brasil [email protected]

Victor Oliveira, Director of Engineering [email protected] @v_oliv

Big Data HPC




Take a typical big computation task…

…one that an average cluster is too small for (or simply takes too long to complete)…

…optimization of algorithms can give some leverage…

…and complete the task in hand…

Applying a large cluster…

…can sometimes be overkill and too expensive

AWS instance clusters can be balanced to the job in hand…

…neither too large…

…nor too small…

…with multiple clusters running at the same time
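
Balancing a cluster to the job can be sketched as a sizing calculation. The example assumes a perfectly parallel workload measured in core-hours, which real jobs only approximate; the function name `instances_needed` and the 10,000 core-hour job are hypothetical, and the default of 32 vCPUs per instance is the size this deck quotes for its cluster compute instances.

```python
import math


def instances_needed(core_hours: float, deadline_hours: float,
                     vcpus_per_instance: int = 32) -> int:
    """Smallest instance count that finishes the work by the deadline.

    Assumes the work divides evenly across cores (no serial fraction).
    """
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    cores_required = core_hours / deadline_hours
    return max(1, math.ceil(cores_required / vcpus_per_instance))


# Hypothetical job: 10,000 core-hours of work.
print(instances_needed(10_000, deadline_hours=8))   # tighter deadline, larger cluster
print(instances_needed(10_000, deadline_hours=24))  # relaxed deadline, smaller cluster
```

Because the total core-hours are the same in both cases, a tighter deadline changes the cluster size rather than the amount of compute consumed, at least under the idealized scaling assumed here.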

Why AWS for HPC?

Low cost with flexible pricing

Efficient clusters

Unlimited infrastructure

Faster time to results

Concurrent Clusters on-demand

Increased collaboration

Cluster compute instances

Implement HVM process execution

Intel® Xeon® processors

10 Gigabit Ethernet; C3 has Enhanced Networking (SR-IOV)

cc2.8xlarge

32 vCPUs

2.6 GHz Intel Xeon E5-2670 Sandy Bridge

60.5 GB RAM

4 x 840 GB Local HDD

c3.8xlarge

32 vCPUs

2.8 GHz Intel Xeon E5-2680v2 Ivy Bridge

60 GB RAM

2 x 320 GB Local SSD

AWS High Performance Computing


Top 500 Supercomputer using Amazon EC2

64th fastest supercomputer, Nov 2013

26,496 Intel® Xeon® cores

Linpack Performance (Rmax) 484.2 TFlop/s

Theoretical (Rpeak) 593.5 TFlop/s
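
Two derived figures help put the Top 500 numbers in context: Linpack efficiency (Rmax divided by Rpeak) and sustained performance per core. The sketch below uses only the values quoted above; the function names are hypothetical.

```python
RMAX_TFLOPS = 484.2   # measured Linpack performance, from the slide
RPEAK_TFLOPS = 593.5  # theoretical peak, from the slide
CORES = 26_496        # core count, from the slide


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return rmax / rpeak


def gflops_per_core(rmax_tflops: float, cores: int) -> float:
    """Sustained Linpack performance per core, in GFlop/s."""
    return rmax_tflops * 1000 / cores


print(f"Efficiency: {linpack_efficiency(RMAX_TFLOPS, RPEAK_TFLOPS):.1%}")
print(f"Per core:   {gflops_per_core(RMAX_TFLOPS, CORES):.2f} GFlop/s")
```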


Network placement groups

Cluster instances deployed in a Placement Group enjoy low-latency, full-bisection 10 Gbps bandwidth

GPU compute instances

cg1.4xlarge

Intel® Xeon® X5570

33.5 ECUs

22.5 GB RAM

2 x NVIDIA GPU, 448 cores, 3 GB memory

g2.2xlarge

Intel® Xeon® E5-2670

8 vCPUs

15 GB RAM

1 x NVIDIA GPU, 1536 cores, 4 GB memory

G2 instances

1 NVIDIA Kepler GK104 GPU

I/O Performance: Very High (10 Gigabit Ethernet)

CG1 instances

2 x NVIDIA Tesla “Fermi” M2050 GPUs

I/O Performance: Very High (10 Gigabit Ethernet)

HPC Partners and Apps

Making Production Cloud HPC easy from 64 cores to …

Pharma Johnson & Johnson

Manufacturing HGST, a Western Digital Company

Financial Services Pacific Life Insurance

Genomics Life Technologies

Research The Aerospace Corporation

… 156,314 cores for better solar panel materials for $33k, not $68M

Amazon EC2: 16,788 Spot Instances

Amazon S3: 4 TB processed

Spot Instances on all 8 Regions

1.21 PetaFLOPS

Intel SandyBridge on CC2

Big Data HPC





AWS Customer Success Story

Sergio Mafra, IT Innovation Lead

ONS – Operador Nacional do Sistema Elétrico (Brazil's National Electric System Operator)

•  The National Electric System Operator (ONS) is a private company responsible for planning and operating electricity generation and transmission in the National Interconnected System (SIN).

•  With about 800 employees across 5 locations (Rio de Janeiro, Recife, Florianópolis, and Brasília), ONS is an information-intensive company that makes continuous use of mathematical models requiring HPC (High Performance Computing) and Big Data.

“Amazon Web Services lets us provision high-performance clusters in minutes, significantly reducing total processing time.”

“With that, we realized that AWS turns High Performance Computers into High Performance Customers”

- Sérgio Mafra

The SIN serves 98% of Brazil's electricity consumption.

SIN - Brazilian Electric System

Isolated Systems: Legal Amazon, 2% of the market, predominantly thermal, 300+ isolated localities

Predominantly hydroelectric model with large reservoirs and extensive interconnections.

The Challenge

•  Provide ONS with a platform with greater processing capacity, reducing the time needed to solve the mathematical models, at a cost proportional to usage time, with a cluster environment that is easy to manage and transparent to the organization.

•  Enable “time-to-market” for the IT area, holding the knowledge and the responsiveness needed to meet unexpected demands from across the organization.

“Scotty, We Need More Power”

Benefits achieved

•  A reduction of about 40% in the time needed to solve the electro-energy planning mathematical models, at a 30% lower cost.

•  The ability to analyze 5 strategies for using the Newave/Decomp models in record time (1 week), running 600 cases. On-premises this would have taken 3 weeks, which was incompatible with the commitment agreed with the MME (Ministry of Mines and Energy).

Virtual Private Cloud

Work

Controller

Internet/AWS

10.24.0.0/24 10.24.1.0/24

10.21.0.0/16

Benefits achieved

•  “Wow... from 40 minutes to 4 minutes!!!!”
•  “Now I will use every calculation parameter to get a more complete study”
•  “Bring on 4 x 80 right now!!!”
•  “Thank you for letting me leave 2 hours early. All the cases have already run”
•  “We ran the study in 2 minutes. The system can go operational and will become an international success story”

Synchronized Phasor Measurement System (SMSF)

PDC

Annual SMSF Storage

2013 •  8.5 TB

2015 •  70 TB

2018 •  120 TB

2022 •  312 TB
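
The storage projection implies a steep compound growth rate, which can be derived from the first and last data points. This is a sketch: the function name `implied_annual_growth` is hypothetical, and using only the 2013 and 2022 endpoints (ignoring the intermediate years) is a simplifying assumption.

```python
def implied_annual_growth(start_tb: float, end_tb: float, years: int) -> float:
    """Compound annual growth rate that takes start_tb to end_tb in `years` years."""
    return (end_tb / start_tb) ** (1 / years) - 1


# Annual SMSF storage figures from the slide, in TB.
storage_tb = {2013: 8.5, 2015: 70, 2018: 120, 2022: 312}

rate = implied_annual_growth(storage_tb[2013], storage_tb[2022], 2022 - 2013)
print(f"Implied growth 2013-2022: {rate:.0%} per year")
```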

Big Data

Data

Estimated collection for only 7 measured quantities

Total storage volume of the Rio data center in 2013

History

1 Tb

Architecture (UNDER STUDY)

Diagram labels: PMUs, OpenPDC, Collector, Controller, Processing, Hadoop Cluster (Master, Node 1, Node 2, Node 3, Node N, HDFS), S3, Storage, Glacier, Historian, Analytics

Big Data HPC




Solution Architects

Professional Services

Premium Support

AWS Partner Network (APN)

AWS is here to help

AWS Architecture Diagrams

https://aws.amazon.com/architecture/

Processing large amounts of parallel data using a scalable cluster

Use commonly-available cluster scheduling tools, such as Grid Engine or Condor

AWS Online Software Store

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

https://aws.amazon.com/solutions/case-studies/big-data


https://aws.amazon.com/marketplace

AWS Marketplace



AWS Public Data Sets

Free access to big data sets

https://aws.amazon.com/publicdatasets


AWS Big Data Test Drives

APN Partner-provided labs

https://aws.amazon.com/testdrive/bigdata

AWS Training & Events

Webinars, Bootcamps, and Self-Paced Labs

https://aws.amazon.com/training

https://aws.amazon.com/events


Big Data on AWS

Brand new course on Big Data

https://aws.amazon.com/training/course-descriptions/bigdata/


https://aws.amazon.com/big-data

https://aws.amazon.com/hpc
