Oracle Big Data Appliance and Solutions, Jean-Pierre Dijcks, Hadoop World, Nov 8th, 2012

Big dataappliance hadoopworld_final

May 12, 2015


jdijcks

The presentation explains Oracle Big Data Appliance and the software products Oracle announced at its OpenWorld conference in 2011.
Transcript
Page 1: Big dataappliance hadoopworld_final


Oracle Big Data Appliance and Solutions

Jean-Pierre Dijcks
Hadoop World – Nov 8th, 2012

Page 2: Big dataappliance hadoopworld_final

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

Page 3: Big dataappliance hadoopworld_final

Case: On-line Ads and Content

[Diagram labels]
NoSQL DB
Expert System
Real-time: Determine best ad to place on page for this user
Input into
Lookup user profile
Add user if not present
Web logs
HDFS
Profiles
NoSQL DB
High scale data reductions
BI and Analytics
Billing
Predictions on browsing
Actual ads served
Low Latency
Batch
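A minimal sketch of how the low-latency and batch paths in this case could fit together. It is an illustration only: a plain dictionary stands in for the NoSQL DB profile store, the "expert system" is reduced to picking the highest-scoring interest, and every name (`profiles`, `ADS`, `serve`, `apply_batch_predictions`) is hypothetical.

```python
# Hypothetical stand-in for the NoSQL DB profile store: user id -> profile.
profiles = {}

# Hypothetical ad inventory: ad id -> interest category it targets.
ADS = {"ad-sports": "sports", "ad-travel": "travel", "ad-generic": None}


def lookup_or_add_profile(user_id):
    """Look up the user profile; add an empty one if the user is not present."""
    return profiles.setdefault(user_id, {"interests": {}})


def best_ad(profile):
    """Toy 'expert system': choose the ad matching the strongest interest."""
    interests = profile["interests"]
    if not interests:
        return "ad-generic"
    top_category = max(interests, key=interests.get)
    for ad_id, category in ADS.items():
        if category == top_category:
            return ad_id
    return "ad-generic"


def serve(user_id):
    """Low-latency path: profile lookup followed by ad selection."""
    return best_ad(lookup_or_add_profile(user_id))


def apply_batch_predictions(predictions):
    """Batch path: feed browsing predictions (assumed to be computed
    offline, for example from web logs on HDFS) back into the profiles."""
    for user_id, interests in predictions.items():
        lookup_or_add_profile(user_id)["interests"].update(interests)
```

A first request for an unknown user creates the profile and falls back to the generic ad; once batch predictions have been written back, the same request returns a targeted ad.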

Page 4: Big dataappliance hadoopworld_final

Agenda

• Big Data Technology
• Oracle Big Data Appliance
• Big Data Applications
• Summary
• Q&A

Page 5: Big dataappliance hadoopworld_final


Big Data Technology

Page 6: Big dataappliance hadoopworld_final

Big Data: Infrastructure Requirements

Acquire
• Low, predictable Latency
• High Transaction Volume
• Flexible Data Structures

Organize
• High Throughput
• In-Place Preparation
• All Data Sources/Structures

Analyze
• Deep Analytics
• Agile Development
• Massive Scalability
• Real Time Results

Page 7: Big dataappliance hadoopworld_final

Divided Solution Spectrum

Acquire Organize Analyze

[Diagram labels]
MapReduce Solutions
DBMS (DW)
DBMS (OLTP)
Advanced Analytics
Distributed File Systems
Transaction (Key-Value) Stores
ETL
NoSQL: Flexible, Specialized, Developer Centric
SQL: Trusted, Secure, Administered

Dynamic Schema
Data Variety
Schema

Page 8: Big dataappliance hadoopworld_final

8 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.


Oracle Integrated Software Solution Stack

Acquire Organize Analyze

[Diagram labels]
Oracle Database (DW)
Oracle Database (OLTP)
In-DB Analytics
“R”
Mining
Text
Graph
Spatial
Oracle BI EE
Oracle NoSQL DB
HDFS
Hadoop
Oracle Data Integrator
Oracle Loader for Hadoop

Dynamic Schema
Data Variety
Schema

Page 9: Big dataappliance hadoopworld_final


Oracle Engineered Solutions

Acquire Organize Analyze

[Diagram labels]
Oracle Database (DW)
Oracle Database (OLTP)
In-DB Analytics
“R”
Mining
Text
Graph
Spatial
Oracle BI EE
Oracle NoSQL DB
HDFS
Hadoop
Oracle Data Integrator
Oracle Loader for Hadoop

Big Data Appliance
• Hadoop
• NoSQL Database
• Oracle Loader for Hadoop
• Oracle Data Integrator

Oracle Exadata
• OLTP & DW
• Data Mining & Oracle R
• Semantics
• Spatial

Exalytics
• Speed of Thought Analytics

Dynamic Schema
Data Variety
Schema

Page 10: Big dataappliance hadoopworld_final

Big Data Appliance
Batch Usage Model

Oracle Big Data Appliance

Oracle Exadata

InfiniBand

Acquire Organize Analyze

Oracle Exalytics

InfiniBand

Page 11: Big dataappliance hadoopworld_final


Why build a Hadoop Appliance?

• Time to Build?

• Required Expertise?

• Cost and Difficulty Maintaining?

Page 12: Big dataappliance hadoopworld_final

Oracle Big Data Appliance Hardware

• 18 Sun X4270 M2 Servers
  – 48 GB memory per node = 864 GB memory
  – 12 Intel cores per node = 216 cores
  – 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet

Page 13: Big dataappliance hadoopworld_final

Big Data Appliance
Cluster of industry standard servers for Hadoop and NoSQL Database
• Focus on Scalability and Availability at low cost

Compute and Storage
• 18 High-performance low-cost servers acting as Hadoop nodes
• 24 TB Capacity per node
• 2 6-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple replication

10GigE Network
• 8 10GigE ports
• Datacenter connectivity

InfiniBand Network
• Redundant 40Gb/s switches
• IB connectivity to Exadata
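As a rough worked example of what triple replication means for capacity, the sketch below divides the raw storage of one rack by the replication factor. It assumes every byte is stored three times and ignores operating system space, temporary MapReduce output, and compression, so the result is an upper-bound estimate rather than a published specification.

```python
NODES = 18               # servers per rack, from the slide
RAW_TB_PER_NODE = 24     # raw capacity per node, from the slide
REPLICATION_FACTOR = 3   # "triple replication"; assumed to apply to all data

raw_tb = NODES * RAW_TB_PER_NODE
unique_data_tb = raw_tb / REPLICATION_FACTOR

print(f"raw capacity: {raw_tb} TB")                              # 432 TB
print(f"unique data at 3x replication: {unique_data_tb:.0f} TB")  # 144 TB
```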

Page 14: Big dataappliance hadoopworld_final

Scale Out to Infinity

Scale out by connecting racks to each other using InfiniBand

• Expand up to eight racks without additional switches

• Scale beyond eight racks by adding an additional switch
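The scale-out arithmetic can be sketched as follows, reusing the per-node figures from the hardware slide. The function names are hypothetical, and the eight-rack threshold is taken directly from the bullets above.

```python
NODES_PER_RACK = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12
STORAGE_TB_PER_NODE = 24
MAX_RACKS_WITHOUT_EXTRA_SWITCH = 8


def cluster_totals(racks):
    """Aggregate raw resources for a cluster made of `racks` full racks."""
    nodes = racks * NODES_PER_RACK
    return {
        "nodes": nodes,
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "storage_tb": nodes * STORAGE_TB_PER_NODE,
    }


def needs_additional_switch(racks):
    """Per the slide, more than eight racks requires an additional switch."""
    return racks > MAX_RACKS_WITHOUT_EXTRA_SWITCH


print(cluster_totals(8))
```

For eight racks this gives 144 nodes, 6,912 GB of memory, 1,728 cores and 3,456 TB of raw storage.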

Page 15: Big dataappliance hadoopworld_final

Oracle Big Data Appliance Software

• Oracle Linux 5.6
• Java HotSpot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop

Page 16: Big dataappliance hadoopworld_final

Why Open-Source Apache Hadoop?

• Fast evolution in critical features
  – Built by the Hadoop experts in the community
  – Practical instead of esoteric
  – Focus on what is needed for large clusters

• Proven at very large scale
  – In production at all the large consumers of Hadoop
  – Extremely stable in those environments
  – Well-understood by practitioners

Page 17: Big dataappliance hadoopworld_final

Software Layout

• Node 1:
  – M: Name Node, Balancer & HBase Master
  – S: HDFS Data Node, NoSQL DB Storage Node

• Node 2:
  – M: Secondary Name Node, Management, Zookeeper, MySQL Slave
  – S: HDFS Data Node, NoSQL DB Storage Node

• Node 3:
  – M: JobTracker, MySQL Master, ODI Agent, Hive Server
  – S: HDFS Data Node, NoSQL DB Storage Node

• Node 4 – 18:
  – S: HDFS Data Nodes, Task Tracker, HBase Region Server, NoSQL DB Storage Nodes
  – Your MapReduce runs here!
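The layout above can be captured as a small lookup table, which makes it easy to ask which nodes run a given service. This is a sketch that only restates the slide; reading "M" and "S" as master-side and slave-side services is an assumption, and the helper names are hypothetical.

```python
TOTAL_NODES = 18

# "M" services per node, as listed on the slide.
MASTER_ROLES = {
    1: ["Name Node", "Balancer", "HBase Master"],
    2: ["Secondary Name Node", "Management", "Zookeeper", "MySQL Slave"],
    3: ["JobTracker", "MySQL Master", "ODI Agent", "Hive Server"],
}

# "S" services for nodes 1-3 and for nodes 4-18.
SLAVE_ROLES_MASTER_NODES = ["HDFS Data Node", "NoSQL DB Storage Node"]
SLAVE_ROLES_WORKER_NODES = [
    "HDFS Data Node",
    "Task Tracker",
    "HBase Region Server",
    "NoSQL DB Storage Node",
]


def roles_for(node):
    """Return the M and S services for a node number (1-based)."""
    if not 1 <= node <= TOTAL_NODES:
        raise ValueError(f"node must be between 1 and {TOTAL_NODES}")
    if node in MASTER_ROLES:
        return {"M": MASTER_ROLES[node], "S": SLAVE_ROLES_MASTER_NODES}
    return {"M": [], "S": SLAVE_ROLES_WORKER_NODES}


def nodes_running(role):
    """List the node numbers on which a given service runs."""
    return [
        node
        for node in range(1, TOTAL_NODES + 1)
        if role in roles_for(node)["M"] + roles_for(node)["S"]
    ]


print(nodes_running("Task Tracker"))  # nodes 4 through 18
```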

Page 18: Big dataappliance hadoopworld_final

Big Data Appliance
Big Data for the Enterprise

• Optimized and Complete
  – Everything you need to store and integrate your lower information density data

• Integrated with Oracle Exadata
  – Analyze all your data

• Easy to Deploy
  – Risk Free, Quick Installation and Setup

• Single Vendor Support
  – Full Oracle support for the entire system and software set

Page 19: Big dataappliance hadoopworld_final


Oracle NoSQL Database

Page 20: Big dataappliance hadoopworld_final

Key-Value Store Workloads

• Large dynamic schema based data repositories

• Data capture
  – Web applications
  – Online retail
  – Sensor/statistics/network capture/Mobile Devices

• Data services
  – Scalable authentication
  – Real-time communication (MMS, SMS, routing)
  – Personalization / Localization
  – Social Networks

Page 21: Big dataappliance hadoopworld_final

Oracle NoSQL DB
A distributed, scalable key-value database

• Simple Data Model
  – Key-value pair with major+sub-key paradigm
  – Read/insert/update/delete operations

• Scalability
  – Dynamic data partitioning and distribution
  – Optimized data access via intelligent driver

• High availability
  – One or more replicas
  – Disaster recovery through location of replicas
  – Resilient to partition master failures
  – No single point of failure

• Transparent load balancing
  – Reads from master or replicas
  – Driver is network topology & latency aware

[Diagram: two Applications, each with a NoSQL DB Driver; Storage Nodes in Data Center A and Storage Nodes in Data Center B]

Page 22: Big dataappliance hadoopworld_final

Resolving a Request

Client: Operation + Key[M,m] + Value + Transaction Policy

Hash Major Key to determine Partition id

Use Partition Map to map Partition id to a Rep Group

Use State Table to determine eligible Storage Node(s) within Rep Group

Use Load Balancer to select best eligible Rep Node

Contact Rep Node directly

• Operation result
• New Partition Map
• RepNodeStorageTable information
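The resolution steps can be sketched in a few lines of Python. This is a simplified illustration, not the driver's implementation: the hash function, the number of partitions, the shape of the partition map and state table, and the rule that writes go only to the master are all assumptions made for the example.

```python
import hashlib

NUM_PARTITIONS = 12  # hypothetical; the real value is a store setting

# Hypothetical partition map: partition id -> rep group name.
PARTITION_MAP = {p: f"rg{p % 3 + 1}" for p in range(NUM_PARTITIONS)}

# Hypothetical state table: rep group -> status of each rep node.
STATE_TABLE = {
    f"rg{g}": [
        {"node": f"rg{g}-rn1", "master": True, "up": True, "latency_ms": 5.0},
        {"node": f"rg{g}-rn2", "master": False, "up": True, "latency_ms": 2.0},
        {"node": f"rg{g}-rn3", "master": False, "up": False, "latency_ms": 1.0},
    ]
    for g in (1, 2, 3)
}


def partition_id(major_key):
    """Step 1: hash the major key to a partition id."""
    digest = hashlib.sha256(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def resolve(major_key, is_write):
    """Steps 2-4: partition -> rep group -> eligible nodes -> best node."""
    group = PARTITION_MAP[partition_id(major_key)]
    eligible = [n for n in STATE_TABLE[group] if n["up"]]
    if is_write:
        # Assumption: only the master of the rep group accepts writes.
        eligible = [n for n in eligible if n["master"]]
    if not eligible:
        raise RuntimeError(f"no eligible rep node in {group}")
    # Toy load balancer: choose the node with the lowest observed latency.
    return min(eligible, key=lambda n: n["latency_ms"])["node"]


print(resolve("user42", is_write=True))
print(resolve("user42", is_write=False))
```

Because only the major key is hashed, all records that share a major key land in the same partition, whatever their minor key.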

Page 23: Big dataappliance hadoopworld_final

ACID Transactions

Transaction Policy: Write Durability

• Configurable per-operation, application can set defaults

• Write Transaction Durability consists of both

a) Sync policy (on Master and Replica)
  – Sync: force to disk
  – Write No Sync: force to OS buffer
  – No Sync: write to local log buffer, flush when convenient

b) Replica Acknowledgement Policy
  – All
  – Simple Majority
  – None
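To make the replica acknowledgement policy concrete, the sketch below computes how many replica acknowledgements a master would wait for under each option. It assumes that the group size counts the master plus its replicas and that the master itself counts toward a simple majority; the policy strings and function names are hypothetical, not the product API.

```python
def required_replica_acks(policy, group_size):
    """Replica acknowledgements needed before a write is reported as durable.

    group_size is the master plus its replicas (assumption for this sketch).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if policy == "ALL":
        return group_size - 1
    if policy == "SIMPLE_MAJORITY":
        # A majority of the group is group_size // 2 + 1 nodes; the master
        # is one of them, so the replicas must supply the rest.
        return group_size // 2
    if policy == "NONE":
        return 0
    raise ValueError(f"unknown policy: {policy}")


def write_is_durable(policy, group_size, acks_received):
    """True once enough replicas have acknowledged the write."""
    return acks_received >= required_replica_acks(policy, group_size)


for policy in ("ALL", "SIMPLE_MAJORITY", "NONE"):
    print(policy, required_replica_acks(policy, group_size=3))
```

With triple replication (a group of three), "All" waits for two replicas, "Simple Majority" for one, and "None" returns as soon as the master has applied its own sync policy.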

Transaction Policy: Read Consistency

• Configurable per-operation, application can set defaults

• Read Consistency specified as Absolute, Time-based, Version or None
  – Absolute: Read from the master
  – Time-based: Read from any replica that is within <time-interval> of master or better
  – Version: Read from any replica that is current with <transaction-token> or higher
  – None: Read from any replica
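The four read consistency options amount to different filters over the nodes of a rep group. The sketch below shows that filtering under simplifying assumptions: replica lag is a number of seconds, a transaction token is modeled as an integer version, and the `RepNode` class and option strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RepNode:
    name: str
    is_master: bool
    lag_seconds: float  # assumed measure of how far the node trails the master
    version: int        # assumed integer stand-in for a transaction token


def eligible_nodes(nodes, consistency, max_lag_seconds=None, min_version=None):
    """Return the nodes allowed to serve a read under the given consistency."""
    if consistency == "ABSOLUTE":
        return [n for n in nodes if n.is_master]
    if consistency == "TIME":
        if max_lag_seconds is None:
            raise ValueError("TIME consistency needs max_lag_seconds")
        return [n for n in nodes if n.is_master or n.lag_seconds <= max_lag_seconds]
    if consistency == "VERSION":
        if min_version is None:
            raise ValueError("VERSION consistency needs min_version")
        return [n for n in nodes if n.version >= min_version]
    if consistency == "NONE":
        return list(nodes)
    raise ValueError(f"unknown consistency: {consistency}")


group = [
    RepNode("master", True, 0.0, 10),
    RepNode("replica-1", False, 0.5, 10),
    RepNode("replica-2", False, 3.0, 8),
]
print([n.name for n in eligible_nodes(group, "TIME", max_lag_seconds=1.0)])
```

Looser consistency leaves more nodes eligible, which gives the load balancer more room to pick a nearby or lightly loaded replica.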

Page 24: Big dataappliance hadoopworld_final

Oracle NoSQL DB Differentiation

• Commercial Grade Software and Support
  – General-purpose
  – Reliable: based on proven Berkeley DB JE HA
  – Easy to install and configure

• Scalable throughput, bounded latency

• Simple Programming and Operational Model
  – Simple Major + Sub key and Value data structure
  – ACID transactions
  – Configurable consistency & durability

• Easy Management
  – Web-based console, API accessible
  – Manages and Monitors: Topology; Load; Performance; Events; Alerts

• Completes Oracle large scale data storage offerings

Page 25: Big dataappliance hadoopworld_final

Try NoSQL Database on OTN

Oracle NoSQL Database:

• Community Edition is available as a software only distribution

• Enterprise Edition is available as a separately licensable product or as part of Big Data Appliance

Page 26: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop

Page 27: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop Features

• Load data into a partitioned or non-partitioned table
  – Single level, composite or interval partitioned table
  – Support for scalar datatypes of Oracle Database
  – Load into Oracle Database 11g Release 2

• Runs as a Hadoop job and supports standard options

• Pre-partitions and sorts data on Hadoop

• Online and offline load modes

Page 28: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop

[Diagram: MAP, SHUFFLE/SORT and REDUCE stages of several Hadoop jobs over INPUT1 and INPUT2, with a stage labeled ORACLE LOADER FOR HADOOP]

Page 29: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop: Online Option

[Diagram: MAP, SHUFFLE/SORT and REDUCE stages labeled ORACLE LOADER FOR HADOOP]

• Read target table metadata from the database

• Perform partitioning, sorting, and data conversion

• Connect to the database from reducer nodes, load into database partitions in parallel
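The partitioning and sorting step can be illustrated with a single-process sketch. It assumes a range-partitioned target table whose partition upper bounds are exclusive, and rows with a hypothetical `month` partition column and `id` primary key; the loader itself does this work inside a MapReduce job using metadata read from the database, which is not reproduced here.

```python
import bisect
from collections import defaultdict


def partition_and_sort(rows, upper_bounds):
    """Group rows by range partition, then sort each group by primary key.

    upper_bounds are the exclusive upper limits of the partitions; rows at
    or above the last bound fall into a final catch-all partition.
    """
    groups = defaultdict(list)
    for row in rows:
        partition = bisect.bisect_right(upper_bounds, row["month"])
        groups[partition].append(row)
    return {
        partition: sorted(group, key=lambda r: r["id"])
        for partition, group in sorted(groups.items())
    }


rows = [
    {"id": 3, "month": 1},
    {"id": 1, "month": 2},
    {"id": 2, "month": 11},
    {"id": 5, "month": 4},
]
# Quarterly partitions: months < 4, < 7, < 10, and everything else.
print(partition_and_sort(rows, upper_bounds=[4, 7, 10]))
```

Each reducer can then hand the database rows that already belong to one partition and arrive in key order.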

Page 30: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop: Offline Option

[Diagram: MAP, SHUFFLE/SORT and REDUCE stages labeled ORACLE LOADER FOR HADOOP]

• Read target table metadata from the database

• Perform partitioning, sorting, and data conversion

• Write from reducer nodes to Oracle Data Pump files

• Import into the database in parallel using external table mechanism

Page 31: Big dataappliance hadoopworld_final


Oracle Loader for Hadoop Advantages

• Offload database server processing to Hadoop:
  – Convert input data to final database format
  – Compute table partition for row
  – Sort rows by primary key within a table partition

• Generate binary Data Pump files

• Balance partition groups across reducers
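One plausible way to balance partition groups across reducers is a greedy "largest first" assignment, sketched below. The slides do not say which algorithm the loader uses, so this is a stand-in technique; the partition names and sizes are made up for the example.

```python
import heapq


def balance(partition_sizes, num_reducers):
    """Assign partitions to reducers, largest first, always to the reducer
    with the smallest load so far (greedy longest-processing-time heuristic)."""
    heap = [(0, reducer) for reducer in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {reducer: [] for reducer in range(num_reducers)}
    by_size = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, size in by_size:
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(partition)
        heapq.heappush(heap, (load + size, reducer))
    return assignment


sizes = {"p1": 90, "p2": 50, "p3": 40, "p4": 10}  # hypothetical row counts
print(balance(sizes, num_reducers=2))
# {0: ['p1', 'p4'], 1: ['p2', 'p3']}
```

Here the two reducers end up with loads of 100 and 90, rather than the 140 and 50 that a naive split in name order would give.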

Page 32: Big dataappliance hadoopworld_final


Input and Output Formats

Input Formats

• Delimited text

• Hive tables
  – Managed and external tables
  – Native and non-native tables

• Write your own input format

Output Formats

Online Mode

• Load directly from Hadoop nodes to Oracle database
  – JDBC
  – Parallel direct path

Offline Mode

• Data Pump format
  – Create binary files for external tables
  – Import data into the database from the external table with a SQL statement

• CSV, delimited text
  – Load through SQL*Loader or external table mechanism

Page 33: Big dataappliance hadoopworld_final


Selecting Output Option for Use Case

Oracle Loader for Hadoop Output Option: Use Case Characteristics

• Online load with JDBC: The simplest use case for non-partitioned tables

• Online load with Direct Path: Fast online load for partitioned tables

• Offline load with Data Pump files: Fastest load method for external tables

• On Oracle Big Data Appliance, Direct HDFS: Leave data on HDFS; parallel access from database; import into database when needed

Page 34: Big dataappliance hadoopworld_final


Invoking Oracle Loader for Hadoop

• Command line

$ hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader
    -libjars <library jar files>
    -D <configuration properties>

For example:

$HADOOP_HOME/bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader \
    -libjars avro-1.4.1.jar,commons-math-2.2.jar \
    -conf connection.xml \
    -D mapreduce.inputformat.class=oracle.hadoop.loader.lib.input.DelimitedTextInputFormat \
    -D mapreduce.outputformat.class=oracle.hadoop.loader.lib.output.JDBCOutputFormat
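When the loader is started from a script, it can help to assemble the argument list programmatically so that the jar list and the `-D` properties stay readable. The helper below is a hypothetical convenience wrapper; it only builds and prints the command using the jar, class, and property names shown on the slide, and does not run Hadoop.

```python
import shlex


def build_olh_command(libjars, conf_file, properties, hadoop_bin="hadoop"):
    """Build the argument list for invoking Oracle Loader for Hadoop."""
    cmd = [hadoop_bin, "jar", "oraloader.jar", "oracle.hadoop.loader.OraLoader"]
    if libjars:
        # -libjars takes one comma-separated value with no spaces.
        cmd += ["-libjars", ",".join(libjars)]
    cmd += ["-conf", conf_file]
    for name, value in properties.items():
        cmd += ["-D", f"{name}={value}"]
    return cmd


command = build_olh_command(
    libjars=["avro-1.4.1.jar", "commons-math-2.2.jar"],
    conf_file="connection.xml",
    properties={
        "mapreduce.inputformat.class":
            "oracle.hadoop.loader.lib.input.DelimitedTextInputFormat",
        "mapreduce.outputformat.class":
            "oracle.hadoop.loader.lib.output.JDBCOutputFormat",
    },
)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Hadoop and the loader jar are installed.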

Page 35: Big dataappliance hadoopworld_final


Automate Usage of Oracle Loader for Hadoop

• ODI has knowledge modules to
  – Generate data transformation code to run on Hive/Hadoop
  – Invoke Oracle Loader for Hadoop

• Use the drag-and-drop interface in ODI to
  – Include invocation of Oracle Loader for Hadoop in any ODI packaged flow

Oracle Data Integrator (ODI)

Page 36: Big dataappliance hadoopworld_final


Page 37: Big dataappliance hadoopworld_final


Summary

Page 38: Big dataappliance hadoopworld_final

Big Data Appliance
Big Data for the Enterprise

• Optimized and Complete
  – Everything you need to store and integrate your lower information density data

• Integrated with Oracle Exadata
  – Analyze all your data

• Easy to Deploy
  – Risk Free, Quick Installation and Setup

• Single Vendor Support
  – Full Oracle support for the entire system and software set

Page 39: Big dataappliance hadoopworld_final

Big Data Appliance and Exadata
Big Data for the Enterprise

NoSQL DB HDFS

Hadoop RDBMS

Page 40: Big dataappliance hadoopworld_final

Questions