Top Banner
35

- · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Mar 08, 2018

Download

Documents

ngothien
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:
Page 2:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

<Insert Picture Here>

Connecting Hadoop with Oracle Database

Sharon Stephen Senior Curriculum Developer

Server Technologies Curriculum

Page 3:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

The following is intended to outline our general product direction.

It is intended for information purposes only, and may not be

incorporated into any contract. It is not a commitment to deliver

any material, code, or functionality, and should not be relied upon

in making purchasing decisions. The development, release, and

timing of any features or functionality described for Oracle’s

products remains at the sole discretion of Oracle.

Page 4:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Agenda

Introduction to Oracle Big Data

Phases of processing Oracle Big Data

Overview of the Oracle Big Data Connectors

Methods of connecting Hadoop to an Oracle Database

Summary

Page 5:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

What Is Big Data?

Big data is defined as voluminous unstructured data from

many different sources…….

Social networks

Banking and financial services

E-commerce services

Web-centric services

Internet search indexes

Scientific searches

Document searches

Medical records

Weblogs

Page 6:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Big Data: “Big Goals”

Discover Intelligent data

Understand ecommerce behavior

Derive sentiment

Support interactions

Page 7:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Why Oracle For Big Data?

Page 8:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle Big Data: Four Phased Solution

Acquire Organize Analyze Decide

1 2 3 4

Page 9:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle’s Big Data Solution

Stream Acquire – Organize – Analyze

Oracle BI Foundation Suite

Oracle Real-Time Decisions

Endeca Information Discovery

Decide

Oracle Event Processing

Oracle Big Data Connectors

Oracle Data Integrator

Oracle

Advanced

Analytics

Oracle

Database

Oracle Spatial

& Graph

Apache Flume

Oracle GoldenGate

Oracle NoSQL

Database

Cloudera

Hadoop

Oracle R

Distribution

Page 10:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Apache Hadoop

Apache Hadoop is a framework for executing applications on large

clusters built of commodity hardware. It operates by batch processing.

• Hadoop consists of two components:

• Hadoop Distributed File System

• MapReduce

Page 11:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle Big Data Connectors

• Facilitate data access to data stored in a Hadoop

cluster.

• Can be licensed for use on either Oracle Big Data

Appliance or a Hadoop cluster running on commodity

hardware.

Acquire – Organize – Analyze

Oracle Big Data Connectors

Oracle Data Integrator

Oracle

Advanced

Analytics

Oracle

Database

Oracle Spatial

& Graph

Oracle NoSQL

Database

Cloudera

Hadoop

Oracle R

Distribution

Page 12:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Usage Scenarios

• Bulk loading of large volumes of data

Example: Historical data; daily uploads of data gathered during

the day

• Loading at regular frequency

Example: 24/7 monitoring of log feeds

• Loading at irregular frequency

Example: Monitoring of sensor feeds

• Accessing data files in place on HDFS

Page 13:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Installation Details

• You can download connectors from either of the

following locations:

• Oracle Technology Network

http://www.oracle.com/technetwork/bdc/big-data-

connectors/downloads/index.html

• Oracle Software Delivery Cloud

https://edelivery.oracle.com/

Page 14:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle Big Data Connectors Licensed together

• Oracle Loader for Hadoop

• Oracle SQL Connector for HDFS

• Oracle Data Integrator Application Adapter for

Hadoop

• Oracle R Connector for Hadoop

• Oracle Xquery for Hadoop

Page 15:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle Loader for Hadoop

• Efficient and high-performance loader for fast movement

of data from any Hadoop cluster into a table in Oracle

Database

• Allows you to use Hadoop MapReduce processing

• Partitions the data and transforms it into an Oracle-

ready format

• Can be operated in two modes

• Online mode

• Offline mode

Page 16:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Data Samples

{"custId":1046915,"movieId":null,"genreId":null,"time":"2012-07-01:00:33:18","recommended":null,"activity":9}

{"custId":1144051,"movieId":768,"genreId":9,"time":"2012-07-01:00:33:39","recommended":"N","activity":6}

{"custId":1264225,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:01","recommended":null,"activity":8}

{"custId":1085645,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:18","recommended":null,"activity":8}

{"custId":1098368,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:28","recommended":null,"activity":8}

{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:35:09","recommended":"Y","activity":11,"price":3.99}

{"custId":1156900,"movieId":20352,"genreId":14,"time":"2012-07-01:00:35:12","recommended":"N","activity":7}

{"custId":1336404,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:27","recommended":null,"activity":9}

{"custId":1022288,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:38","recommended":null,"activity":8}

{"custId":1129727,"movieId":1105903,"genreId":11,"time":"2012-07-01:00:36:08","recommended":"N","activity":1,"rating":3}

{"custId":1305981,"movieId":null,"genreId":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}

Apache Weblogs

JSON files

Twitter feeds

Sensor data

Machine logs

Page 17:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Online and Offline Modes

SHUFFLE

/SORT

SHUFFLE

/SORT

REDUCE

REDUCE

REDUCE

MAP

MAP

MAP

MAP

MAP

MAP

REDUCE

REDUCE

ORACLE LOADER FOR HADOOP

Connect to the database from reducer nodes, load into database partitions in parallel

Partition, sort, and convert into Oracle data types on Hadoop

Page 18:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Choosing Output Mode in OLH

OUTPUT MODE USE CASE CHARACTERISTICS

Online load with JDBC Simplest, can load into for

nonpartitioned tables

Online load with Direct Path Fast online load for partitioned

tables

Offline load with Data Pump

files

Fastest load method for external

tables

Page 19:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

OLH: Advantages

• OLH offloads database server processing to Hadoop by:

• Converting the input data to final database format

• Computing the table partition for a row

• Sorting rows by primary key within a table partition

• Generating binary Data Pump files

• Balancing partition groups across reducers

Page 20:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle SQL Connector for HDFS

• Oracle SQL Connector for HDFS (OSCH) is a connector

that facilitates read access from HDFS to Oracle

Database using external tables.

• It uses the ORACLE_LOADER access driver.

• It enables you to:

• Access big data without loading the data

• Access the data stored in HDFS files

• Access CSV (comma-separated value) files and Data Pump

files generated by Oracle Loader for Hadoop

• Load data extracted and transformed by Oracle Data

Integrator

Page 21:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle Database SQL

Query

HDFS

Oracle SQL Connector for HDFS

External

Table

Infini Band

HDFS

Client OSCH

Direct access from Oracle Database SQL access to HDFS External table view Data query or import

Page 22:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

OSCH: Three Simple Steps

1.Create an external table.

2.Run the OSCH utility to publish HDFS content to the external

table.

3.Access and load into the database using SQL.

>hadoop jar \

$ODCH_HOME/jlib/orahdfs.jar \

oracle.hadoop.hdfs.extab.ExternalTable\

-conf MyConf.xml \

-publish

Page 23:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

OSCH: Features

• Access and analyze data in place on HDFS via external tables.

• Query and join data on HDFS with database-resident data.

• Load into the database using SQL (if required).

• Automatic load balancing to maximize performance.

• DML operations and indexes cannot be created on external

tables.

• Data files can be text files or Oracle Data Pump files.

• Parallelism is controlled by the external table definition.

• Data files are grouped to distribute load evenly across PQ

slaves.

Page 24:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Choosing Connectors

Oracle Loader for Hadoop Oracle SQL Connector for HDFS

Load into Oracle Database. Access directly from Hive, HDFS, or

load into Oracle Database.

Input data can be delimited text,

data from Hive tables (Oracle-

supported Input Formats), or any

other format, for example, binary

format (by creating your own input

format).

Input data can be delimited text or

Oracle Data Pump files only.

Page 25:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

ODI Application Adapter for Hadoop

• Is a Big Data Connector that allows data integration developers

to easily integrate and transform data within Hadoop using

Oracle Data Integrator

• Has preconfigured ODI knowledge modules

Oracle Loader for

Hadoop

Oracle Data

Integrator

Page 26:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle R Connector for Hadoop

• Oracle R Connector for Hadoop (ORCH) is an R package that

provides an interface between the local R environment, Oracle

Database, and Cloudera CDH.

• Using simple R functions, you can:

• Sample data in HDFS

• Copy data between Oracle Database and HDFS

• Schedule R programs to execute as MapReduce jobs

• Return the results to Oracle Database or to your laptop

Oracle R

Enterprise (ORE)

Cloudera CDH

Oracle R Connector for

Hadoop (ORCH)

Page 27:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Word Count: Example Without ORCH import java.io.IOException;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.Mapper;

import

org.apache.hadoop.mapred.OutputCollector;

import org.apache.hadoop.mapred.Reporter;

public class WordMapper extends MapReduceBase

implements Mapper<LongWritable, Text, Text,

IntWritable> {

public void map(LongWritable key, Text value,

OutputCollector<Text, IntWritable>

output, Reporter reporter)

throws IOException {

String s = value.toString();

for (String word : s.split("\\W+")) {

if (word.length() > 0) {

output.collect(new Text(word), new

IntWritable(1));

}

}

}

}

import java.io.IOException;

import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import

org.apache.hadoop.mapred.OutputCollector;

import

org.apache.hadoop.mapred.MapReduceBase;

import org.apache.hadoop.mapred.Reducer;

import org.apache.hadoop.mapred.Reporter;

public class SumReducer extends MapReduceBase

implements

Reducer<Text, IntWritable, Text,

IntWritable> {

public void reduce(Text key,

Iterator<IntWritable> values,

OutputCollector<Text, IntWritable>

output, Reporter reporter)

throws IOException {

int wordCount = 0;

while (values.hasNext()) {

IntWritable value = values.next();

wordCount += value.get();

}

output.collect(key, new

IntWritable(wordCount));

}

}

Mapper Reducer

Page 28:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Word Count: Example with ORCH

input <- hdfs.put(corpus)

wordcount <- function (input, output = NULL, pattern = " ") {

res <- hadoop.exec(dfs.id = input,

mapper = function(k,v) {

lapply( strsplit(x = v, split = pattern)[[1]],

function(w) orch.keyval(w,1)[[1]])

},

reducer = function(k,vv) {

orch.keyval(k, sum(unlist(vv)))

},

config = new("mapred.config",

job.name = "wordcount",

map.output = data.frame(key=0, val=''),

reduce.output = data.frame(key='', val=0) )

)

res

}

Page 29:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Oracle

Database

Acquire – Organize – Analyze

Oracle Big Data Connectors

Oracle Data Integrator

Oracle Loader

for Hadoop

OXH is a transformation engine for Big Data

XQuery language executed on the Map/Reduce framework

XQuery

for $ln in

text:collection()

let $f :=

tokenize($ln)

where $f[1] = 'x'

return

text:put($f[2])

Map/Reduce

Execution Plan

M/R

M/R

M/R

M/R

Map/Reduce

Worker Nodes

HDFS

OXH

Engine

NEW: Oracle XQuery for Hadoop

Page 30:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Performance

• 15 TB / hour

• 25 times faster than third party

products

• Reduced database CPU usage in

comparison

Page 31:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Connectors: Engineered to Leverage

All Data

ODI Application

Adapter for

Hadoop

Oracle Loader for

Hadoop

Oracle Direct

Connector for

HDFS

Oracle R

Connector for

Hadoop

Oracle Xquery for

Hadoop

Page 32:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Summary

• Oracle Big Data Connectors are products for high speed

loading from Hadoop to Oracle Database

Cover a range of use cases

Several input sources

Flexible, easy-to-use, developed and supported by Oracle

• The fastest load option loads at 15 TB/hour

Page 34:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can:

Questions

Mail your queries to: [email protected]

Page 35:  -  · PDF file Connecting Hadoop with Oracle Database Sharon Stephen ... Oracle Database, and Cloudera CDH. •Using simple R functions, you can: