
Apache Sqoop, by 陳威宇. Sqoop: the bridge between RDBs and Hadoop. Apache Sqoop is a “tool” designed to transfer data between Hadoop and structured datastores.

Dec 27, 2015

Transcript
Page 1:

Apache Sqoop

陳威宇

Page 2:

Sqoop: the bridge between RDBs and Hadoop

• Apache Sqoop is a “tool” designed to transfer data between Hadoop and structured datastores.

• Reads data from: RDBMS, data warehouses, NoSQL

• Writes data to: Hive, HBase

• Uses the MapReduce framework to transfer data in parallel

Figure source: http://bigdataanalyticsnews.com/data-transfer-mysql-cassandra-using-sqoop/
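The parallelism bullet above can be sketched as a command line: -m sets the number of map tasks, and --split-by names the column Sqoop uses to divide the table into ranges. The database, table, and column names below are hypothetical, and the command needs a running Hadoop cluster plus a reachable MySQL server, so this is illustrative only.

```shell
# Sketch: import a hypothetical "orders" table with 4 map tasks, each mapper
# pulling one range of the numeric key column "order_id".
sqoop import \
  --connect jdbc:mysql://localhost/shop \
  --username root -P \
  --table orders \
  --split-by order_id \
  -m 4
```

Without --split-by, Sqoop splits on the table's primary key; with -m 1 (as in the exercises later in this deck) no splitting happens at all.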

Page 3:

How to use Sqoop

Figure source: http://hive.3du.me/slide.html
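The usage is only shown as a linked figure, so as a rough sketch, using the same books/authors example that appears in the exercises, the two core commands look like this (credentials and paths are placeholders):

```shell
# Import: copy an RDBMS table into HDFS (one output directory per table).
sqoop import --connect jdbc:mysql://localhost/books \
             --username root -P --table authors

# Export: push HDFS files back into an existing RDBMS table.
sqoop export --connect jdbc:mysql://localhost/books \
             --username root -P --table authors \
             --export-dir /user/hadoop/authors
```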

Page 4:

Connecting Sqoop to the elephant (setup)

• Extract the tarball: http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.3.2.tar.gz

• Edit ~/.bashrc

• Edit conf/sqoop-env.sh

• Start sqoop

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/home/hadoop/hive
export SQOOP_HOME=/home/hadoop/sqoop
export HCAT_HOME=${HIVE_HOME}/hcatalog/
export PATH=$PATH:$SQOOP_HOME/bin

$ sqoop
Try 'sqoop help' for usage.

export HADOOP_COMMON_HOME=/home/hadoop/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
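One quick way to confirm the setup, assuming the ~/.bashrc edits above are in place:

```shell
# Reload the environment, then ask Sqoop to identify itself.
source ~/.bashrc
sqoop version   # prints the Sqoop release string
sqoop help      # lists the available subcommands (import, export, job, ...)
```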

Page 5:

Exercise 1: import to Hive

cd ~
git clone https://github.com/waue0920/hadoop_example.git
cd hadoop_example/sqoop/ex1
mysql -u root -phadoop < ./exc1.sql
hadoop fs -rmr /user/hadoop/authors
sqoop import --connect jdbc:mysql://localhost/books \
  --username root --password hadoop --table authors \
  --hive-import -m 1

Exercise: use a Hive query to check that the import succeeded:
hive> select * from authors;
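Besides the hive> prompt, the same check can be run non-interactively; the warehouse path below assumes Hive's default /user/hive/warehouse location:

```shell
# Run the same query from the shell instead of the Hive prompt.
hive -e 'SELECT * FROM authors;'
# Inspect the files Sqoop wrote under the Hive warehouse (default path).
hadoop fs -ls /user/hive/warehouse/authors
```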

Page 6:

Exercise 1: create a job

hadoop fs -rmr /user/hadoop/authors
sqoop job --create myjob1 -- import \
  --connect jdbc:mysql://localhost/books \
  --username root --table authors -P \
  --hive-import -m 1

sqoop job --list
sqoop job --show myjob1
sqoop job --exec myjob1

Exercise: use a Hive query to check that the import succeeded:
hive> select * from authors;
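Because the saved job uses -P, every --exec prompts for the password interactively, which gets in the way of scheduled runs. A hedged variant using Sqoop's --password-file option avoids the prompt; the file paths and the job name myjob2 below are hypothetical:

```shell
# Hypothetical variant of the saved job: read the DB password from an HDFS
# file (restrict it with chmod 400) so --exec needs no interactive input.
echo -n 'hadoop' > /tmp/db.pass
hadoop fs -put /tmp/db.pass /user/hadoop/db.pass
sqoop job --create myjob2 -- import \
  --connect jdbc:mysql://localhost/books \
  --username root --password-file /user/hadoop/db.pass \
  --table authors --hive-import -m 1
sqoop job --exec myjob2
```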

Page 7:

Exercise 2: export to MySQL

cd ~/hadoop_example/sqoop/ex2
mysql -u root -phadoop < ./create.sql
./update_hdfs_data.sh
sqoop export --connect jdbc:mysql://localhost/db \
  --username root --password hadoop --table employee \
  --export-dir /user/hadoop/sqoop_input/emp_data