Distributed Data Processing Workshop
Shahid Beheshti Campus
Faculty of Computer Science and Engineering
Course: Distributed Databases
Instructor: Dr. Hadi Tabatabaei
Presenter: Abolfazl Sedighi, Aban 1393 (October–November 2014)
Distributed Data Processing
School of Computer Science and Engineering
A. Sedighi
@amirsedighi
Hexican.com
Every Game Needs Its Playing Yard
What can I do on a Single Machine?
● MVC Programming
● Regular business apps
● 100 GB of data
● Web Surfing
● ...
Linux Cluster
Introduction
This is a four-session, hands-on, step-by-step
tutorial on setting up a Linux cluster on your
machine (notebook or PC) to try a few
big-data processing frameworks and tools.
What are we going to do?
● Your notebook or PC is enough to get started.
– Setting up your Linux cluster.
● Distributed log management and real-time search engines
– What is Elasticsearch?
– Elasticsearch on the cluster.
– Monitoring and usage.
● The most popular distributed data processing framework
– What is Apache Hadoop?
– Apache Hadoop on the cluster.
– Usage scenarios.
What will we learn?
● Building up our knowledge of big data.
● Getting familiar with distributed data processing.
● Maximizing availability and reliability.
● Increasing data storage capacity.
● Improving data processing performance.
● Data locality is a silver bullet.
● Increasing cluster utilization.
● Taming giants by giving them a try.
Preparing the Linux Cluster - VirtualBox
Preparing the Cluster - Hosting
● VirtualBox
– Memory size, disk capacity and CPU cores.
– Network interfaces:
  ● NAT, provides Internet access.
  ● Host-only, provides cluster communication.
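The VirtualBox settings listed above can also be applied from the host's command line with VBoxManage instead of the GUI. The following is a minimal sketch, not the procedure used in the workshop: the VM name `node1`, the adapter name `vboxnet0`, and the memory and CPU values are assumptions, and the host-only network is assumed to exist already. By default the script only prints the commands; setting `APPLY=1` would execute them.

```shell
#!/usr/bin/env bash
# Sketch: configure a VirtualBox VM with a NAT and a host-only adapter.
# VM name, sizes and adapter name are illustrative assumptions.
VM="${VM:-node1}"
HOSTONLY_IF="${HOSTONLY_IF:-vboxnet0}"

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

configure_vm() {
  # Resources: memory in MB and number of virtual CPU cores.
  run VBoxManage modifyvm "$VM" --memory 2048 --cpus 2
  # Adapter 1: NAT, gives the guest Internet access.
  run VBoxManage modifyvm "$VM" --nic1 nat
  # Adapter 2: host-only, carries traffic between the cluster nodes.
  run VBoxManage modifyvm "$VM" --nic2 hostonly --hostonlyadapter2 "$HOSTONLY_IF"
}

configure_vm
```

The VM has to be powered off while `modifyvm` changes its settings.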
Preparing the Cluster – Adding a Host-Only Network
Preparing the Cluster – Adding a NAT Interface
Preparing the Cluster – Adding a Host-Only Interface
Preparing the Cluster – First Node
● Creating a Linux machine inside VirtualBox.
● Installing Linux. (I've used Ubuntu 12.04)
– Check Samba
– Check OpenSSH
● Give the first node everything the clones will need:
– An “install” folder on it.
– Prerequisites such as Java installed on it.
● Shutting down the first node.
Preparing the Cluster – Cloning, the VirtualBox Side
● Cloning the first node. (tutorial)
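Cloning can be done in the VirtualBox GUI, or, as a rough sketch, with `VBoxManage clonevm` on the host. The VM names `node1`, `node2` and `node3` are assumptions for illustration. The script prints the commands by default and would run them only with `APPLY=1`.

```shell
#!/usr/bin/env bash
# Sketch: clone an existing, powered-off VM from the host's command line.
# VM names are illustrative assumptions.

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

clone_node() {
  local source_vm="$1" new_vm="$2"
  # --register adds the clone to VirtualBox's list of VMs. Unless told
  # otherwise, clonevm gives the clone's network adapters new MAC addresses.
  run VBoxManage clonevm "$source_vm" --name "$new_vm" --register
}

# Repeating the call once per extra node grows the cluster.
for n in 2 3; do
  clone_node node1 "node$n"
done
```

Because the clone's network adapters get new MAC addresses, the cloned Linux system still has to be reconfigured from inside before it behaves as a separate node.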
Preparing the Cluster – Cloning, the Linux side
● Turning the new node on.
● Network configuration
– sudo nano /etc/hosts
– sudo nano /etc/hostname
– sudo nano /etc/network/interfaces
– sudo rm /etc/udev/rules.d/70-persistent-net.rules
● sudo reboot
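As an illustration of what the three edited files might contain on a cloned node, the sketch below generates them into a scratch directory so they can be reviewed before being copied into /etc. The node names, the 192.168.56.0/24 host-only addresses, and the mapping of eth0 to NAT and eth1 to host-only are assumptions; the actual values depend on your VirtualBox network settings.

```shell
#!/usr/bin/env bash
# Sketch: generate the files edited on a cloned node into a staging directory.
# Node names and the 192.168.56.0/24 addresses are illustrative assumptions.
render_node_config() {
  local name="$1" ip="$2" out="$3"
  mkdir -p "$out"

  # /etc/hostname: the node's own name.
  echo "$name" > "$out/hostname"

  # /etc/hosts: every node should resolve every other node by name.
  cat > "$out/hosts" <<EOF
127.0.0.1      localhost
192.168.56.101 node1
192.168.56.102 node2
EOF

  # /etc/network/interfaces: eth0 = NAT (DHCP), eth1 = host-only (static).
  cat > "$out/interfaces" <<EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
    address $ip
    netmask 255.255.255.0
EOF
}

staging="$(mktemp -d)"
render_node_config node2 192.168.56.102 "$staging"
echo "Files written to $staging"
```

Removing 70-persistent-net.rules matters because on Ubuntu 12.04 that file binds interface names to the MAC addresses of the original machine; after deletion it is regenerated on the next boot for the clone's own adapters.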
Preparing the Cluster – No Password Login
● Do this:
– ssh-keygen
– ssh-copy-id -i ~/.ssh/id_rsa.pub user@host
● Or this:
– ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
– scp ~/.ssh/id_rsa.pub user@host:~/master_key
– ssh user@host
– cat master_key >> ~/.ssh/authorized_keys
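With more than one node, the key has to be copied to each of them. The following sketch loops over the nodes and then checks that key-based login works. The user name `user` and the node names are placeholders, and the key is assumed to be `~/.ssh/id_rsa.pub`. The commands are only printed unless `APPLY=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: push the master's public key to every node, then check the login.
# The user name and node names are illustrative assumptions.
CLUSTER_USER="${CLUSTER_USER:-user}"
NODES="${NODES:-node1 node2 node3}"

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

distribute_key() {
  local node
  for node in $NODES; do
    # Appends the key to ~/.ssh/authorized_keys on the node
    # (asks for that node's password once).
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$CLUSTER_USER@$node"
    # BatchMode makes ssh fail instead of prompting for a password,
    # so a successful run confirms that key-based login works.
    run ssh -o BatchMode=yes "$CLUSTER_USER@$node" true
  done
}

distribute_key
```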
Preparing the Cluster – Distributed Shell
● Do it like a Commander
– Installing DSH (Optional)
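A rough sketch of how DSH could be set up once passwordless login works: the script below writes a machine group file with one host per line. The group name `cluster` and the node names are assumptions, and the `dsh` invocation in the trailing comment is given from memory of the tool's options, so check it against `man dsh` on your system.

```shell
#!/usr/bin/env bash
# Sketch: define a dsh machine group and show a command that would use it.
# Group name and node names are illustrative assumptions.
write_dsh_group() {
  local group_dir="$1"
  shift
  mkdir -p "$group_dir"
  # One host per line; dsh looks for per-user groups in ~/.dsh/group/<name>.
  printf '%s\n' "$@" > "$group_dir/cluster"
}

# Written to a scratch directory here; copy the file to ~/.dsh/group/ to use it.
scratch="$(mktemp -d)"
write_dsh_group "$scratch" node1 node2 node3
cat "$scratch/cluster"

# With the package installed (sudo apt-get install dsh) and key-based SSH
# working, a command like this would run `uptime` on all nodes in the group:
#   dsh -g cluster -r ssh -M -c uptime
```

Here `-r ssh` selects SSH as the remote shell, `-M` prefixes each output line with the machine name, and `-c` runs the command on all nodes concurrently.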
Preparing the Cluster – Enjoy it
● To scale your cluster just repeat the cloning step.
Next?
● An introduction to distributed log management and analytical search engines.
– How does Elasticsearch work?
– Workshop.
● An introduction to Apache Hadoop.
– How does Apache Hadoop work?
– Workshop.