CREATING HBASE CLUSTER AND REPLICATION ON AWS

1 Setting up Amazon EC2 Instances
Create two clusters in the same region, each with 3 nodes and a minimum volume size of 8 GB.

1.1 Launch Instance
Log in to Amazon Web Services, click My Account, and navigate to the Amazon EC2 Console.
1.2 Select AMI
Select the Ubuntu-precise-12.04 Server 64-bit OS.

1.3 Select Instance Type
Select the `Instance Type` as `m3.medium`.

1.4 Configure Number of Instances
Provide the instance details, shutdown behavior, and availability zone.

1.5 Add Storage
Use the default options on this screen.

1.6 Instance Description
Provide the instance name and description.
1.7 Define a Security Group
It is very important to configure the EC2 firewall correctly. On the Configure Firewall page, choose Create a new Security Group and authorize all the ports listed below:

1.8 Review and Launch Instance
Check the instance details and click Launch.
1.9 Launch Instance and Create Key Pair
Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. Public-key cryptography uses a public key to encrypt a piece of data, such as a password; the recipient then uses the private key to decrypt the data. The public and private keys are known as a key pair.

1.10 Define a Security Group
Create a new security group, and modify it with the required security rules.
1.11 Launching Instances
Once you click Launch Instance, 6 instances should be launched in the pending state.
Once they are in the running state, rename the instances as follows:
NameNode
Standby1
Standby2
Master
Slave1
Slave2
2 Setting up client access to Amazon Instances
Create a new key pair, name it Clusterkey, and download the key pair (.pem) file to your local machine. Click Launch Instance.
2.1 Generating Private Key
Launch the PuTTYgen client and import the key pair that was created during the Launch Instance step (Clusterkey.pem): navigate to Conversions and choose Import Key.
Click Generate.
Now save the private key by clicking Save Private Key. Leave the passphrase empty and click Yes to confirm.

2.2 Connect to Amazon Instance
Launch the PuTTY client and load the .ppk file. Repeat this for the slave nodes.

2.3 Setup WinSCP access to EC2 instances:
WinSCP is a handy utility for securely transferring files from your Windows machine to Amazon EC2.
For User name, enter the default user name for your AMI. For Amazon Ubuntu AMIs, the user name is ubuntu.
For Private key, enter the path to your private key, or click the "..." button to browse for the file.
Click Login to connect, and click Yes to add the host fingerprint to the host cache.
Select the Clusterkey.pem file and drag it to the right-hand pane.
Repeat this for the slave nodes.
3 Setup Password-less SSH on Servers
The master server remotely starts services on the slave nodes, which requires password-less access to the slave servers. The AWS Ubuntu server comes with an OpenSSH server pre-installed.
The public part of the key loaded into the agent must be put on the target system in ~/.ssh/authorized_keys. The AWS server creation process has already taken care of this.
Now we need to add the AWS EC2 key pair identity Clusterkey.pem to the SSH profile, using the following SSH utilities:
ssh-agent is a background program that holds SSH private keys and handles their passphrases.
ssh-add prompts the user for a private key passphrase and adds the key to the list maintained by ssh-agent.
Once a key has been added to ssh-agent, you will not be asked to provide it again when using SSH or SCP to connect to hosts that hold your public key.
Amazon EC2 has already taken care of authorized_keys on the master server; follow the steps below to allow password-less SSH access to the slave servers.
Steps:
In a command-line shell, change to the directory that holds the private key file you created when you launched the instance.
Use the chmod command to make sure your private key file isn't publicly viewable. For the Clusterkey.pem key file created earlier, the command is:
chmod 400 Clusterkey.pem
Use the ssh command to connect to the instance. Specify the private key (.pem) file and username@public_dns_name. For Amazon Ubuntu AMIs, the default user name is ubuntu. For RHEL5, the user name is often root but might be ec2-user. For SUSE Linux, the user name is root. Otherwise, check with your AMI provider.
ssh -i Clusterkey.pem ubuntu@ec2-198-51-100-1.compute-1.amazonaws.com
You'll see a response like the following.
The authenticity of host 'ec2-198-51-100-1.compute-1.amazonaws.com (10.254.142.33)' can't be established.
RSA key fingerprint is 1f:51:ae:28:bf:89:e9:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f.
Are you sure you want to continue connecting (yes/no)?
(Optional) If you've launched a public AMI, verify that the fingerprint in the security alert matches the fingerprint that you obtained previously for the instance. If these fingerprints don't match, someone might be attempting a "man-in-the-middle" attack. If they match, continue to the next step.
Enter yes.
You'll see a response like the following.
Warning: Permanently added 'ec2-54-241-10-95.compute-1.amazonaws.com' (RSA) to the list of known hosts.
Sample screenshot of the password-less SSH session.
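The key-handling part of this section (restricting the key file with chmod, then loading it into ssh-agent with ssh-add) can be gathered into a small shell sketch. The function names `lock_key` and `load_key` are made up for this illustration, and the sketch assumes the OpenSSH client tools are installed on the machine where it runs. Treat it as one possible starting point, not the only way to do this.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; the names are not part of
# any AWS or OpenSSH tooling.

# Make the private key readable by its owner only and print the
# resulting mode string.
lock_key() {
    key_file="$1"
    chmod 400 "$key_file" || return 1
    # The first ten characters of `ls -l` are the file type and permission bits.
    ls -l "$key_file" | cut -c1-10
}

# Start ssh-agent for the current shell and hand it the key, so that
# later ssh/scp calls do not need the -i option.
load_key() {
    key_file="$1"
    eval "$(ssh-agent -s)" >/dev/null || return 1
    ssh-add "$key_file"
}
```

On the master, running `lock_key Clusterkey.pem` and then `load_key Clusterkey.pem` in the same shell should allow a plain `ssh ubuntu@<slave-host>` to connect without the `-i` option, where `<slave-host>` is a placeholder for a slave's DNS name.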
4 Download the Cloudera Manager 4.5 installer and execute it on the remote instance:
$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin
Click Yes.
Note down the URL http://localhost:7180/; it is used to open the Cloudera Manager Console in a browser.
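The `localhost` address reported by the installer only resolves on the instance itself. From a workstation, the console would normally be reached through the instance's public DNS name on the same port, provided port 7180 is allowed in the security group. The following is a minimal sketch, assuming `curl` is available; `cm_url` and `wait_for_cm` are hypothetical helper names introduced here.

```shell
#!/bin/sh
# Hypothetical helpers; 7180 is the port shown in the installer message.

# Build the console URL for a host (the port defaults to 7180).
cm_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-7180}"
}

# Poll the console until it answers or the attempts run out.
# Usage: wait_for_cm HOST [ATTEMPTS]
wait_for_cm() {
    url=$(cm_url "$1")
    attempts="${2:-30}"
    while [ "$attempts" -gt 0 ]; do
        # -f makes curl fail on HTTP error responses; the body is discarded.
        if curl -fsS -o /dev/null "$url"; then
            echo "Cloudera Manager is answering at $url"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 10
    done
    echo "No answer from $url" >&2
    return 1
}
```

Calling `wait_for_cm <public-dns-name>` after the installer finishes gives a rough signal that the console is ready before opening it in the browser; `<public-dns-name>` is a placeholder for the instance's address.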
4.2 Installing a CDH Cluster with Cloud Express Wizard
After you log in, Cloudera Manager detects that it is running on EC2 and greets you with the welcome screen of the new wizard. The wizard warns that the instances started by this installer are instance store-based, which means that stopping or terminating these instances loses all data stored on them. Remember to back up important data from the cluster before terminating the instances!
Default username: admin
Default password: admin
Select Cloudera Enterprise Trial and click Next.
Click Launch the Classic Wizard.
Click Continue.
Enter the internal IPs of each node in the clusters.
Select the package, version, and release.
Log in as the ubuntu user, click Browse to upload the .pem file, and click Continue.
Installation progress starts here.
If there are no configuration issues, the installation completes successfully.
Click Continue.
Choose the CDH services you require, and click Inspect Assignments.
Assign the appropriate services and their roles to the required hosts.
Click Test Connection.
Click Continue.
Cluster services start here.
Check the health status and any configuration issues; the cluster should show good health.
The recommended minimum Java heap size is 1 GB.
HBase Replication:
Step 1: Enable replication in Cloudera Manager, then restart HBase.
Step 2: Add the following property to HBase's configuration file (hbase-site.xml) to enable replication on the master cluster:
hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
Sync the change to all the servers, including the client nodes in the cluster, and restart HBase. Repeat this on the slave cluster.

Step 3: Create a table whose column family has REPLICATION_SCOPE set to 1:
hbase(main):010:0> create 'emp', { NAME => 'Details', REPLICATION_SCOPE => 1 }
0 row(s) in 1.1070 seconds
=> Hbase::Table - emp
hbase(main):011:0> disable 'emp'
0 row(s) in 1.2170 seconds
If you are using an existing table, alter it to support replication (the table must be disabled first, as above, and enabled again afterwards):
hbase(main):012:0> alter 'emp', NAME => 'cf1', REPLICATION_SCOPE => '1'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.5200 seconds
hbase(main):013:0> enable 'emp'
0 row(s) in 1.1860 seconds
Execute steps 2 to 3 on the peer (slave) cluster as well. This includes enabling replication, restarting HBase, and creating an identical copy of the table.

Step 4: Start replication on the master cluster and insert some test rows:
hbase(main):014:0> start_replication
0 row(s) in 0.1210 seconds
hbase(main):016:0> put 'emp', 'row1', 'Details:name', 'devaraj'
0 row(s) in 0.0180 seconds
hbase(main):017:0> put 'emp', 'row1', 'Details:Eid', '1009'
0 row(s) in 0.0130 seconds
hbase(main):019:0> put 'emp', 'row1', 'Details:mobile', '90000101011'
0 row(s) in 0.0140 seconds
hbase(main):021:0> put 'emp', 'row1', 'Details:Year', '2013'
0 row(s) in 0.0110 seconds
hbase(main):022:0> put 'emp', 'row2', 'Details:Name', 'Prabu'

Step 5: To check whether a peer is enabled, list the peers:
hbase(main):001:0> list_peers
PEER_ID  CLUSTER_KEY                                                STATE
1        ip-10-202-169-141.us-west-1.compute.internal:2181:/hbase  ENABLED
2        ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase   ENABLED
3        ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase    ENABLED
Peers are added with add_peer, which takes a peer ID and the cluster key of the peer:
hbase(main):002:0> add_peer '2', 'ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0290 seconds
hbase(main):003:0> add_peer '3', 'ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0700 seconds
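Each cluster key passed to add_peer is made of three parts joined by colons: the ZooKeeper quorum of the peer cluster, the ZooKeeper client port, and the parent znode. The sketch below assembles that string and prints the matching `add_peer` line. `cluster_key` and `add_peer_cmd` are hypothetical helper names, and the defaults of 2181 and /hbase are taken from the peers shown in this guide, so they may differ on other clusters.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they only build strings and do
# not talk to HBase or ZooKeeper.

# Usage: cluster_key QUORUM [PORT] [ZNODE]
# Defaults (2181, /hbase) mirror the peers listed in this guide.
cluster_key() {
    printf '%s:%s:%s\n' "$1" "${2:-2181}" "${3:-/hbase}"
}

# Usage: add_peer_cmd PEER_ID QUORUM [PORT] [ZNODE]
# Prints an add_peer line suitable for pasting into the HBase shell.
add_peer_cmd() {
    peer_id="$1"
    shift
    printf "add_peer '%s', '%s'\n" "$peer_id" "$(cluster_key "$@")"
}
```

For instance, `add_peer_cmd 2 ip-10-190-147-97.us-west-1.compute.internal` prints the same command that was typed at prompt 002 in Step 5.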
Step 6: Connect to the HBase shell on the peer cluster and scan the table to see whether the data has been replicated:
$HBASE_HOME/bin/hbase shell
hbase> scan 'emp'
ROW   COLUMN+CELL
row1  column=Details:name, timestamp=1401702464224, value=Devaraj
row1  column=Details:Eid, timestamp=1401703326645, value=1010
To compare the table contents between the clusters, run the verifyrep job with the peer ID and the table name:
$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.92.1.jar verifyrep 1 emp

Step 7: Stop the replication on the master cluster by running the following command:
hbase> stop_replication

Step 8: Remove the replication peer from the master cluster by using the following command:
hbase> remove_peer '1'