As in our previous "Creating a Hadoop Cluster" post, we'll need one of the following 64-bit operating systems (I'm going to use RHEL 6.6 for this):
- Red Hat Enterprise Linux (RHEL) v6.x
- Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
- CentOS v6.x
- CentOS v5.x (deprecated)
- Oracle Linux v6.x
- Oracle Linux v5.x (deprecated)
- SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
- Ubuntu Precise v12.04
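If you want to make sure the box you're about to use really is a supported 64-bit release, a quick check on each node doesn't hurt (the second command should report x86_64):
[root@hadoop1 ~]# cat /etc/redhat-release
[root@hadoop1 ~]# uname -m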
Let's press on with the environment setup. For this, I will have one namenode (hadoop1), two secondary servers (hadoop2 and hadoop3), and five datanodes (hadoop4, hadoop5, hadoop6, hadoop7 and hadoop8):
Node Type and Number | Node Name | IP |
---|---|---|
Namenode | hadoop1 | 192.168.0.101 |
Secondary Namenode | hadoop2 | 192.168.0.102 |
Tertiary Services | hadoop3 | 192.168.0.103 |
Datanode #1 | hadoop4 | 192.168.0.104 |
Datanode #2 | hadoop5 | 192.168.0.105 |
Datanode #3 | hadoop6 | 192.168.0.106 |
Datanode #4 | hadoop7 | 192.168.0.107 |
Datanode #5 | hadoop8 | 192.168.0.108 |
Unfortunately, the menial tasks that involve system configuration cannot be avoided, so let's press on:
First of all, let's update everything (This should be issued on every server in our cluster):
[root@hadoop1 ~]# yum -y update
[root@hadoop1 ~]# yum -y install wget
We'll also need the SSH client tools (again, this should be issued on every server in our cluster):
[root@hadoop1 ~]# yum -y install openssh-clients.x86_64
Note: It's always best to actually have your node FQDNs on your DNS server and skip the next two steps (editing the /etc/hosts and the /etc/host.conf files).
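If you're not sure whether your DNS already resolves the node names, a quick test like this (run before touching /etc/hosts, so only DNS is consulted) will tell you whether you can skip ahead; I'm using the short hostnames here, so adjust to your actual FQDNs:
[root@hadoop1 ~]# for host in hadoop{1..8}; do getent hosts "$host" || echo "$host does not resolve"; done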
Now, let's edit our /etc/hosts to reflect our cluster (This should be issued on every server in our cluster):
[root@hadoop1 ~]# vi /etc/hosts
192.168.0.101 hadoop1
192.168.0.102 hadoop2
192.168.0.103 hadoop3
192.168.0.104 hadoop4
192.168.0.105 hadoop5
192.168.0.106 hadoop6
192.168.0.107 hadoop7
192.168.0.108 hadoop8
We should also check our /etc/host.conf and our /etc/nsswitch.conf to make sure the hostnames we just added actually resolve through the hosts file:
[root@hadoop1 ~]# vi /etc/host.conf
multi on
order hosts bind
[root@hadoop1 ~]# vi /etc/nsswitch.conf
....
#hosts: db files nisplus nis dns
hosts: files dns
....
We'll need a large number of file descriptors (This should be issued on every server in our cluster):
[root@hadoop1 ~]# vi /etc/security/limits.conf
....
* soft nofile 65536
* hard nofile 65536
....
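These limits only apply to new login sessions, so after logging out and back in we can verify that they took effect:
[root@hadoop1 ~]# ulimit -Sn
[root@hadoop1 ~]# ulimit -Hn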
We should make sure that our network interface comes up automatically:
[root@hadoop1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
....
ONBOOT="yes"
....
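ONBOOT only matters at boot time; if the interface isn't already up, it can be brought up right away (assuming it really is eth0, as in my case):
[root@hadoop1 ~]# ifup eth0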
And of course make sure our other networking settings, such as our hostname, are correct:
[root@hadoop1 ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1
GATEWAY=192.168.0.1
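The HOSTNAME setting is also only read at boot, so to have the name take effect immediately on each node we can set it by hand as well (hadoop1 shown here as an example):
[root@hadoop1 ~]# hostname hadoop1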
We'll need to log in using SSH as root, so for the time being let's allow root logins. We might want to turn that off after we're done, as this is as insecure as they come:
[root@hadoop1 ~]# vi /etc/ssh/sshd_config
....
PermitRootLogin yes
....
[root@hadoop1 ~]# service sshd restart
NTP should be installed on every server in our cluster. Now that we've edited our hosts file, things are much easier, though we'll still be typing each node's root password until we set up passwordless SSH in the next step:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "exec yum -y install ntp ntpdate"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chkconfig ntpd on; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" ntpdate pool.ntp.org; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" service ntpd start; done
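If you'd like to confirm that every node is actually talking to its time sources, ntpq ships with the ntp package, and a quick peer listing per node does the trick (it may take a few minutes after starting ntpd before a peer is selected):
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" ntpq -p; done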
Set up passwordless SSH authentication (note that this will be configured automatically during the actual installation, so this is not strictly necessary; it is useful though, since it saves us a lot of typing):
[root@hadoop1 ~]# ssh-keygen -t rsa
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do ssh "$host" mkdir -p .ssh; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chmod 700 .ssh; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chmod 640 .ssh/authorized_keys; done
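A quick way to confirm that the passwordless logins really work end to end is to run a trivial command on every node with BatchMode enabled, which fails instead of prompting for a password:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh -o BatchMode=yes "$host" hostname; done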
Time to tone down our security a bit so that our cluster runs without problems. My PC's IP is 192.168.0.55 so I will allow that as well:
[root@hadoop1 ~]# iptables -F
[root@hadoop1 ~]# iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -i lo -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -s 192.168.0.101,192.168.0.102,192.168.0.103,192.168.0.104,192.168.0.105,192.168.0.106,192.168.0.107,192.168.0.108 -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -s 192.168.0.55 -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -j DROP
[root@hadoop1 ~]# iptables -A FORWARD -j DROP
[root@hadoop1 ~]# iptables -A OUTPUT -j ACCEPT
[root@hadoop1 ~]# iptables-save > /etc/sysconfig/iptables
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do scp /etc/sysconfig/iptables "$host":/etc/sysconfig/iptables; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "iptables-restore < /etc/sysconfig/iptables"; done
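Since we saved the rules to /etc/sysconfig/iptables, it's also worth making sure the iptables service is enabled everywhere so they are loaded again after the reboot below, and taking a quick look at what each node ended up with:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chkconfig iptables on; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" iptables -L INPUT -n; done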
Let's disable SELinux:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" setenforce 0; done
[root@hadoop1 ~]# vi /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do scp /etc/sysconfig/selinux "$host":/etc/sysconfig/selinux; done
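To double-check that every node is now permissive (it will be fully disabled after the reboot), getenforce is enough:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" getenforce; done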
Turn down swappiness:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "echo vm.swappiness = 1 >> /etc/sysctl.conf"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" sysctl -p; done
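And to verify that the new value took effect everywhere:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" cat /proc/sys/vm/swappiness; done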
We've made quite a few changes, including kernel updates. Let's reboot and pick this up later.
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do ssh "$host" reboot; done
[root@hadoop1 ~]# reboot
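Once the namenode is back, a little loop like this (just a sketch, tune the timeout and sleep to taste) will tell us when the rest of the nodes are reachable again:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do until ssh -o ConnectTimeout=5 "$host" true; do sleep 5; done; echo "$host is up"; done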
Now, let's start installing Hadoop by downloading the Ambari repo and installing ambari-server.
[root@hadoop1 ~]# wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.7.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
[root@hadoop1 ~]# yum -y install ambari-server
[root@hadoop1 ~]# ambari-server setup
At this point, we'll be prompted for the following:
- If you have not temporarily disabled SELinux, you may get a warning. Accept the default (y), and continue.
- By default, Ambari Server runs under root. Accept the default (n) at the Customize user account for ambari-server daemon prompt, to proceed as root.
If you want to create a different user to run the Ambari Server, or to assign a previously created user, select y at the Customize user account for ambari-server daemon prompt, then provide a user name.
- If you have not temporarily disabled iptables you may get a warning. Enter y to continue.
- Select a JDK version to download. Enter 1 to download Oracle JDK 1.7.
- Accept the Oracle JDK license when prompted. You must accept this license to download the necessary JDK from Oracle. The JDK is installed during the deploy phase.
- Select n at Enter advanced database configuration to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari. The default user name and password are ambari/bigdata.
Otherwise, to use an existing PostgreSQL, MySQL or Oracle database with Ambari, select y:
- To use an existing Oracle 11g r2 instance, and select your own database name, user name, and password for that database, enter 2. Select the database you want to use and provide any information requested at the prompts, including host name, port, Service Name or SID, user name, and password.
- To use an existing MySQL 5.x database, and select your own database name, user name, and password for that database, enter 3. Select the database you want to use and provide any information requested at the prompts, including host name, port, database name, user name, and password.
- To use an existing PostgreSQL 9.x database, and select your own database name, user name, and password for that database, enter 4. Select the database you want to use and provide any information requested at the prompts, including host name, port, database name, user name, and password.
- At Proceed with configuring remote database connection properties [y/n], choose y.
- Setup completes.
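As an aside, if you're happy with all the defaults above, ambari-server setup can also be run non-interactively with its silent flag (-s), which simply accepts the defaults; I prefer walking through the prompts the first time:
[root@hadoop1 ~]# ambari-server setup -s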
[root@hadoop1 ~]# ambari-server start
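If everything went well, the server should now report itself as running and be listening on port 8080; both are easy to confirm:
[root@hadoop1 ~]# ambari-server status
[root@hadoop1 ~]# netstat -tnlp | grep 8080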
We can now point our web browser at our namenode's IP on port 8080 (http://192.168.0.101:8080 in my case) and continue the configuration and installation steps from there.
The default credentials are:
username: admin
password: admin
Then, we just need to select "Create a Cluster" and HDP takes us by the hand and pretty much does everything needed. And when I say everything, I mean it.
The only issue that you may encounter if you followed this guide is that Ambari will detect that iptables is running. We've made sure it allows everything we need, so we can safely ignore this warning.
You can install just about any Hadoop module you want with the click of a button, saving hours and in some cases maybe days of work. Amazing.
From there on, it's just a matter of waiting for the installation process to finish. This is what we are greeted with:
It even installs and configures Nagios and Ganglia for you!
That's a whole lot of work done with the press of a button right here!
And all this is for free. Wow.
References: http://docs.hortonworks.com/HDPDocuments/Ambari-1.7.0.0/Ambari_Install_v170/Ambari_Install_v170.pdf