Supported Operating Systems
Cloudera Manager supports the following operating systems:
- RHEL-compatible
  - Red Hat Enterprise Linux and CentOS
    - 5.7, 64-bit
    - 6.4, 64-bit
    - 6.4 in SE Linux mode
    - 6.5, 64-bit
  - Oracle Enterprise Linux with default kernel and Unbreakable Enterprise Kernel, 64-bit
    - 5.6 (UEK R2)
    - 6.4 (UEK R2)
    - 6.5 (UEK R2, UEK R3)
- SLES - SUSE Linux Enterprise Server 11, 64-bit. Service Pack 2 or later is required.
- Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
- Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit
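If you want to check a node against the RHEL/CentOS part of this list before going any further, here is a minimal sketch. The function name is_supported_rhel is made up for this example; it only knows the three RHEL/CentOS versions listed above, and it assumes the release string looks like the contents of /etc/redhat-release (e.g. "CentOS release 6.5 (Final)").

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the release string
# (by default the contents of /etc/redhat-release) names one of the
# RHEL/CentOS versions listed above: 5.7, 6.4 or 6.5.
is_supported_rhel() {
  local release version v
  release="${1:-$(cat /etc/redhat-release 2>/dev/null)}"
  # Take the first "major.minor" number, e.g. "6.5" from
  # "CentOS release 6.5 (Final)".
  version="$(printf '%s\n' "$release" | grep -oE '[0-9]+\.[0-9]+' | head -n1)"
  for v in 5.7 6.4 6.5; do
    [ "$version" = "$v" ] && return 0
  done
  return 1
}
```

For example, `is_supported_rhel && echo supported` on each node; for the 64-bit requirement, `uname -m` should print x86_64.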
Unfortunately, the menial tasks that involve system configuration cannot be avoided, so let's press on:
First of all, let's update everything (this should be run on every server in our cluster):
[root@hadoop1 ~]# yum -y update
[root@hadoop1 ~]# yum -y install wget
We'll also need the SSH client tools (again, on every server in our cluster):
[root@hadoop1 ~]# yum -y install openssh-clients.x86_64
Note: It's always best to actually have your node FQDNs on your DNS server and skip the next two steps (editing the /etc/hosts and the /etc/host.conf files).
Now let's edit /etc/hosts to reflect our cluster (this should be done on every server in our cluster):
[root@hadoop1 ~]# vi /etc/hosts
192.168.0.101 hadoop1
192.168.0.102 hadoop2
192.168.0.103 hadoop3
192.168.0.104 hadoop4
192.168.0.105 hadoop5
192.168.0.106 hadoop6
192.168.0.107 hadoop7
192.168.0.108 hadoop8
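Typing eight nearly identical lines invites typos. Assuming your nodes use consecutive addresses and numbered names like the ones above, a small generator can print the entries for you; gen_hosts_entries is a hypothetical helper, not part of any package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts entries for COUNT nodes with
# consecutive addresses, e.g. "192.168.0.101 hadoop1" ... "192.168.0.108 hadoop8".
# Usage: gen_hosts_entries PREFIX FIRST_OCTET COUNT [NAME_PREFIX]
gen_hosts_entries() {
  local prefix="$1" first="$2" count="$3" name="${4:-hadoop}" i
  for ((i = 1; i <= count; i++)); do
    echo "$prefix.$((first + i - 1)) $name$i"
  done
}
```

`gen_hosts_entries 192.168.0 101 8` prints the eight lines above. Review the output before appending it to /etc/hosts with `>>`, since running it twice would add duplicates.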
We should also check /etc/host.conf and /etc/nsswitch.conf to make sure the names in /etc/hosts are resolved before DNS is consulted:
[root@hadoop1 ~]# vi /etc/host.conf
multi on
order hosts bind
[root@hadoop1 ~]# vi /etc/nsswitch.conf
....
#hosts: db files nisplus nis dns
hosts: files dns
....
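On glibc systems, the `hosts:` line in /etc/nsswitch.conf decides whether /etc/hosts is consulted before DNS. Here is a sketch of a quick check you could run on each node; files_before_dns is a hypothetical name, and it only looks at the last uncommented `hosts:` line.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if the (last) uncommented "hosts:" line in an
# nsswitch.conf-style file lists "files" before "dns" (or has no "dns").
files_before_dns() {
  local file="${1:-/etc/nsswitch.conf}"
  awk '
    $1 == "hosts:" {
      found = 1; f = 0; d = 0
      for (i = 2; i <= NF; i++) {
        if ($i == "files" && !f) f = i
        if ($i == "dns" && !d) d = i
      }
      ok = (f > 0 && (d == 0 || f < d))
    }
    END { if (found && ok) exit 0; exit 1 }
  ' "$file"
}
```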
We'll need a high limit on the number of open file descriptors (this should be set on every server in our cluster):
[root@hadoop1 ~]# vi /etc/security/limits.conf
....
* soft nofile 65536
* hard nofile 65536
....
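To confirm the limits.conf edit on each node, a small check along these lines could help. nofile_limits_ok is a hypothetical helper; it only looks at the wildcard (`*`) entries shown above and treats a `-` type as setting both the soft and the hard limit.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if the wildcard (*) soft and hard nofile
# limits in a limits.conf-style file are both at least MIN (default 65536).
# A "-" type sets both soft and hard, as in limits.conf itself.
nofile_limits_ok() {
  local file="${1:-/etc/security/limits.conf}" min="${2:-65536}"
  awk -v min="$min" '
    $1 == "*" && $3 == "nofile" {
      if ($2 == "-") { soft = $4 + 0; hard = $4 + 0 }
      else if ($2 == "soft") soft = $4 + 0
      else if ($2 == "hard") hard = $4 + 0
    }
    END { if (soft >= min + 0 && hard >= min + 0) exit 0; exit 1 }
  ' "$file"
}
```

The limits are applied by pam_limits to new login sessions, so log in again; `ulimit -n` should then report 65536.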
We should make sure that our network interface comes up automatically:
[root@hadoop1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
....
ONBOOT="yes"
....
And of course make sure the rest of our network settings, such as the hostname, are correct (HOSTNAME must match each server's own name):
[root@hadoop1 ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop1
GATEWAY=192.168.0.1
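Editing these files by hand on eight servers gets old quickly. Here is a minimal sketch of an idempotent KEY=value setter; set_shell_var is a hypothetical helper, and it assumes the simple KEY=value layout that these RHEL-style files use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a shell-style config file such as
# /etc/sysconfig/network. The first KEY= line is rewritten, later KEY=
# lines are dropped, and the setting is appended if it was missing.
set_shell_var() {
  local file="$1" key="$2" value="$3" tmp rc=0
  tmp="$(mktemp)" || return 1
  if awk -v k="$key" -v v="$value" '
      index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
      { print }
      END { if (!done) print k "=" v }
    ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place: keeps owner, mode and symlinks
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example, `set_shell_var /etc/sysconfig/network HOSTNAME hadoop2` on hadoop2, or `set_shell_var /etc/sysconfig/network-scripts/ifcfg-eth0 ONBOOT '"yes"'` to keep the quoted style. The network file is read at boot; `hostname hadoop2` changes the running hostname right away.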
We'll need to log in using SSH as root, so for the time being let's allow root logins. We might want to turn that off after we're done, as this is as insecure as they come:
[root@hadoop1 ~]# vi /etc/ssh/sshd_config
....
PermitRootLogin yes
....
[root@hadoop1 ~]# service sshd restart
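One sshd_config subtlety: for most keywords, sshd uses the first value it finds, so a `PermitRootLogin yes` appended to the end of the file may be silently ignored if an earlier uncommented line says `no`. The hypothetical helper below, sshd_option, prints the first uncommented value so you can see what sshd will actually pick up; as a simplification, it ignores Match blocks and the KEY=value form.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value sshd would use for KEY, i.e. the
# first uncommented "KEY value" line (keywords are case-insensitive).
# Simplifications: ignores Match blocks and the "KEY=value" form.
# Usage: sshd_option KEY [FILE]
sshd_option() {
  local key="$1" file="${2:-/etc/ssh/sshd_config}"
  awk -v k="$key" '
    /^[ \t]*#/ { next }
    tolower($1) == tolower(k) { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$file"
}
```

Before restarting, `sshd -t` checks the configuration for errors, which is cheap insurance against locking yourself out of a remote box: `sshd -t && service sshd restart`.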
NTP should be installed on every server in our cluster. Now that our hosts file lists every node, though, we can do that from hadoop1 with a few loops:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "exec yum -y install ntp ntpdate"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chkconfig ntpd on; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" ntpdate pool.ntp.org; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" service ntpd start; done
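The grep/awk loop gets repeated a lot from here on. If you prefer, you can wrap it once. The sketch below is a hypothetical pair of helpers (cluster_hosts and for_each_host, names made up for this post) that read the same hadoop entries from /etc/hosts (or from `$HOSTS_FILE`), skip commented-out lines, and only print the commands when `DRY_RUN` is set.

```bash
#!/usr/bin/env bash
# Hypothetical helpers wrapping the repeated
#   for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" ...; done
# pattern. cluster_hosts prints the hostname column of every uncommented
# line mentioning "hadoop" in $HOSTS_FILE (default /etc/hosts).
cluster_hosts() {
  awk '$1 !~ /^#/ && /hadoop/ { print $2 }' "${HOSTS_FILE:-/etc/hosts}"
}

# for_each_host CMD...: run CMD on every cluster node over ssh. With DRY_RUN
# set to any non-empty value, only print what would be run.
for_each_host() {
  local host rc=0
  for host in $(cluster_hosts); do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh $host $*"
    else
      echo "== $host"
      # -n: take stdin from /dev/null; the remote commands are non-interactive
      ssh -n "$host" "$@" || { echo "!! $host failed" >&2; rc=1; }
    fi
  done
  return "$rc"
}
```

For example, `for_each_host yum -y install ntp ntpdate`, then `for_each_host chkconfig ntpd on`, and so on. Once ntpd is running, `for_each_host ntpq -p` lists the peers each node is syncing with. Because ssh joins its arguments into one remote command line, redirections work if you quote them: `for_each_host "iptables-restore < /etc/sysconfig/iptables"`.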
Set up passwordless SSH authentication (this will be configured automatically during the actual installation, so it is not strictly necessary; it is useful though, since it saves us a lot of typing):
[root@hadoop1 ~]# ssh-keygen -t rsa
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do ssh "$host" mkdir -p .ssh; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chmod 700 .ssh; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" chmod 640 .ssh/authorized_keys; done
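To confirm that key-based login really works everywhere, you can ask ssh to fail instead of prompting for a password. check_passwordless is a hypothetical helper; BatchMode=yes and ConnectTimeout are standard OpenSSH client options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether key-based SSH login works for each
# host given as an argument. BatchMode=yes makes ssh fail instead of
# prompting for a password; ConnectTimeout bounds the TCP connect time.
check_passwordless() {
  local host failed=0
  for host in "$@"; do
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

`check_passwordless $(grep hadoop /etc/hosts | awk '{print $2}')` should print "ok" for every node.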
Time to tone down our security a bit so that our cluster runs without problems. My PC's IP is 192.168.0.55 so I will allow that as well:
[root@hadoop1 ~]# iptables -F
[root@hadoop1 ~]# iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -i lo -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -s 192.168.0.101,192.168.0.102,192.168.0.103,192.168.0.104,192.168.0.105,192.168.0.106,192.168.0.107,192.168.0.108 -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -s 192.168.0.55 -j ACCEPT
[root@hadoop1 ~]# iptables -A INPUT -j DROP
[root@hadoop1 ~]# iptables -A FORWARD -j DROP
[root@hadoop1 ~]# iptables -A OUTPUT -j ACCEPT
[root@hadoop1 ~]# iptables-save > /etc/sysconfig/iptables
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do scp /etc/sysconfig/iptables "$host":/etc/sysconfig/iptables; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "iptables-restore < /etc/sysconfig/iptables"; done
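Typing the rules one by one over SSH has a catch: one wrong line and you can cut yourself off mid-sequence. An alternative is to generate the whole rule set as a file in iptables-save format, review it, and load it with iptables-restore, which replaces the filter table in a single step. gen_iptables_rules is a hypothetical helper that reproduces the rules above; the addresses in the example are the ones from this post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the firewall rules from this post in
# iptables-save format. Usage: gen_iptables_rules ADMIN_IP NODE_IP...
gen_iptables_rules() {
  local admin="$1" ip
  shift
  # Built-in chains keep their ACCEPT policy; the DROP rules below do the work.
  printf '%s\n' '*filter' \
    ':INPUT ACCEPT [0:0]' ':FORWARD ACCEPT [0:0]' ':OUTPUT ACCEPT [0:0]' \
    '-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT' \
    '-A INPUT -i lo -j ACCEPT'
  # One ACCEPT rule per cluster node, then one for the admin workstation.
  for ip in "$@"; do
    printf '%s\n' "-A INPUT -s $ip/32 -j ACCEPT"
  done
  printf '%s\n' "-A INPUT -s $admin/32 -j ACCEPT" \
    '-A INPUT -j DROP' '-A FORWARD -j DROP' '-A OUTPUT -j ACCEPT' 'COMMIT'
}
```

For example, `gen_iptables_rules 192.168.0.55 192.168.0.10{1..8} > /tmp/iptables.cluster`, review the file, then copy it to /etc/sysconfig/iptables on every node and run iptables-restore as above.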
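Let's disable SELinux. setenforce 0 switches it to permissive mode right away; SELINUX=disabled in the config file takes full effect after the next reboot: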
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" setenforce 0; done
[root@hadoop1 ~]# vi /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do scp /etc/sysconfig/selinux "$host":/etc/sysconfig/selinux; done
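Instead of editing with vi and copying the file around, you could change just the SELINUX= line with sed. One catch: on RHEL/CentOS 6, /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and a plain `sed -i` would replace that symlink with a regular file, so this sketch uses GNU sed's `--follow-symlinks`. disable_selinux_config is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# --follow-symlinks (GNU sed) edits the file a symlink points to instead
# of replacing the symlink itself with a regular file.
# SELINUXTYPE= lines are untouched: "^SELINUX=" does not match them.
disable_selinux_config() {
  local file="${1:-/etc/selinux/config}"
  sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}
```

After the reboot, `getenforce` should print Disabled; until then, setenforce 0 leaves it at Permissive.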
Turn down swappiness. Cloudera actually recommends turning it down to 0; I prefer 1:
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" "echo vm.swappiness = 1 >> /etc/sysctl.conf"; done
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}'); do ssh "$host" sysctl -p; done
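Note that appending to /etc/sysctl.conf adds another vm.swappiness line every time the loop is run. Here is a sketch of an idempotent alternative: set_sysctl (a hypothetical helper) rewrites the first matching `key = value` line, drops any later duplicates, and appends the setting if it is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY = VALUE in a sysctl.conf-style file without
# piling up duplicates. The first existing KEY line is rewritten, later
# KEY lines are dropped, and the setting is appended if it was missing.
set_sysctl() {
  local file="$1" key="$2" value="$3" tmp rc=0
  tmp="$(mktemp)" || return 1
  if awk -v k="$key" -v v="$value" '
      {
        # Extract the key part of the line: strip "= value" and leading blanks.
        name = $0
        sub(/[ \t]*=.*/, "", name)
        sub(/^[ \t]+/, "", name)
      }
      name == k { if (!done) print k " = " v; done = 1; next }
      { print }
      END { if (!done) print k " = " v }
    ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # write back in place, keeping owner and mode
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

Run `set_sysctl /etc/sysctl.conf vm.swappiness 1` on each node, then `sysctl -p`; `cat /proc/sys/vm/swappiness` should print 1.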
We've made quite a few changes, including kernel updates. Let's reboot and pick this up later.
[root@hadoop1 ~]# for host in $(grep hadoop /etc/hosts | awk '{print $2}' | grep -v hadoop1); do ssh "$host" reboot; done
[root@hadoop1 ~]# reboot
Now let's start installing Hadoop by downloading and running the Cloudera Manager installer:
[root@hadoop1 ~]# wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
[root@hadoop1 ~]# chmod u+x cloudera-manager-installer.bin
[root@hadoop1 ~]# ./cloudera-manager-installer.bin
The installation process is straightforward, to say the least: you only have to accept a few licence agreements and click "Next" a few times.
After a while, point your web browser at port 7180 of the system you installed Cloudera Manager on. In my case, that's 192.168.0.101:7180.
Just in case, take a look at the logs before actually logging in. If nothing seems to be listening on that address and port yet, wait a bit until the log reports that the web server has started:
[root@hadoop1 ~]# netstat -ntpl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:7432 0.0.0.0:* LISTEN 1807/postgres
tcp 0 0 0.0.0.0:7182 0.0.0.0:* LISTEN 2419/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 920/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1045/master
tcp 0 0 :::7432 :::* LISTEN 1807/postgres
tcp 0 0 :::22 :::* LISTEN 920/sshd
tcp 0 0 ::1:25 :::* LISTEN 1045/master
[root@hadoop1 ~]# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
....
2015-03-20 05:54:54,092 INFO WebServerImpl:org.mortbay.log: jetty-6.1.26.cloudera.4
2015-03-20 05:54:54,140 INFO WebServerImpl:org.mortbay.log: Started SelectChannelConnector@0.0.0.0:7180
2015-03-20 05:54:54,140 INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
2015-03-20 05:54:54,844 INFO SearchRepositoryManager-0:com.cloudera.server.web.cmf.search.components.SearchRepositoryManager: Finished constructing repo:2015-03-20T03:54:54.844Z
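Rather than re-running netstat by hand, you could poll until Cloudera Manager's web port shows up. The sketch below is a hypothetical pair of helpers: is_listening parses `netstat -ntl` (or `-ntpl`) output, where the local address is the fourth column and the state is the sixth, and wait_for_port polls it every ten seconds up to a timeout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read netstat -ntl / -ntpl output on stdin and succeed
# if some socket in LISTEN state has a local address ending in ":PORT".
is_listening() {
  local port="$1"
  awk -v p=":$port" '
    $6 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { if (found) exit 0; exit 1 }
  '
}

# Hypothetical helper: poll every 10 seconds until PORT (default 7180) is
# listening, giving up after TIMEOUT seconds (default 600).
wait_for_port() {
  local port="${1:-7180}" timeout="${2:-600}" waited=0
  until netstat -ntl 2>/dev/null | is_listening "$port"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "port $port still not listening after ${timeout}s" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "port $port is listening"
}
```

For example, `wait_for_port 7180 && echo "Cloudera Manager is up"`. Alternatively, grep the log for "Started Jetty server", as in the excerpt above.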
The default username and password are admin and admin.
As before, it is an extremely straightforward process. The only point where you might be unsure what to choose is when Cloudera asks whether to install using traditional packages (.rpm or .deb) or using Cloudera's parcels.
According to Cloudera, among other benefits, parcels provide a mechanism for upgrading the packages installed on a cluster from within the Cloudera Manager Admin Console with minimal disruption. So let's proceed using parcels.
Once the cluster installation starts, it needs a fair bit of time to complete, so sit back and relax. If the installation on a particular node appears to get stuck at "Acquiring Installation lock", log on to that node and remove the lock file:
[root@hadoop1 ~]# rm -f /tmp/.scm_prepare_node.lock
Then abort and retry.
After that, you need to pick which Hadoop services should run on which server, create your databases (at which point you should also note the usernames, passwords and database names for future reference), and review base directory locations. We're going to do a pretty basic vanilla installation here so we choose custom and:
After that, we'll need to wait a little while for the manager to start all the services, and then we'll be good to go.
And that's what you get for installing a Hadoop cluster on tiny VMs!
To do that, go to your Cloudera Manager UI, select "Hue" and click on "Hue Web UI".
Just choose the username and password you will use for Hue. As soon as you're in, it will run a few automatic checks and ask whether you want to create new users.
This means that everything is up and running, and we can use our Hadoop cluster from a web browser instead of doing everything manually!
References: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-6-2/Cloudera-Manager-Managing-Clusters/cmmc_parcel_upgrade.html