In our previous "Creating a Hadoop Cluster" post, we saw how to install a Hadoop cluster using Hortonworks.
As before, I'm going to use RHEL 6.6 for this.
Great, we built a cluster, but how do we actually feed it with data and how do we make it process it?
The easy way is to install hue and do pretty much everything using a web browser. Sounds good? Let's do it.
Hue supports the following operating systems:
- Red Hat Enterprise Linux (RHEL) v6.x
- Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
- CentOS v6.x
- CentOS v5.x (deprecated)
- Oracle Linux v6.x
- Oracle Linux v5.x (deprecated)
- SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
- Ubuntu Precise v12.04
You also need to have these Hadoop components installed:
Component | Required | Applications | Notes |
---|---|---|---|
HDFS | Yes | Core, Filebrowser | HDFS access through WebHDFS or HttpFS |
YARN | Yes | JobDesigner, JobBrowser, Hive | Transitive dependency via Hive or Oozie |
Oozie | No | JobDesigner, Oozie | Oozie access through REST API |
Hive | No | Hive, HCatalog | Beeswax server uses the Hive client libraries |
WebHCat | No | HCatalog, Pig | HCatalog and Pig use the WebHCat REST API |
And let's remember my cluster details:
Node Type and Number | Node Name | IP |
---|---|---|
Namenode | hadoop1 | 192.168.0.101 |
Secondary Namenode | hadoop2 | 192.168.0.102 |
Tertiary Services | hadoop3 | 192.168.0.103 |
Datanode #1 | hadoop4 | 192.168.0.104 |
Datanode #2 | hadoop5 | 192.168.0.105 |
Datanode #3 | hadoop6 | 192.168.0.106 |
Datanode #4 | hadoop7 | 192.168.0.107 |
Datanode #5 | hadoop8 | 192.168.0.108 |
First of all, go to Ambari, select HDFS from the left-hand menu and open "Configs". There you need to ensure that WebHDFS is enabled.
Then you need to make the following adjustments:
Go to "Custom core-site" and add the following properties:
Key | Value |
---|---|
hadoop.proxyuser.hue.hosts | * |
hadoop.proxyuser.hue.groups | * |
hadoop.proxyuser.hcat.groups | * |
hadoop.proxyuser.hcat.hosts | * |
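As an aside, these are standard Hadoop proxyuser settings: they allow the hue and hcat service users to impersonate whoever is logged into the web UI, which is how jobs submitted from Hue end up running as the right user. Ambari writes them into core-site.xml as ordinary properties, roughly like the sketch below (the hcat entries follow the same pattern), so there is nothing to edit by hand:
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>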
Save your changes and restart any services that Ambari flags as needing a restart due to the config changes. Now, from the left-hand menu, select Hive. Go to "Custom webhcat-site" and add the following properties:
Key | Value |
---|---|
webhcat.proxyuser.hue.hosts | * |
webhcat.proxyuser.hue.groups | * |
Save your changes and restart any services that Ambari flags as needing a restart. From the left-hand menu, select Oozie. Go to "Custom oozie-site" and add the following properties:
Key | Value |
---|---|
oozie.service.ProxyUserService.proxyuser.hue.hosts | * |
oozie.service.ProxyUserService.proxyuser.hue.groups | * |
Save your changes and restart any services that Ambari flags as needing a restart.
Finally, from the left-hand menu select HDFS, open "Service Actions" and choose Stop. This is needed since we will be installing Hue.
OK, let's go to the system that will be our Hue server and install Hue there (this should really be the same system that has Hive installed on it, hadoop3 in my case):
[root@hadoop3 ~]# yum -y install hue
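If yum can't find the hue package, double-check that the HDP repository from the cluster install is still enabled on this node; something like this should list it:
[root@hadoop3 ~]# yum repolist enabled | grep -i HDP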
We'll need a randomly-generated password:
[root@hadoop3 ~]# perl -e 'my @chars = ("A".."Z", "a".."z", "0".."9", "!", "@", "#", "\$", "%", "\^", "&", "*", "-", "\_", "=", "+", "\\", "|", "[", "{", "]", "}", ";", ":", ",", "<", ".", ">", "/", "?"); my $string; $string .= $chars[rand @chars] for 0..59; print "$string\n";'
QJy9@?s-g5UhS{I]IXkSC_ex%{@#za8?EcV#%@sasYX-ngI+|Qr$KHn/c]g]
Copy this string, you'll need it soon. Now, let's edit the hue.ini configuration file to suit our needs:
[root@hadoop3 ~]# vi /etc/hue/conf/hue.ini
....
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
secret_key=QJy9@?s-g5UhS{I]IXkSC_ex%{@#za8?EcV#%@sasYX-ngI+|Qr$KHn/c]g]
# Webserver listens on this address and port
http_host=0.0.0.0
http_port=8000
# Time zone name
time_zone=Etc/GMT
....
Paste your randomly generated password after secret_key=, then change the port that Hue will listen on and set your correct time zone (if required).
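If you want to double-check that your edits took, a quick grep should show the three values you just changed (assuming the default hue.ini layout, where these live under the [desktop] section):
[root@hadoop3 ~]# grep -E "^\s*(secret_key|http_port|time_zone)=" /etc/hue/conf/hue.ini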
We're not finished with this file yet, so let's continue editing. Go to the [hadoop] section:
....
###########################################################################
# Settings to configure your Hadoop cluster.
###########################################################################
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://hadoop1:8020
# Use WebHdfs/HttpFs as the communication mechanism. To fallback to
# using the Thrift plugin (used in Hue 1.x), this must be uncommented
# and explicitly set to the empty value.
webhdfs_url=http://hadoop1:50070/webhdfs/v1
## security_enabled=true
[[yarn_clusters]]
[[[default]]]
# Whether to submit jobs to this cluster
submit_to=true
## security_enabled=false
# Resource Manager logical name (required for HA)
## logical_name=
# URL of the ResourceManager webapp address (yarn.resourcemanager.webapp.address)
resourcemanager_api_url=http://hadoop2:8088
# URL of Yarn RPC adress (yarn.resourcemanager.address)
resourcemanager_rpc_url=http://hadoop2:8050
# URL of the ProxyServer API
proxy_api_url=http://hadoop2:8088
# URL of the HistoryServer API
history_server_api_url=http://hadoop2:19888
# URL of the NodeManager API
node_manager_api_url=http://hadoop1:8042
# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Enter the host on which you are running the failover Resource Manager
#resourcemanager_api_url=http://failover-host:8088
#logical_name=failover
#submit_to=True
....
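With the [hadoop] section filled in, a couple of quick curl calls against the WebHDFS and ResourceManager REST APIs are a cheap way to confirm the hosts and ports actually respond (these use my hostnames; run them once HDFS is back up, since we stopped it earlier, and both should return JSON rather than a connection error):
[root@hadoop3 ~]# curl "http://hadoop1:50070/webhdfs/v1/?op=LISTSTATUS"
[root@hadoop3 ~]# curl "http://hadoop2:8088/ws/v1/cluster/info"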
Make sure you enter the correct hostnames and the ports that the corresponding services listen on. Next, configure JobDesigner and Oozie:
....
###########################################################################
# Settings to configure liboozie
###########################################################################
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs.
oozie_url=http://hadoop3:11000/oozie
## security_enabled=true
# Location on HDFS where the workflows/coordinator are deployed when submitted.
## remote_deployement_dir=/user/hue/oozie/deployments
....
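If you're unsure about the Oozie URL, its admin status endpoint is a cheap check while Oozie is running; it should report a NORMAL system mode:
[root@hadoop3 ~]# curl http://hadoop3:11000/oozie/v1/admin/status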
Moving on, we'll need to configure beeswax:
....
[beeswax]
# Host where Hive server Thrift daemon is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=hadoop3
beeswax_server_host=hadoop3
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/etc/hive/conf
hive_home_dir=/usr/hdp/2.2.0.0-2041/hive
# Timeout in seconds for thrift calls to Hive service
## server_conn_timeout=120
# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
## browse_partitioned_table_limit=250
# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A maximum of 65,000 is applied to XLS downloads.
## download_row_limit=1000000
# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
## close_queries=false
# Option to show execution engine choice.
## show_execution_engine=False
# "Go to column pop up on query result page. Set to false to disable"
## go_to_column=true
....
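To confirm that HiveServer2 is really listening on the host and port you configured, a quick beeline connection does the trick (assuming HiveServer2 is up; adjust the hostname for your cluster), and an ls on the Hive host shows the exact HDP version directory to use for hive_home_dir below:
[root@hadoop3 ~]# beeline -u jdbc:hive2://hadoop3:10000 -n hue -e "show databases;"
[root@hadoop3 ~]# ls -d /usr/hdp/*/hive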
Your hive_home_dir will be /usr/hdp/your_hdp_version/hive; the ls above shows the exact version string on your Hive server. And finally:
....
###########################################################################
# Settings for the User Admin application
###########################################################################
[useradmin]
# The name of the default user group that users will be a member of
default_user_group=hadoop
default_username=hue
default_user_password=1111
[hcatalog]
templeton_url=http://hadoop3:50111/templeton/v1/
security_enabled=false
[about]
tutorials_installed=false
[pig]
udf_path="/tmp/udfs"
....
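The templeton_url can be sanity-checked the same way once WebHCat is running; its status endpoint should answer with ok:
[root@hadoop3 ~]# curl http://hadoop3:50111/templeton/v1/status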
That's it for hue.ini. Now go back to Ambari, start HDFS again and, once that is done, start Hue:
[root@hadoop3 ~]# service hue start
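If you also want Hue to come back automatically after a reboot, enable its init script, and a quick status check confirms it started:
[root@hadoop3 ~]# chkconfig hue on
[root@hadoop3 ~]# service hue status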
If you go to your Hue server's IP on port 8000 (http://hadoop3:8000 or http://192.168.0.103:8000 in my case), you'll be greeted with the Hue login screen.
Just choose the username and password that you will use for Hue. As soon as you're in, select "Check for misconfiguration" to verify that everything is OK. If anything fails, make sure that you haven't missed a step, forgotten to stop HDFS, or started Hue before editing its configuration file (in which case it needs a restart now). Once the check completes, you should see that all configuration checks passed.
That means everything is up and running, and we can now drive our Hadoop cluster from a web browser instead of doing everything manually!
References: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-hue.html