Hadoop Clusters: Care and Feeding
If you're not computing staff,
you've reached the wrong page - sorry! Read
computing.help.inf.ed.ac.uk/hadoop-cluster instead.
This page tells computing staff about the Hadoop clusters.
Most of this page deals with the
exc cluster. Most of the commands and config should work on the other clusters - just change the hostnames.
What is Hadoop?
A Hadoop cluster can process massive amounts of data in parallel. One popular approach is MapReduce, in which a dataset is split programmatically into many parts, each part is processed in some way, and the results are then all collected back together. (Read more at Wikipedia.)
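To make the idea concrete, here's a rough analogy in ordinary shell (nothing to do with how Hadoop jobs are actually invoked - a real example is under "How to run a test job" below, and input.txt is just a hypothetical local text file):
tr -s ' ' '\n' < input.txt | sort | uniq -c
Here tr plays the part of the map step (emit one word per line), sort is the shuffle (identical words brought together), and uniq -c is the reduce (count each group). Hadoop runs the equivalent steps spread across many nodes.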
Hadoop is used here by the
Extreme Computing (EXC) module.
We have several Hadoop clusters. The main one is called
exc and it's the one the EXC students use. We also have clusters for testing and development.
How Hadoop works
Files
HDFS is the Hadoop Distributed File System; data files are stored there, distributed across the cluster's nodes. HDFS keeps several copies of each file, and tries to ensure that the copies are on multiple nodes and on multiple racks.
In our simple cluster configuration, one machine in a Hadoop cluster is an HDFS
namenode and most of the others are
datanodes. (More complex clusters than ours will have secondary or backup namenodes too, for extra reliability.)
If you want to know more, start with
HDFS Architecture and
the HDFS User Guide.
The
namenode stores each file's metadata (its name, its owner, its position in the HDFS directory hierarchy, etc.). It also keeps track of where all the data is stored. Each of our clusters has one namenode.
The
datanodes store data files. They co-operate to arrange for there to be multiple copies of each file. The cluster's "slave" nodes are all datanodes. (The idea is that the data is, as much as possible, on the local disk of the machine that's analysing that data.)
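One way to see this in action is to ask HDFS where the blocks of a path live and how many replicas each has. hdfs fsck is a standard command; it needs the HDFS privilege described under Kerberos and privilege below, and /user is just an example path:
hdfs fsck /user -files -blocks -locations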
Each datanode presents information on the web at
https://HOSTNAME.inf.ed.ac.uk:50475/datanode.html
Note that the web interfaces on the EXC cluster are not accessible from non-cluster machines because the cluster machines are on a non-routed wire, as per school policy.
Resources
YARN manages the cluster's resources. It decides which jobs will run where, and provides the resources to enable those jobs to run.
Each of our clusters has one YARN resource manager. It's the ultimate arbiter for all the cluster's resources, deciding what job will run where and monitoring each job's progress.
The other nodes run a YARN node manager. This monitors the node's resources for the resource manager.
History
MapReduce is the classic, but not the only, framework for distributed processing of big data on Hadoop. It has only one permanent daemon, the job history server; otherwise it's just fired up as needed by jobs. The MapReduce job history server runs on the YARN master server alongside the YARN resource manager.
Hadoop software and documentation
We use the latest stable version of Hadoop, 2.9.2 at the time of writing. Downloads and project info can be found at
https://hadoop.apache.org/. We install it using a tarball of the binary distribution from that site, by means of a home-rolled hadoop RPM.
The manuals are at
hadoop.apache.org/docs/r2.9.2/. You can also get there via
https://hadoop.apache.org/docs/stable/. Look at the left bar on that page - each link is a manual on a different topic.
The University Library has several Hadoop books. Try for instance the most recent edition of the O'Reilly book
Hadoop: The Definitive Guide. You can read it online or there's a paper copy.
Useful commands
- For commands which are useful throughout Hadoop, see the Hadoop Commands Guide in those manuals.
- For commands specific to HDFS, see the HDFS Commands Guide.
- For YARN commands, see the YARN Commands guide.
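There's no exhaustive list here, but a few standard Hadoop CLI commands are worth knowing straight away (some of them need the privilege described under Kerberos and privilege below):
hadoop version              # which Hadoop release is installed
hdfs dfs -ls /user          # list HDFS home directories
hdfs dfsadmin -report       # overall HDFS health and per-datanode usage (needs privilege)
yarn node -list -all        # state of every YARN node manager
mapred job -list            # MapReduce jobs currently on the cluster (needs privilege)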
Which machine does what
There are three types of machine in each cluster:
- A namenode - it runs the namenode, which is the master daemon for the cluster's HDFS distributed filesystem.
- A resource manager - it runs the resource manager, which is the master daemon for the cluster's YARN resource allocation system. It also runs the cluster's job history server.
- Compute nodes, also known as slave nodes. Each of these runs a datanode daemon (it manages the HDFS data that's on this node) and a node manager daemon (it manages YARN and jobs on this node).
Each of these functions runs in its own package account and Kerberos context. You'll need to know these accounts, and their abbreviations, to get extra privilege:
Role | Runs on this host | Account | Abbreviation
namenode | the namenode | hdfs | nn
resource manager | the resource manager | yarn | rm
job history server | the resource manager | mapred | jhs
datanode | every compute node | hdfs | dn
node manager | every compute node | yarn | nm
How we configure Hadoop
The LCFG hadoop headers.
Each node in a Hadoop cluster includes two special LCFG headers:
- A header to say which cluster this node is in:
  - live/hadoop-exc-cluster.h
  - live/hadoop-exctest-cluster.h
  - live/hadoop-devel-cluster.h
- A header to say what role this node plays in the cluster:
  - dice/options/hadoop-cluster-slave-node.h
  - dice/options/hadoop-cluster-master-hdfs-node.h
  - dice/options/hadoop-cluster-master-yarn-node.h
Each node has just one of these three roles.
There are quite a few more LCFG Hadoop headers. Some are included by the above headers. Others are left over from previous configurations and are now out of use.
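For example, a compute node in the exc cluster would have something like this in its LCFG source profile (a sketch only, assuming the usual cpp-style #include lines):
#include <live/hadoop-exc-cluster.h>
#include <dice/options/hadoop-cluster-slave-node.h>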
Which machines are in each cluster?
See also
How to list the nodes.
exc cluster
exc is the main Hadoop cluster - the one the students use. It's important to keep this running healthily, at least while it's needed by the students and staff of the Extreme Computing module. It's on physical servers in the AT server room - the scutter machines from the teaching cluster.
Machine | Alias | Role
scutter01 | nn.exc | namenode
scutter02 | rm.exc | resource manager, job history server
scutter03 to scutter12 | | compute nodes
exctest cluster
exctest is for testing. It's used from time to time by computing staff to test new configurations before putting them on the exc cluster, and by teaching staff to check cluster capabilities or to run through coursework before handing it out to students. Its nodes are virtual machines on jubilee.
Machine | Alias | Role
rat1 | nn.exctest | namenode
rat2 | rm.exctest | resource manager, job history server
rat3 to rat7 | | compute nodes
devel cluster
devel is used when developing new configurations and testing wild ideas. It is only ever used by computing staff and it can be trashed and rebuilt with impunity. Its nodes are virtual machines on jubilee.
Machine | Alias | Role
clafoutis | nn.devel | namenode
teurgoule | rm.devel | resource manager, job history server
strudel and tiramisu | | compute nodes
Kerberos and privilege
The clusters use Kerberos for authentication. To get privileged access to Hadoop, you'll need to authenticate. For this you'll need to know the right machine, account and abbreviation to use. Find them in the tables above, then do this:
- ssh machine
- nsu account
- newgrp hadoop
- export KRB5CCNAME=/tmp/account.cc
- kinit -k -t /etc/hadoop.abbreviation.keytab abbreviation/${HOSTNAME}
So here's how to get privileged access to the HDFS filesystem on the exc cluster:
- ssh scutter01
- nsu hdfs
- newgrp hadoop
- export KRB5CCNAME=/tmp/hdfs.cc
- kinit -k -t /etc/hadoop.nn.keytab nn/${HOSTNAME}
There's a handy way to check that the mapping between the service abbreviation (e.g. rm) and its account (e.g. yarn) has been configured correctly:
- ssh to any Hadoop node
- hadoop org.apache.hadoop.security.HadoopKerberosName abbreviation/${HOSTNAME}@INF.ED.AC.UK
For example:
[scutter04]: hadoop org.apache.hadoop.security.HadoopKerberosName rm/${HOSTNAME}@INF.ED.AC.UK
Name: rm/scutter04.inf.ed.ac.uk@INF.ED.AC.UK to yarn
[scutter04]:
So rm maps to the yarn account.
Admin tasks and how to do them
How to add users
On the exc cluster, users are added automatically. This process is driven by roles and capabilities. A prospective user of the exc cluster needs to gain the hadoop/exc/user capability. Several roles grant that, and you can discover them with e.g.
rfe -xf roles/hadoop
Most student users of the cluster will probably gain a suitable role automatically thanks to the Informatics database and Prometheus.
Making HDFS directories
Each user needs an HDFS home directory. On the exc cluster this is made by a script called mkhdfs which runs nightly. It ensures that each user with the hadoop/exc/user capability has an HDFS directory. It runs on the namenode of the cluster, and it's installed by the hadoop-cluster-master-hdfs-node.h header.
There's a companion script called rmhdfs. It runs weekly, and looks for and lists those HDFS directories which don't have the capability associated with them. You can then consider deleting those directories at your leisure.
For other clusters, you could either adapt mkhdfs or make an HDFS directory manually. Here's how to do that:
- Log in to the namenode with ssh and acquire privileged access to HDFS.
- Then make the HDFS home directory (with ${USER} set to the username of the person the directory is for):
hdfs dfs -mkdir /user/${USER}
hdfs dfs -chown ${USER} /user/${USER}
exit
exit
logout
How to run a test job
This is how to check that the cluster is working.
- If you don't yet have an HDFS directory, make one first (see Making HDFS directories above).
- Now ssh to the YARN master node:
ssh scutter02
- Put some files into your HDFS dir. These will act as input for the test job:
hdfs dfs -put $HADOOP_PREFIX/etc/hadoop input
- List your files to check that they got there:
hdfs dfs -ls input
- If you have already run the job and you want to rerun it, remove the output dir which the job makes:
hdfs dfs -rm -r output
- Now submit the test job:
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
You should see lots of messages about the job's progress. The job should finish within a minute or two.
- Once it's finished, transfer the job's output from HDFS:
hdfs dfs -get output
- ... and take a look at what the job did:
cd output
ls
You should see two files - an empty file called _SUCCESS and a file called part-r-00000 containing counts of the strings which matched the 'dfs[a-z.]+' pattern. If you don't see _SUCCESS then the job didn't work.
What jobs are running?
This command needs privilege (see Kerberos and privilege above). It lists the jobs which are currently running on the cluster.
mapred job -list
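A related standard command, which lists running YARN applications of any type rather than just MapReduce jobs, is:
yarn application -list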
Checking the log files
Hadoop keeps comprehensive and informative log files. They're worth checking when you're doing anything with Hadoop, or when something seems to be wrong, or just to check that things are OK.
Here's where to find them:
Component | Log directory | Host
HDFS namenode | /disk/scratch/hdfsdata/hadoop/logs | The namenode (the master HDFS host)
HDFS datanode | /disk/scratch/hdfsdata/hadoop/logs | All the compute nodes
YARN resource manager | /disk/scratch/yarn/logs | The resource manager (the master YARN host)
YARN node manager | /disk/scratch/yarn/logs | All the compute nodes
Job History Server | /disk/scratch/mapred/logs | The job history server host
For hostnames see
Which machines are in each cluster? and
How to list the nodes.
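For example, to watch the namenode's log as it runs (a sketch for the exc cluster; the daemon log names follow the usual Hadoop pattern hadoop-ACCOUNT-DAEMON-HOSTNAME.log, so the exact filename will vary):
ssh scutter01
ls -lt /disk/scratch/hdfsdata/hadoop/logs
tail -f /disk/scratch/hdfsdata/hadoop/logs/hadoop-hdfs-namenode-*.log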
How to list the nodes
... using Hadoop commands
There are several configuration files which list the cluster nodes. To find them, first ssh to any cluster node, then go to the Hadoop configuration directory:
cd $HADOOP_CONF_DIR
The nodes are named in these files:
File | Contains
masters | The cluster's master servers. For a simple cluster this would just be the HDFS namenode and the YARN resource manager.
slaves | The slave nodes of the cluster.
exclude | Those slaves which are currently excluded from the cluster.
hosts | All the nodes (masters + slaves).
Which host does HDFS think is the namenode?
hdfs getconf -namenodes
Does YARN know the state of the nodes?
yarn node -list -all
... using LCFG
Search the LCFG profiles to find the machines which are using a cluster's LCFG header.
Which machines are in a cluster?
$ profmatch hadoop-devel-cluster
clafoutis
strudel
teurgoule
tiramisu
Which machine does a particular job?
$ profmatch devel-cluster slave-node
strudel
tiramisu
$ profmatch exctest-cluster master-yarn-node
rat2
(profmatch is in /afs/inf.ed.ac.uk/group/cos/utils.)
Removing a node from the cluster
Here's how to remove a node from the cluster. You might need to do this if a machine has hardware trouble, or if you want to upgrade its firmware or its software, for example. You can only do this with a slave node - not one of the two master nodes.
- Add this to the bottom of the node's LCFG file:
!hadoop.excluded mSET(true)
- HDFS has to decommission the node, i.e. move its share of the HDFS data to other nodes (see the sketch below).
- YARN has to decommission the node (see the sketch below).
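The LCFG resource should end up putting the node into the exclude file described under How to list the nodes; after that, the standard Hadoop refresh commands tell the two masters to start decommissioning. This is a sketch assuming that behaviour, using the exc cluster's hosts and the privilege recipe above:
# On the namenode (scutter01), with hdfs/nn privilege:
hdfs dfsadmin -refreshNodes       # re-read the exclude file and start moving blocks away
hdfs dfsadmin -report             # the node should show "Decommission in progress", then "Decommissioned"
# On the resource manager (scutter02), with yarn/rm privilege:
yarn rmadmin -refreshNodes        # re-read the exclude file
yarn node -list -all              # the node should move to the DECOMMISSIONED state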
Re-adding an excluded node to the cluster
Remove the hadoop.excluded resource that you added in Removing a node from the cluster, then run through the same procedure as you did there.
Systemd services
Hadoop is started and stopped using systemd services (which are configured by LCFG).
Hadoop component | systemd service | Host
HDFS namenode | hadoop-namenode.service | The namenode (the master HDFS host)
HDFS datanode | hadoop-datanode.service | All the compute nodes
YARN resource manager | hadoop-resourcemanager.service | The resource manager (the master YARN host)
YARN node manager | hadoop-nodemanager.service | All the compute nodes
Job History Server | hadoop-mapred.service | The job history server host
These can be queried, started and stopped using systemctl in the usual way. For example:
# systemctl status hadoop-nodemanager
● hadoop-nodemanager.service - The hadoop nodemanager daemon
Loaded: loaded (/etc/systemd/system/hadoop-nodemanager.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2019-09-19 10:16:33 BST; 1 weeks 4 days ago
Main PID: 4573 (java)
CGroup: /system.slice/hadoop-nodemanager.service
└─4573 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_nodemanager -Xmx4000m -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dya...
# systemctl start hadoop-datanode
How to make a new cluster
Mostly, you'll just need to copy the LCFG config which sets up the exc cluster; but there are a few manual steps too. You'll need to make a header, a namenode, a resource manager and a bunch of slave nodes.
Once you've made your new cluster, and you've checked that the log files and systemd services look OK, don't forget to run a test job to check that your cluster works.
Hardware resources
Note that the YARN node manager - which runs on each slave, and matches up jobs with hardware resources - automatically determines what hardware resources are available. If you don't give the slaves enough hardware, the node managers will decide that jobs can't run! Even if you're making a little play cluster on VMs, you'll need to give each slave several CPUs. If you don't, not even
a wee diddy test job will run. In tests, slaves with 3 VCPUs and 4GB memory were sufficient for the test job.
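To check what resources the node managers have actually detected and offered to YARN, list the nodes and then ask for a node report, which includes memory and vCore capacity. The Node-Id (including its port) comes from the list output - the one below is only a hypothetical illustration:
yarn node -list -all
yarn node -status scutter03.inf.ed.ac.uk:45454   # hypothetical Node-Id - copy yours exactly as listed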
Make the header
Pick a one-word name for your cluster. These instructions will use the name dana. We'll start off by making a copy of the config for the exc cluster, so we can use that to configure the new dana cluster.
Make a new header in subversion for your cluster. In our example we'll make live/hadoop-dana-cluster.h.
- Check out the live SubversionRepository and cd to the include/live directory.
- svn copy hadoop-exc-cluster.h hadoop-dana-cluster.h
- Edit it appropriately - perhaps like this:
#ifndef LIVE_HADOOP_DANA_CLUSTER
#define LIVE_HADOOP_DANA_CLUSTER
#define HADOOP_CLUSTER_NAME dana
#define HADOOP_CLUSTER_HDFS_MASTER dana01.inf.ed.ac.uk
#define HADOOP_CLUSTER_YARN_MASTER dana02.inf.ed.ac.uk
#define HADOOP_CLUSTER_KERBEROS
#endif /* LIVE_HADOOP_DANA_CLUSTER */
- Commit it with svn ci -m "Header to configure the dana Hadoop cluster" hadoop-dana-cluster.h
Make a namenode
- Add your cluster's header to the profile of the machine that'll be the new cluster's namenode. Our example header is live/hadoop-dana-cluster.h.
- Below it, add the node type header, in this case probably dice/options/hadoop-cluster-master-hdfs-node.h.
- Let LCFG make the machine's new profile and wait for it to reach the machine.
- ssh onto the machine.
- Acquire hdfs namenode privilege as described above.
- hdfs namenode -format
- In a separate session, make the namenode's data directory and restart the namenode service:
$ nsu
# mkdir /disk/scratch/hdfsdata/hadoop/namenode
# chown hdfs:hadoop /disk/scratch/hdfsdata/hadoop/namenode
# systemctl restart hadoop-namenode
- Back in the session with hdfs namenode privilege, build the base filesystem in HDFS:
hdfs dfs -mkdir /user
hdfs dfs -mkdir /tmp
hdfs dfs -chmod 1777 /tmp
hdfs dfs -mkdir /tmp/hadoop-yarn
hdfs dfs -mkdir /tmp/hadoop-yarn/staging
hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging
hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history
hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
hdfs dfs -chown -R mapred:hadoop /tmp/hadoop-yarn/staging
hdfs dfs -ls /
exit
exit
- And to support distcp:
nsu
cd /disk/scratch/hdfsdata
mkdir cache
chown hdfs:hdfs cache
chmod go+wt cache
exit
Here are a couple of ways to test that it's successfully up and running:
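For instance (a sketch using standard commands; until some slave nodes exist and have registered, the report will show no datanodes, and the last two commands need the hdfs namenode privilege acquired earlier):
systemctl status hadoop-namenode     # the namenode service should be active (running)
hdfs dfsadmin -report                # summarises HDFS and its datanodes
hdfs dfs -ls /                       # should list the base directories created above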
Make a resource manager
- As for the namenode, first add the cluster's LCFG header to the profile of the machine that'll be the resource manager. Our example header is live/hadoop-dana-cluster.h.
- Below it, add the node type header, in this case probably dice/options/hadoop-cluster-master-yarn-node.h.
- Let LCFG make the machine's new profile and wait for it to reach the machine.
- Reboot the machine once or twice.
Here are a couple of ways to test that it's successfully up and running:
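For instance (again a sketch; yarn node -list will only show entries once some slave nodes have been built and have registered with this resource manager, and it needs the privilege described above):
systemctl status hadoop-resourcemanager hadoop-mapred   # resource manager and job history server
yarn node -list -all                                    # lists the node managers YARN knows about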
Make a slave node
The rest of the hosts in the cluster will all be slave nodes. Here's how to make one:
- As for the namenode, first add the cluster's LCFG header to the profile of the machine that'll be a slave node. Our example header is live/hadoop-dana-cluster.h.
- Below it, add the node type header, in this case probably dice/options/hadoop-cluster-slave-node.h.
- Let LCFG make the machine's new profile and wait for it to reach the machine.
- Reboot the machine once or twice.
Here are a couple of ways to test that it's successfully up and running:
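For instance (a sketch; run the last two from a session with the relevant privilege as described above):
systemctl status hadoop-datanode hadoop-nodemanager   # both daemons should be active (running)
hdfs dfsadmin -report                                 # the new datanode should appear in the report
yarn node -list -all                                  # the new node manager should be in the RUNNING state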