Hey! This is TOTALLY NOT FINISHED. Comments welcome, to Chris please.
The DICE Hadoop clusters
- We have Hadoop clusters.
- They run on DICE.
- Their configuration is done almost entirely with LCFG.
- Hadoop is needed by the Extreme Computing (EXC) module.
- Hadoop provides facilities for processing massive amounts of data in parallel. One popular approach is MapReduce: a dataset is split programmatically into many parts, each part is processed independently (the "map" step), and the results are then collected back together (the "reduce" step); there's a rough sketch of the idea just below. (Read more at Wikipedia.)
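As a rough analogy only (this is just ordinary shell, not how Hadoop is invoked), a word count over some local text file input.txt can be sketched with Unix pipes: split the text into words (map), group identical words together (sort, playing the part of the shuffle), then count each group (reduce). Hadoop applies the same pattern, but spread across the disks and CPUs of many machines:
# map: one word per line; shuffle: sort; reduce: count each distinct word
tr -s '[:space:]' '\n' < input.txt | sort | uniq -c | sort -rn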
This page
This page sets out a basic picture of what Hadoop is and how to make our Hadoop clusters do their stuff. This necessarily makes it pretty long; sorry.
Software
We use the latest stable version of Hadoop, 2.9.2 at the time of writing. Downloads and project info can be found at
https://hadoop.apache.org/. We install it from the binary distribution tarball from that site, wrapped up in a home-rolled hadoop RPM.
Documentation
- The definitive Apache Hadoop documentation is at https://hadoop.apache.org/docs/stable/. Pay attention to the left bar - each link is a manual on a different topic.
- The University Library has several Hadoop books. Try for instance the most recent edition of the O'Reilly book Hadoop: The Definitive Guide. You can read it online or there's a paper copy.
Our LCFG-managed Hadoop clusters.
There are three of them and they share most of their configuration.
- exc is the main Hadoop cluster, the one the students use. It's important to keep this running healthily, at least while it's needed by the students and staff of the Extreme Computing module. It's on physical servers in the AT server room - the scutter machines from the teaching cluster.
- exctest is used for testing things before deploying them on the exc cluster. It's used from time to time both by computing staff to test new configurations before putting them on the exc cluster, and by teaching staff to check cluster capabilities or to run through coursework before handing it out to students. Its nodes are virtual machines.
- devel is used when developing new configurations. It is only ever used by computing staff and it can be trashed with impunity. Its nodes are virtual machines.
The LCFG hadoop headers.
Each node in a Hadoop cluster includes two special LCFG headers:
- A header to say which cluster this node is in:
  - live/hadoop-exc-cluster.h
  - live/hadoop-exctest-cluster.h
  - live/hadoop-devel-cluster.h
- A header to say what job this node does in the cluster:
  - dice/options/hadoop-cluster-slave-node.h
  - dice/options/hadoop-cluster-master-hdfs-node.h
  - dice/options/hadoop-cluster-master-yarn-node.h
Each node does just one of these three jobs.
There are quite a few more LCFG Hadoop headers. Some are included by the above headers. Others are left over from previous configurations and are now out of use.
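For example, a slave node in the devel cluster would pull in something like the following in its LCFG profile (a sketch only - the exact placement within the profile will vary):
/* which cluster this node is in */
#include <live/hadoop-devel-cluster.h>
/* which job it does in that cluster */
#include <dice/options/hadoop-cluster-slave-node.h>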
Find out which computers are in a cluster.
Just search the LCFG profiles to find out which machines are using the cluster's LCFG header:
$ profmatch hadoop-devel-cluster
cannoli
clafoutis
strudel
teurgoule
tiramisu
trifle
(profmatch is in /afs/inf.ed.ac.uk/group/cos/utils.)
Find out which computer does which job.
Again, search the LCFG profiles for the names of hadoop headers:
$ profmatch devel-cluster slave-node
cannoli
strudel
tiramisu
trifle
$ profmatch exctest-cluster master-yarn-node
rat2
Hadoop's constituent parts.
HDFS.
HDFS is the Hadoop Distributed File System. Data files are stored there, distributed across the cluster's nodes. HDFS keeps several copies of each file, and tries to ensure that the copies are on multiple nodes and on multiple racks.
In our simple cluster configuration, one machine in a Hadoop cluster is an HDFS namenode and most of the others are datanodes. (More complex clusters than ours will have secondary or backup namenodes too, for extra reliability.)
If you want to know more, start with HDFS Architecture and the HDFS User Guide.
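To see this replication in action, fsck can report where the blocks of a file have ended up. This is just a sketch: it assumes you already have HDFS superuser credentials (see "Privileged or "root" access" below) and that the path you give is one which actually exists on your cluster:
# lists each block of the file and which datanodes hold a replica of it
hdfs fsck /user/SOMEUSER/somefile -files -blocks -locations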
HDFS namenode
The namenode stores each file's metadata (its name, its owner, its position in the HDFS directory hierarchy, etc.). It also keeps track of where all the data is stored. Each of our clusters has one namenode.
Header
The header for a namenode is dice/options/hadoop-cluster-master-hdfs-node.h.
Log
To see the namenode daemon's log:
- Identify the hostname of the namenode for this cluster (see above).
ssh HOSTNAME
less /disk/scratch/hdfsdata/hadoop/logs/hadoop-hdfs-namenode-HOSTNAME.inf.ed.ac.uk.out
Process
The namenode process should be owned by hdfs and should look something like this (output from ps axuww):
hdfs 6578 0.9 5.4 3922780 221608 ? SNl Feb21 469:17 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_namenode -Xmx1500m -server -Dhadoop.log.dir=/disk/scratch/hdfsdata/hadoop/logs -Dhadoop.log.file=hadoop-hdfs-namenode-rat1.inf.ed.ac.uk.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx2000m -Dcom.sun.management.jmxremote -Xmx2000m -Xmx2000m -Dcom.sun.management.jmxremote -Xmx2000m -Xmx2000m -Dcom.sun.management.jmxremote -Xmx2000m -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
systemd
The namenode's systemd service is called hadoop-namenode.
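As well as the daemon's own log file above, the systemd journal can be useful; for example, on the namenode itself (plain systemd commands, nothing Hadoop-specific):
systemctl status hadoop-namenode
journalctl -u hadoop-namenode --since today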
HDFS datanode
The datanodes store data files. They co-operate to arrange for there to be multiple copies of each file. The cluster's "slave" nodes are all datanodes. (The idea is that the data is, as much as possible, on the local disk of the machine that's analysing that data.)
Header
The header for a datanode is dice/options/hadoop-cluster-slave-node.h.
Log
To see a datanode daemon's log:
- Identify the hostname of a slave node in this cluster (see above).
ssh HOSTNAME
less /disk/scratch/hdfsdata/hadoop/logs/hadoop-hdfs-datanode-HOSTNAME.inf.ed.ac.uk.out
Process
The datanode process should be owned by hdfs and should look something like this (output from ps axuww):
hdfs 6616 0.9 6.3 3361284 258472 ? SNl Feb21 461:02 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_datanode -Xmx1500m -server -Dhadoop.log.dir=/disk/scratch/hdfsdata/hadoop/logs -Dhadoop.log.file=hadoop-hdfs-datanode-rat5.inf.ed.ac.uk.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
systemd
The datanode's systemd service is called hadoop-datanode.
Web
Each datanode presents information on the web at https://HOSTNAME.inf.ed.ac.uk:50475/datanode.html
Note that the web interfaces on the EXC cluster are not accessible from non-cluster machines because the cluster machines are on a non-routed wire, as per school policy.
YARN.
YARN manages the cluster's resources. It decides which jobs will run where, and provides the resources to enable those jobs to run.
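For a quick look at what YARN thinks is going on, these standard yarn subcommands are handy (run them on a cluster node, with suitable Kerberos credentials if the cluster's security settings demand them):
yarn node -list -all      # node managers known to the resource manager, and their states
yarn application -list    # applications currently submitted or running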
YARN resource manager.
Each of our clusters has one of these. It's the ultimate arbiter for all the cluster's resources, deciding what job will run where and monitoring each job's progress.
Header
The resource manager header is dice/options/hadoop-cluster-master-yarn-node.h.
Log
To see the resource manager daemon's log:
- Identify the hostname of the resource manager in this cluster (see above).
ssh HOSTNAME
less /disk/scratch/yarn/logs/yarn-yarn-resourcemanager-HOSTNAME.inf.ed.ac.uk.out
Process
The resource manager process should be owned by yarn and should look something like this (output from ps axuww):
yarn 6589 4.0 7.2 6203536 294048 ? Sl Feb21 1923:10 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_resourcemanager -Xmx4000m -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dyarn.log.dir=/disk/scratch/yarn/logs -Dhadoop.log.file=yarn-yarn-resourcemanager-rat2.inf.ed.ac.uk.log -Dyarn.log.file=yarn-yarn-resourcemanager-rat2.inf.ed.ac.uk.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dyarn.log.dir=/disk/scratch/yarn/logs -Dhadoop.log.file=yarn-yarn-resourcemanager-rat2.inf.ed.ac.uk.log -Dyarn.log.file=yarn-yarn-resourcemanager-rat2.inf.ed.ac.uk.log -Dyarn.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -classpath /opt/hadoop/conf:/opt/hadoop/conf:/opt/hadoop/conf:/opt/hadoop/hadoop-2.9.2/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/common/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/lib/*:/opt/hadoop/conf/rm-config/log4j.properties:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/timelineservice/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/timelineservice/lib/* org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
systemd
The resource manager's systemd service is called hadoop-resourcemanager.
YARN node manager.
Most of each cluster's nodes - what we've named the "slave" nodes - will run a YARN node manager. This monitors the node's resources for the resource manager.
Header
The node manager header is dice/options/hadoop-cluster-slave-node.h.
Log
To see a node manager daemon's log:
- Identify the hostname of a slave node in this cluster (see above).
ssh HOSTNAME
less /disk/scratch/yarn/logs/yarn-yarn-nodemanager-HOSTNAME.inf.ed.ac.uk.out
Process
The node manager process should be owned by yarn and should look something like this (output from ps axuww):
yarn 6607 1.7 9.2 6092312 374284 ? Sl Feb21 816:21 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_nodemanager -Xmx4000m -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dyarn.log.dir=/disk/scratch/yarn/logs -Dhadoop.log.file=yarn-yarn-nodemanager-rat5.inf.ed.ac.uk.log -Dyarn.log.file=yarn-yarn-nodemanager-rat5.inf.ed.ac.uk.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dyarn.log.dir=/disk/scratch/yarn/logs -Dhadoop.log.file=yarn-yarn-nodemanager-rat5.inf.ed.ac.uk.log -Dyarn.log.file=yarn-yarn-nodemanager-rat5.inf.ed.ac.uk.log -Dyarn.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -classpath /opt/hadoop/conf:/opt/hadoop/conf:/opt/hadoop/conf:/opt/hadoop/hadoop-2.9.2/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/common/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/mapreduce/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/lib/*:/opt/hadoop/conf/nm-config/log4j.properties:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/timelineservice/*:/opt/hadoop/hadoop-2.9.2/share/hadoop/yarn/timelineservice/lib/* org.apache.hadoop.yarn.server.nodemanager.NodeManager
systemd
The node manager's systemd service is called hadoop-nodemanager.
MapReduce
The MapReduce framework is the classic, but not the only, approach to distributed processing of big data on Hadoop. It has one permanent daemon, the job history daemon; otherwise its processes are just fired up as needed by jobs.
The MapReduce Job History daemon
This runs on the YARN master server alongside the YARN resource manager.
Header
The job history daemon is added in the resource manager header, which is dice/options/hadoop-cluster-master-yarn-node.h.
Log
To see the job history daemon's log:
- Identify the hostname of the resource manager in this cluster (see above).
ssh HOSTNAME
less /disk/scratch/mapred/logs/mapred-mapred-historyserver-HOSTNAME.inf.ed.ac.uk.out
Process
The job history daemon's process should be owned by mapred and should look something like this (output from ps axuww):
mapred 57347 0.2 0.5 3522708 562280 ? Sl May07 9:23 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_historyserver -Xmx1500m -server -Dhadoop.log.dir=/disk/scratch/hdfsdata/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.9.2 -Dhadoop.id.str=mapred -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/hadoop/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/disk/scratch/mapred/logs -Dhadoop.log.file=mapred-mapred-historyserver-scutter02.inf.ed.ac.uk.log -Dhadoop.root.logger=INFO,console -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
systemd
The job history daemon's systemd service is called hadoop-mapred.
The configuration files
To see the Hadoop configuration files once they've been made by LCFG, log in to a cluster node then cd /opt/hadoop/conf. Bear in mind that the content of these files depends on the machine's role in the cluster - the YARN master's configuration will be different to the HDFS master's, and a slave's will be different again.
These files are made by the hadoop and file components.
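For orientation, this is roughly what to expect in there (an assumption based on standard Hadoop layouts plus the per-daemon rm-config and nm-config directories visible in the ps output above - check the actual node):
ls /opt/hadoop/conf
# expect the usual Hadoop files such as core-site.xml, hdfs-site.xml,
# yarn-site.xml and mapred-site.xml, plus per-daemon subdirectories
# like rm-config and nm-config on the relevant masters and slaves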
Hadoop daemons are started and stopped by systemd. The systemd service files are created by the LCFG systemd component using cpp macros which are defined in dice/options/hadoop-cluster-node.h and used in live/hadoop-cluster-whatever-node.h headers.
Privileged or "root" access
To do basic things to a Hadoop daemon, like start, stop or restart it, just use systemctl with the daemon's systemd service name (given above). For example,
systemctl status hadoop-mapred
systemctl restart hadoop-datanode
For anything more involved, you'll need privileged access. You get this simply by using a shell which has the same account, group, and Kerberos context that the daemon uses. For example to have superuser privilege over a datanode daemon, do this on a Hadoop slave node:
export KRB5CCNAME=/tmp/hdfs.${LOGNAME}
nsu hdfs
newgrp hadoop
kinit -k -t /etc/hadoop.dn.keytab dn/${HOSTNAME}
That's for privilege over the datanode daemon. For other daemons, refer to the table below to find the correct account and abbreviation to use.
daemon              | account | abbreviation | host
datanode            | hdfs    | dn           | slaves
node manager        | yarn    | nm           | slaves
namenode            | hdfs    | nn           | the HDFS master
resource manager    | yarn    | rm           | the YARN master
job history service | mapred  | jhs          | the YARN master
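For example, for superuser privilege over a node manager daemon, the same recipe on a slave node would look like this. Note that the keytab path and credential cache name here are assumptions - they just follow the pattern of the dn and nn examples on this page - so check what's actually in /etc on the node:
export KRB5CCNAME=/tmp/yarn.${LOGNAME}
nsu yarn
newgrp hadoop
kinit -k -t /etc/hadoop.nm.keytab nm/${HOSTNAME}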
How to run a test job
First create yourself some user filespace on HDFS:
- ssh onto the cluster's HDFS master node.
- Get superuser privilege for the HDFS namenode:
nsu hdfs
newgrp hadoop
export KRB5CCNAME=/tmp/hdfs.${LOGNAME}
kinit -k -t /etc/hadoop.nn.keytab nn/${HOSTNAME}
hdfs dfs -mkdir /user/${USER}
hdfs dfs -chown ${USER} /user/${USER}
exit
exit
logout
Now run the test job.
- ssh onto the cluster's YARN master node.
- Copy a bunch of Hadoop source files to your HDFS dir:
hdfs dfs -put $HADOOP_PREFIX/etc/hadoop input
hdfs dfs -ls input
- Only do this next step if you've already run the job and want to rerun it - it removes the output dir.
hdfs dfs -rm -r output
- Fire off your Hadoop job:
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
- Copy the output from HDFS and examine it - you should see counts for the strings matching the dfs[a-z.]+ pattern.
hdfs dfs -get output
cd output
ls
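The output directory will normally contain a _SUCCESS marker plus one or more part-r-NNNNN files holding the actual results, so to see them:
cat output/part-r-*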
Useful commands
- For commands which are useful throughout Hadoop, see the Hadoop Commands Guide in the Apache documentation (linked above under Documentation).
- For commands specific to HDFS, see the HDFS Commands Guide.
- For YARN commands, see the YARN Commands manual.
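As a starting point, a few standard commands which come in handy (run on a cluster node; some need the privileged access described above):
hadoop version           # which Hadoop release this node is running
hdfs dfs -ls /           # list the top of the HDFS namespace
hdfs dfsadmin -report    # datanodes, capacity and usage (needs HDFS superuser)
mapred job -list         # MapReduce jobs currently running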
Decommissioning a node
If you need to take a machine down for a period, perhaps because its hardware is faulty, you can remove it from Hadoop by decommissioning it. This is handled separately in HDFS and YARN.
in HDFS
First mark your node as excluded by adding this to its LCFG file:
!hadoop.excluded mSET(true)
Then tell HDFS to re-read the list of excluded nodes.
ssh <HDFS master>
hdfs dfsadmin -refreshNodes
The namenode log should announce "Starting decommission of ...". It then ensures that all data on the node is adequately replicated on the remaining cluster. If the node has a lot of data stored on it this may take a while. When it's complete, the namenode log will announce "Decommissioning complete for node ...".
in YARN
Still working on this.
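Nothing confirmed yet, but by analogy with the HDFS procedure a plausible route (an untested assumption, not a verified recipe for our clusters) would be to get the node onto YARN's excluded-nodes list via the LCFG hadoop configuration and then tell the resource manager to re-read it:
ssh <YARN master>
yarn rmadmin -refreshNodes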