Machine | Role | Account | Abbreviation |
---|---|---|---|
scutter01 | The namenode (the master node for the HDFS filesystem). | hdfs | nn |
scutter02 | The resource manager (the master node for the YARN resource allocation system). The job history server. | yarn, mapred | rm, jhs |
scutter03 to scutter12 | The compute nodes. These run a datanode (stores HDFS data) and a node manager (manages YARN and jobs on this node). | hdfs, yarn | dn, nm |
ssh machine
nsu account
newgrp hadoop
export KRB5CCNAME=/tmp/account.cc
kinit -k -t /etc/hadoop.abbreviation.keytab abbreviation/${HOSTNAME}
For example, on the namenode:
ssh scutter01
nsu hdfs
newgrp hadoop
export KRB5CCNAME=/tmp/hdfs.cc
kinit -k -t /etc/hadoop.nn.keytab nn/${HOSTNAME}
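As a quick sanity check, with KRB5CCNAME set as above, klist should now show a ticket for the nn principal in the /tmp/hdfs.cc cache:
klist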
Here's how to check that an abbreviation (e.g. rm) and its account (e.g. yarn) have been configured correctly: ssh to any Hadoop node, then run:
hadoop org.apache.hadoop.security.HadoopKerberosName abbreviation/${HOSTNAME}@INF.ED.AC.UK
[scutter04]: hadoop org.apache.hadoop.security.HadoopKerberosName rm/${HOSTNAME}@INF.ED.AC.UK
Name: rm/scutter04.inf.ed.ac.uk@INF.ED.AC.UK to yarn
[scutter04]:
So rm maps to the yarn account.
Access to the cluster is controlled by the hadoop/exc/user capability. Several roles grant that capability, and you can discover them with e.g.
rfe -xf roles/hadoop
Most student users of the cluster will probably gain a suitable role automatically thanks to the Informatics database and Prometheus.
User HDFS directories are made by a script called mkhdfs, which runs nightly. It ensures that each user with hadoop/exc/user has an HDFS directory. It runs on the namenode of the cluster, and it's installed by the hadoop-cluster-master-hdfs-node.h header.
There's a companion script called rmhdfs. It runs weekly, and looks for and lists those HDFS directories which don't have the capability associated with them. You can then consider deleting those directories at your leisure.
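Purely as an illustration (this is not the installed script), a minimal sketch of what mkhdfs does might look like the following; the list-users-with-capability helper is a hypothetical placeholder, not the real mechanism:
#!/bin/bash
# Hypothetical sketch only. "list-users-with-capability" is an assumed
# helper that prints one username per line for a given capability.
list-users-with-capability hadoop/exc/user | while read user; do
    # Only create the HDFS home directory if it doesn't already exist
    if ! hdfs dfs -test -d "/user/${user}"; then
        hdfs dfs -mkdir "/user/${user}"
        hdfs dfs -chown "${user}" "/user/${user}"
    fi
done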
For other clusters, you could either adapt mkhdfs or make an HDFS directory manually. Here's how to do that:
hdfs dfs -mkdir /user/${USER}
hdfs dfs -chown ${USER} /user/${USER}
exit
exit
logout
To run a test job, ssh to a cluster node (scutter02 here) and copy some example input into HDFS:
ssh scutter02
hdfs dfs -put $HADOOP_PREFIX/etc/hadoop input
hdfs dfs -ls input
hdfs dfs -rm -r output
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
You should see lots of messages about the job's progress. The job should finish within a minute or two.
hdfs dfs -get output
cd output
ls
You should see two files: an empty file called _SUCCESS and a file called part-r-00000 containing a few counts of matching strings. If you don't see _SUCCESS then the job didn't work.
mapred job -list
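If a test job hangs, it can be killed using the job ID reported by mapred job -list; the ID below is only a placeholder:
mapred job -kill job_1568887276648_0001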
Component | Log directory | Host |
---|---|---|
HDFS datanode | /disk/scratch/hdfsdata/hadoop/logs | All the compute nodes |
HDFS namenode | /disk/scratch/hdfsdata/hadoop/logs | The namenode (the master HDFS host) |
Job History Server | /disk/scratch/mapred/logs | The job history server host |
YARN node manager | /disk/scratch/yarn/logs | All the compute nodes |
YARN resource manager | /disk/scratch/yarn/logs | The resource manager (the master YARN host) |
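For example, to follow a node manager's log on a compute node (assuming the usual Hadoop log file naming, where the daemon's current log is the newest .log file in the directory):
cd /disk/scratch/yarn/logs
tail -f $(ls -t *nodemanager*.log | head -1)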
ssh to any cluster node, then go to the Hadoop configuration directory:
cd $HADOOP_CONF_DIR
The nodes are named in these files:
File | Contains |
---|---|
masters | The cluster's master servers. For a simple cluster this would just be the HDFS namenode and the YARN resource manager. |
slaves | The slave nodes of the cluster. |
hosts | All the nodes (masters + slaves). |
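For instance, to see which hosts the cluster currently treats as slaves:
cat $HADOOP_CONF_DIR/slaves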
Does HDFS know which host is the namenode?
hdfs getconf -namenodes
Does YARN know the state of the nodes?
yarn node -list -all
To exclude a node from the cluster, set this LCFG resource for that node:
!hadoop.excluded mSET(true)
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -report
yarn rmadmin -refreshNodes
and then the node manager on the excluded host should stop and the node should be shown as decommissioned by
yarn node -list
but instead it shows as "running", and the resource manager log's mention of the exclude procedure lists no hostnames. Broken, it seems.
Hadoop component | systemd service | Host |
---|---|---|
HDFS namenode | hadoop-namenode.service | The namenode (the master HDFS host) |
HDFS datanode | hadoop-datanode.service | All the compute nodes |
YARN resource manager | hadoop-resourcemanager.service | The resource manager (the master YARN host) |
YARN node manager | hadoop-nodemanager.service | All the compute nodes |
Job History Server | hadoop-mapred.service | The job history server host |
These services can be managed with systemctl in the usual way. For example:
# systemctl status hadoop-nodemanager
● hadoop-nodemanager.service - The hadoop nodemanager daemon
   Loaded: loaded (/etc/systemd/system/hadoop-nodemanager.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-09-19 10:16:33 BST; 1 weeks 4 days ago
 Main PID: 4573 (java)
   CGroup: /system.slice/hadoop-nodemanager.service
           └─4573 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_nodemanager -Xmx4000m -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dya...
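So if, say, a datanode has died on one of the compute nodes, it can be restarted on that node as root in the usual way:
# systemctl restart hadoop-datanode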
Making a new cluster mostly means reusing the LCFG headers which configure the exc cluster, but there are a few manual steps too. You'll need to make a header, a namenode, a resource manager and a bunch of slave nodes.
Once you've made your new cluster, and you've checked that the log files and systemd services look OK, don't forget to run a test job to check that your cluster works.
First make the cluster header, in this example live/hadoop-dana-cluster.h. Check out the live Subversion repository, cd to the include/live directory, and copy the exc cluster's header:
svn copy hadoop-exc-cluster.h hadoop-dana-cluster.h
Edit the new header so that it names the cluster and its master hosts:
#ifndef LIVE_HADOOP_DANA_CLUSTER
#define LIVE_HADOOP_DANA_CLUSTER
#define HADOOP_CLUSTER_NAME dana
#define HADOOP_CLUSTER_HDFS_MASTER dana01.inf.ed.ac.uk
#define HADOOP_CLUSTER_YARN_MASTER dana02.inf.ed.ac.uk
#define HADOOP_CLUSTER_KERBEROS
#endif /* LIVE_HADOOP_DANA_CLUSTER */
Then check it in:
svn ci -m "Header to configure the dana Hadoop cluster" hadoop-dana-cluster.h
To make the namenode, add live/hadoop-dana-cluster.h and dice/options/hadoop-cluster-master-hdfs-node.h to the machine's LCFG profile, then ssh onto the machine.
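For illustration only, assuming the namenode is dana01 and the usual LCFG #include convention, the relevant lines of its profile might look something like this (a sketch, not a complete profile):
/* Illustrative snippet for dana01's LCFG profile - the host name is an assumption */
#include <live/hadoop-dana-cluster.h>
#include <dice/options/hadoop-cluster-master-hdfs-node.h>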
Format the HDFS filesystem:
hdfs namenode -format
Make the namenode's local directory, give it the right ownership, and restart the namenode service:
$ nsu
# mkdir /disk/scratch/hdfsdata/hadoop/namenode
# chown hdfs:hadoop /disk/scratch/hdfsdata/hadoop/namenode
# systemctl restart hadoop-namenode
hdfs dfs -mkdir /user
hdfs dfs -mkdir /tmp
hdfs dfs -chmod 1777 /tmp
hdfs dfs -mkdir /tmp/hadoop-yarn
hdfs dfs -mkdir /tmp/hadoop-yarn/staging
hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging
hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history
hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
hdfs dfs -chown -R mapred:hadoop /tmp/hadoop-yarn/staging
hdfs dfs -ls /
exit
exit
Make a local cache directory, which is needed for distcp:
nsu
cd /disk/scratch/hdfsdata
mkdir cache
chown hdfs:hdfs cache
chmod go+wt cache
exit
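distcp itself is run later, once the cluster is working. As a purely illustrative example (the paths are placeholders), a copy within the same cluster looks like this:
hadoop distcp /user/${USER}/input /user/${USER}/input-copy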
To make the resource manager, add live/hadoop-dana-cluster.h and dice/options/hadoop-cluster-master-yarn-node.h to the machine's LCFG profile.
To make the slave (compute) nodes, add live/hadoop-dana-cluster.h and dice/options/hadoop-cluster-slave-node.h to each machine's LCFG profile.
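Once the slave machines have reconfigured and their daemons have started, you can check that they have registered with the masters using the same commands as in the health checks above:
hdfs dfsadmin -report
yarn node -list -all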