Hadoop Clusters: Care and Feeding

If you're not computing staff, you've reached the wrong page - sorry! Read computing.help.inf.ed.ac.uk/hadoop-cluster instead.

This page tells computing staff about the Hadoop clusters.

Most of this page deals with the exc cluster. Most of the commands and config should work on the other clusters - just change the hostnames.

What is Hadoop?

A Hadoop cluster can process massive amounts of data in parallel. One popular approach is MapReduce, in which a dataset is split programmatically into many parts, each part is processed in some way, and the results are then collected back together. (Read more at wikipedia.)

Hadoop is used here by the Extreme Computing (EXC) module.

We have several Hadoop clusters. The main one is called exc and it's the one the EXC students use. We also have clusters for testing and development.

How Hadoop works

Files

HDFS is the (Hadoop distributed) filesystem. Data files are stored there. They're distributed across the cluster's nodes. HDFS keeps several copies of each file, and tries to ensure that the copies are on multiple nodes and on multiple racks.

In our simple cluster configuration, one machine in a Hadoop cluster is an HDFS namenode and most of the others are datanodes. (More complex clusters than ours will have secondary or backup namenodes too, for extra reliability.)

If you want to know more, start with HDFS Architecture and the HDFS User Guide.

The namenode stores each file's metadata (its name, its owner, its position in the HDFS directory hierarchy, etc.). It also keeps track of where all the data is stored. Each of our clusters has one namenode.

The datanodes store data files. They co-operate to arrange for there to be multiple copies of each file. The cluster's "slave" nodes are all datanodes. (The idea is that the data is, as much as possible, on the local disk of the machine that's analysing that data.)

Each datanode presents information on the web at

https://HOSTNAME.inf.ed.ac.uk:50475/datanode.html
Note that the web interfaces on the EXC cluster are not accessible from non-cluster machines because the cluster machines are on a non-routed wire, as per school policy.
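For example, from a machine inside the cluster you can fetch a datanode's status page with curl (scutter03 here is just one of the exc compute nodes; the -k flag skips certificate verification, which may or may not be needed depending on how the certificates are set up):
curl -k https://scutter03.inf.ed.ac.uk:50475/datanode.html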

Resources

YARN manages the cluster's resources. It decides which jobs will run where, and provides the resources to enable those jobs to run.

Each of our clusters has one YARN resource manager. It's the ultimate arbiter for all the cluster's resources, deciding what job will run where and monitoring each job's progress.

The other nodes run a YARN node manager. This monitors the node's resources for the resource manager.
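To see which node managers have registered with the resource manager, and what state they're in, you can list the nodes (the same command appears again in How to list the nodes below):
yarn node -list -all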

History

The MapReduce framework is the classic, but not the only, way of doing distributed processing of big data on Hadoop. It has one permanent daemon, the job history daemon; otherwise it's just fired up as needed by jobs. The MapReduce Job History daemon runs on the YARN master server alongside the YARN resource manager.
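A quick way to confirm that the job history daemon is running is to query its systemd service on the YARN master (the service name is the one listed in the Systemd services section below):
systemctl status hadoop-mapred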

Hadoop software and documentation

We use the latest stable version of Hadoop, 2.9.2 at the time of writing. Downloads and project info can be found at https://hadoop.apache.org/. We install it using a tarball of the binary distribution from that site, by means of a home-rolled hadoop RPM.

The manuals are at hadoop.apache.org/docs/r2.9.2/. You can also get there via https://hadoop.apache.org/docs/stable/. Look at the left bar on that page - each link is a manual on a different topic.

The University Library has several Hadoop books. Try for instance the most recent edition of the O'Reilly book Hadoop: The Definitive Guide. You can read it online or there's a paper copy.

Useful commands

There are command-line tools for Hadoop as a whole, for HDFS specifically, and for YARN; they're documented in the manuals mentioned above.
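A few examples of generally useful commands (none of these is specific to our setup; the privileged ones need the Kerberos setup described below):
hadoop version           # which Hadoop release is installed
hdfs dfs -ls /           # list the top of the HDFS filesystem
hdfs dfsadmin -report    # HDFS usage and datanode status (needs hdfs privilege)
yarn application -list   # applications currently submitted to or running on YARN
mapred job -list         # currently-running MapReduce jobs (needs privilege)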

Which machine does what

There are three types of machine in each cluster:

  • A namenode - it runs the namenode, which is the master daemon for the cluster's HDFS distributed filesystem.
  • A resource manager - it runs the resource manager, which is the master daemon for the cluster's YARN resource allocation system. It also runs the cluster's job history server.
  • Compute nodes, also known as slave nodes. Each of these runs a datanode daemon (it manages the HDFS data that's on this node) and a node manager daemon (it manages YARN and jobs on this node).

Each of these functions runs in its own package account and Kerberos context. You'll need to know these accounts, and their abbreviations, to get extra privilege:

Role | Runs on this host | Account | Abbreviation
datanode | every compute node | hdfs | dn
job history server | the resource manager | mapred | jhs
namenode | the namenode | hdfs | nn
node manager | every compute node | yarn | nm
resource manager | the resource manager | yarn | rm

How we configure Hadoop

The LCFG hadoop headers.

Each node in a Hadoop cluster includes two special LCFG headers:
  1. A header to say which cluster this node is in:
    • live/hadoop-exc-cluster.h
    • live/hadoop-exctest-cluster.h
    • live/hadoop-devel-cluster.h
  2. A header to say what role this node plays in the cluster:
    • dice/options/hadoop-cluster-slave-node.h
    • dice/options/hadoop-cluster-master-hdfs-node.h
    • dice/options/hadoop-cluster-master-yarn-node.h
Each node does just one of these three jobs.
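For example, a compute node in the exc cluster would have something like this in its LCFG profile (a sketch; the exact position of the #includes in the profile may vary):
#include <live/hadoop-exc-cluster.h>
#include <dice/options/hadoop-cluster-slave-node.h>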

There are quite a few more LCFG Hadoop headers. Some are included by the above headers. Others are left over from previous configurations and are now out of use.

Which machines are in each cluster?

See also How to list the nodes.

exc cluster

exc is the main Hadoop cluster - the one the students use. It's important to keep this running healthily, at least while it's needed by the students and staff of the Extreme Computing module. It's on physical servers in the AT server room - the scutter machines from the teaching cluster.

Machine | Alias | Role
scutter01 | nn.exc | namenode
scutter02 | rm.exc | resource manager, job history server
scutter03 to scutter12 |  | compute nodes

exctest cluster

exctest is for testing. It's used from time to time by computing staff to test new configurations before putting them on the exc cluster, and by teaching staff to check cluster capabilities or to run through coursework before handing it out to students. Its nodes are virtual machines on jubilee.

Machine | Alias | Role
rat1 | nn.exctest | namenode
rat2 | rm.exctest | resource manager, job history server
rat3 to rat7 |  | compute nodes

devel cluster

devel is used when developing new configurations and testing wild ideas. It is only ever used by computing staff and it can be trashed and rebuilt with impunity. Its nodes are virtual machines on jubilee.

Machine | Alias | Role
clafoutis | nn.devel | namenode
teurgoule | rm.devel | resource manager, job history server
strudel and tiramisu |  | compute nodes

Kerberos and privilege

The clusters use Kerberos for authentication. To get privileged access to Hadoop, you'll need to authenticate. For this you'll need to know the right machine and account and abbreviation to use. Find them in the tables above, then do this:

  • ssh machine
  • nsu account
  • newgrp hadoop
  • export KRB5CCNAME=/tmp/account.cc
  • kinit -k -t /etc/hadoop.abbreviation.keytab abbreviation/${HOSTNAME}

So here's how to get privileged access to the HDFS filesystem on the exc cluster:

  • ssh scutter01
  • nsu hdfs
  • newgrp hadoop
  • export KRB5CCNAME=/tmp/hdfs.cc
  • kinit -k -t /etc/hadoop.nn.keytab nn/${HOSTNAME}

There's a handy way to check that the mapping between the service abbreviation (e.g. rm) and its account (e.g. yarn) has been configured correctly:

  • ssh to any Hadoop node
  • hadoop org.apache.hadoop.security.HadoopKerberosName abbreviation/${HOSTNAME}@INF.ED.AC.UK
For example,
[scutter04]: hadoop org.apache.hadoop.security.HadoopKerberosName rm/${HOSTNAME}@INF.ED.AC.UK
Name: rm/scutter04.inf.ed.ac.uk@INF.ED.AC.UK to yarn
[scutter04]: 
So rm maps to the yarn account.

Admin tasks and how to do them

How to add users

On the exc cluster, users are added automatically. This process is driven by roles and capabilities. A prospective user of the exc cluster needs to gain the hadoop/exc/user capability. Several roles grant that, and you can discover them with e.g.
rfe -xf roles/hadoop
Most student users of the cluster will probably gain a suitable role automatically thanks to the Informatics database and Prometheus.

Making HDFS directories

Each user needs an HDFS home directory. On the exc cluster this is made by a script called mkhdfs which runs nightly. It ensures that each user with hadoop/exc/user has an HDFS directory. It runs on the namenode of the cluster, and it's installed by the hadoop-cluster-master-hdfs-node.h header.

There's a companion script called rmhdfs. It runs weekly, and lists those HDFS home directories whose users no longer have the capability. You can then consider deleting those directories at your leisure.

For other clusters, you could either adapt mkhdfs or make an HDFS directory manually. Here's how to do that:

  1. Log in to the namenode with ssh and acquire privileged access to HDFS (see Kerberos and privilege above).
  2. Then make the HDFS home directory (replace ${USER} with the username of the person the directory is for):
     hdfs dfs -mkdir /user/${USER}
     hdfs dfs -chown ${USER} /user/${USER}
     exit
     exit
     logout
    
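You can check that the new directory has appeared by listing /user:
hdfs dfs -ls /user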

How to run a test job

This is how to check that the cluster is working.

  1. If you don't yet have an HDFS directory, here's how to make one.
  2. Now ssh to the YARN master node:
     ssh scutter02
  3. Put some files into your HDFS dir. These will act as input for the test job:
     hdfs dfs -put $HADOOP_PREFIX/etc/hadoop input
  4. List your files to check that they got there:
     hdfs dfs -ls input
  5. If you have already run the job and you want to rerun it, remove the output dir which the job makes:
     hdfs dfs -rm -r output
  6. Now submit the test job:
     hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
    You should see lots of messages about the job's progress. The job should finish within a minute or two.
  7. Once it's finished, transfer the job's output from HDFS:
     hdfs dfs -get output
  8. ... and take a look at what the job did:
     cd output
     ls
    You should see two files - an empty file called _SUCCESS and a file called part-r-00000 containing a few counts of matched strings. If you don't see _SUCCESS then the job didn't work.
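Each line of part-r-00000 is a count followed by the string that the grep job matched. For a quick look (from inside the output directory):
cat part-r-00000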

What jobs are running?

This command needs privilege (see above). It lists the jobs which are currently running on the cluster:
mapred job -list
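The same tool can kill a job if need be (JOB-ID below stands for a job id of the form shown in the mapred job -list output):
mapred job -kill JOB-ID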

Checking the log files

Hadoop keeps comprehensive and informative log files. They're worth checking when you're doing anything with Hadoop, or when something seems to be wrong, or just to check that things are OK. Here's where to find them:

Component | Log directory | Host
HDFS namenode | /disk/scratch/hdfsdata/hadoop/logs | The namenode (the master HDFS host)
HDFS datanode | /disk/scratch/hdfsdata/hadoop/logs | All the compute nodes
YARN resource manager | /disk/scratch/yarn/logs | The resource manager (the master YARN host)
YARN node manager | /disk/scratch/yarn/logs | All the compute nodes
Job History Server | /disk/scratch/mapred/logs | The job history server host

For hostnames see Which machines are in each cluster? and How to list the nodes.
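For example, to follow the namenode's log on the HDFS master (the file name usually follows the pattern hadoop-ACCOUNT-DAEMON-HOSTNAME.log, so the exact name varies from host to host - check the directory listing if in doubt):
tail -f /disk/scratch/hdfsdata/hadoop/logs/hadoop-hdfs-namenode-*.log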

How to list the nodes

... using Hadoop commands

There are several configuration files which list the cluster nodes. To find them first ssh to any cluster node, then go to the Hadoop configuration directory:
cd $HADOOP_CONF_DIR
The nodes are named in these files:
File | Contains
masters | The cluster's master servers. For a simple cluster this is just the HDFS namenode and the YARN resource manager.
slaves | The slave nodes of the cluster.
exclude | Those slaves which are currently excluded from the cluster.
hosts | All the nodes (masters + slaves).

Which host does HDFS think is the namenode?

hdfs getconf -namenodes
Does YARN know the state of the nodes?
yarn node -list -all

... using LCFG

Search the LCFG profiles to find the machines which are using a cluster's LCFG header.

Which machines are in a cluster?

$ profmatch hadoop-devel-cluster
clafoutis
strudel
teurgoule
tiramisu
Which machine does a particular job?
$ profmatch devel-cluster slave-node
strudel
tiramisu
$ profmatch exctest-cluster master-yarn-node
rat2

(profmatch is in /afs/inf.ed.ac.uk/group/cos/utils.)

Removing a node from the cluster

Here's how to remove a node from the cluster. You might need to do this if a machine has hardware trouble, or if you want to upgrade its firmware or its software, for example. You can only do this with a slave node - not one of the two master nodes.
  1. Add this to the bottom of the node's LCFG file:
    !hadoop.excluded   mSET(true)
  2. HDFS has to decommission the node (i.e. move its share of the HDFS data to other nodes):
    • Login to the namenode.
    • Acquire hdfs privilege (see above).
    • Tell the namenode to reconsider which nodes it should be using:
      hdfs dfsadmin -refreshNodes
    • Look at the namenode log. Wait for it to announce Decommissioning complete for node and the node's IP address.
    • You can also check that HDFS reports that the node's Decommission Status has changed from Normal to Decommissioned:
      hdfs dfsadmin -report
      This means that the node's share of the HDFS data has been copied off onto other nodes.
  3. YARN has to decommission the node.
    • Login to the resource manager (the YARN master) node.
    • Acquire privilege over the yarn resource manager.
    • yarn rmadmin -refreshNodes
    • Having done this, a list of the nodes should show your decommissioned host as DECOMMISSIONED:
      yarn node -list -all

Re-adding an excluded node to the cluster

Remove the hadoop.excluded resource that you added in Removing a node from the cluster, then run through the same procedure as you did there.

Systemd services

Hadoop is started and stopped using systemd services (which are configured by LCFG).
Hadoop component | systemd service | Host
HDFS namenode | hadoop-namenode.service | The namenode (the master HDFS host)
HDFS datanode | hadoop-datanode.service | All the compute nodes
YARN resource manager | hadoop-resourcemanager.service | The resource manager (the master YARN host)
YARN node manager | hadoop-nodemanager.service | All the compute nodes
Job History Server | hadoop-mapred.service | The job history server host
These can be queried, started and stopped using systemctl in the usual way. For example:
# systemctl status hadoop-nodemanager
 hadoop-nodemanager.service - The hadoop nodemanager daemon
   Loaded: loaded (/etc/systemd/system/hadoop-nodemanager.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-09-19 10:16:33 BST; 1 weeks 4 days ago
 Main PID: 4573 (java)
   CGroup: /system.slice/hadoop-nodemanager.service
           └─4573 /usr/lib/jvm/java-1.8.0-sun/bin/java -Dproc_nodemanager -Xmx4000m -Dhadoop.log.dir=/disk/scratch/yarn/logs -Dya...

# systemctl start hadoop-datanode

How to make a new cluster

Mostly, you'll just need to copy the LCFG config which sets up the exc cluster; but there are a few manual steps too. You'll need to make a header, a namenode, a resource manager and a bunch of slave nodes. Once you've made your new cluster, and you've checked that the log files and systemd services look OK, don't forget to run a test job to check that your cluster works.

Hardware resources

Note that the YARN node manager - which runs on each slave, and matches up jobs with hardware resources - automatically determines what hardware resources are available. If you don't give the slaves enough hardware, the node managers will decide that jobs can't run! Even if you're making a little play cluster on VMs, you'll need to give each slave several CPUs. If you don't, not even a wee diddy test job will run. In tests, slaves with 3 VCPUs and 4GB memory were sufficient for the test job.
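One way to check what resources a node manager has actually registered is to ask YARN for the node's status (NODE-ID here stands for a node id as shown by yarn node -list; the report should include the node's memory and vcore capacity):
yarn node -list
yarn node -status NODE-ID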

Make the header

Pick a one-word name for your cluster. These instructions will use the name dana. We'll start off by making a copy of the config for the exc cluster, so we can use that to configure the new dana cluster. Make a new header in subversion for your cluster. In our example we'll make live/hadoop-dana-cluster.h.
  1. Check out the live SubversionRepository and cd to the include/live directory.
  2. svn copy hadoop-exc-cluster.h hadoop-dana-cluster.h
  3. Edit it appropriately - perhaps like this:
    #ifndef LIVE_HADOOP_DANA_CLUSTER
    #define LIVE_HADOOP_DANA_CLUSTER
    #define HADOOP_CLUSTER_NAME dana
    #define HADOOP_CLUSTER_HDFS_MASTER dana01.inf.ed.ac.uk
    #define HADOOP_CLUSTER_YARN_MASTER dana02.inf.ed.ac.uk
    #define HADOOP_CLUSTER_KERBEROS
    #endif /* LIVE_HADOOP_DANA_CLUSTER */
  4. Commit it with svn ci -m "Header to configure the dana Hadoop cluster" hadoop-dana-cluster.h

Make a namenode

  1. Add your cluster's header to the profile of the machine that'll be the new cluster's namenode. Our example header is live/hadoop-dana-cluster.h.
  2. Below it, add the node type header, in this case probably dice/options/hadoop-cluster-master-hdfs-node.h
  3. Let LCFG make the machine's new profile and wait for it to reach the machine.
  4. ssh onto the machine.
  5. Acquire hdfs namenode privilege as described above.
  6. hdfs namenode -format
  7. In a separate session, make the namenode's data directory and restart the namenode service:
    $ nsu
    # mkdir /disk/scratch/hdfsdata/hadoop/namenode
    # chown hdfs:hadoop /disk/scratch/hdfsdata/hadoop/namenode
    # systemctl restart hadoop-namenode
  8. Back in the session with hdfs namenode privilege, build the base filesystem in HDFS.
     hdfs dfs -mkdir /user
     hdfs dfs -mkdir /tmp
     hdfs dfs -chmod 1777 /tmp
     hdfs dfs -mkdir /tmp/hadoop-yarn
     hdfs dfs -mkdir /tmp/hadoop-yarn/staging
     hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging
     hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history
     hdfs dfs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
     hdfs dfs -chmod 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
     hdfs dfs -chown -R mapred:hadoop /tmp/hadoop-yarn/staging
     hdfs dfs -ls /
     exit
     exit
    
  9. And to support distcp:
     nsu
     cd /disk/scratch/hdfsdata
     mkdir cache
     chown hdfs:hdfs cache
     chmod go+wt cache
     exit
    
Here are a couple of ways to test that it's successfully up and running:
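For example, one quick check is the systemd service and the HDFS report (service names and privilege as described earlier on this page):
systemctl status hadoop-namenode
hdfs dfsadmin -report    # needs hdfs privilege; should report the filesystem and, once slaves exist, its datanodes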

Make a resource manager

  1. As for the namenode, first add the cluster's LCFG header to the profile of the machine that'll be the resource manager. Our example header is live/hadoop-dana-cluster.h.
  2. Below it, add the node type header, in this case probably dice/options/hadoop-cluster-master-yarn-node.h.
  3. Let LCFG make the machine's new profile and wait for it to reach the machine.
  4. Reboot the machine once or twice.
Here are a couple of ways to test that it's successfully up and running:
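For example (again using the service names from the Systemd services section):
systemctl status hadoop-resourcemanager
systemctl status hadoop-mapred
yarn node -list -all     # should run cleanly; it will list the slaves once they've been added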

Make a slave node

The rest of the hosts in the cluster will all be slave nodes. Here's how to make one:
  1. As for the namenode, first add the cluster's LCFG header to the profile of the machine that'll be a slave node. Our example header is live/hadoop-dana-cluster.h.
  2. Below it, add the node type header, in this case probably dice/options/hadoop-cluster-slave-node.h.
  3. Let LCFG make the machine's new profile and wait for it to reach the machine.
  4. Reboot the machine once or twice.
Here are a couple of ways to test that it's successfully up and running:
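For example, check the new node itself, then check that the masters have noticed it:
systemctl status hadoop-datanode hadoop-nodemanager
hdfs dfsadmin -report    # with hdfs privilege on the namenode; the new node should appear
yarn node -list -all     # from any node; the new node should show as RUNNING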