Hadoop Cluster: Care and Feeding

Note that this page is out of date and is currently being revised; contact iainr@infREMOVE_THIS.ed.ac.uk for more information.

This page covers Hadoop maintenance and configuration. If you just want to use Hadoop then look at HadoopCluster instead.

Machines

The machines are LCFG-maintained DICE servers running the current desktop version of DICE.

Machine                  Alias        Role
scutter01                namenode     name node...
scutter02                jobtracker   job tracker and secondary namenode
scutter01 — scutter07    n/a          datanode(s) and tasktracker(s)

The nodes can be found either in the dedicated beowulf racking/shelving or in the "infrastructure shelving" at the far side of the server room. To determine which, check the network wiring configuration: if the machine is attached to a bw* switch it's in the dedicated racking; otherwise it's in the infrastructure shelving.

Configuration

Most of the extra LCFG configuration these machines need is in

dice/options/hadoop.h (generic DICE-level hadoop config)

live/hadoop.h (volatile DICE-level hadoop config)

live/mscteach_hadoop.h (cluster-specific hadoop)

(If you wish to create another cluster you'll have to replicate the mscteach-level headers)

The Hadoop installation is owned and run by the hadoop account. It has a local homedir /opt/hadoop on each machine.

The header installs a (locally-built) Hadoop RPM into this directory. It contains the 2.7.x Hadoop distribution.

The configuration files are made by the lcfg-hadoop component.

The following things need to be configured. Most of them are documented in chapters 9 and 10 of the O'Reilly book.

SSH
This has to be configured to allow the hadoop account on each machine to ssh freely to any other machine in the cluster. This needs some files in /opt/hadoop/.ssh. All of them are created automatically by lcfg-file except for the private key file id_dsa; copy this by hand to any machine which doesn't already have it.
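For example, a minimal sketch of copying the existing key across and confirming passwordless ssh afterwards (the target host scutter03 is purely illustrative):

# as the hadoop user on a machine which already has the key
scp -p /opt/hadoop/.ssh/id_dsa scutter03.inf.ed.ac.uk:/opt/hadoop/.ssh/id_dsa
ssh scutter03.inf.ed.ac.uk chmod 600 /opt/hadoop/.ssh/id_dsa
# confirm that passwordless ssh now works
ssh -o StrictHostKeyChecking=no scutter03.inf.ed.ac.uk hostname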

conf
Configuration files go in /opt/hadoop/conf. Hadoop expects
them to go in /opt/hadoop/hadoop-2.7.x/conf so lcfg-file automatically makes a symlink there.
hadoop-env.sh
Define JAVA_HOME. Add -Xmx2000m to
HADOOP_NAMENODE_OPTS and HADOOP_SECONDARY_NAMENODE_OPTS to make more memory available to those processes. Add -o StrictHostKeyChecking=no to HADOOP_SSH_OPTS. Change HADOOP_LOG_DIR. Change HADOOP_PID_DIR.
core-site.xml
Define the namenode (fs.default.name) and the
temporary directory (hadoop.tmp.dir). Point Hadoop at the rack awareness script (see below) using topology.script.file.name.
hdfs-site.xml
Specify the location of HDFS data
(dfs.name.dir, dfs.data.dir, fs.checkpoint.dir). Turn off file permissions (dfs.permissions). Point to the web interfaces to stop Hadoop doing the wrong thing (dfs.http.address, dfs.secondary.http.address). Point Hadoop at the permitted hosts file (see below) (dfs.hosts).
mapred-site.xml
Define the jobtracker (mapred.job.tracker).
Increase the tasktracker memory (mapred.child.java.opts). Define the maximum number of tasks per machine (mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum) - the book advises that you set this to one less than the number of cores on the machine. Turn on speculative execution (mapred.map.tasks.speculative.execution, mapred.reduce.tasks.speculative.execution). When Hadoop starts up make it attempt to recover any jobs that were running when it shut down (mapred.jobtracker.restart.recover). Define the maximum memory of Map/Reduce child processes (mapred.child.ulimit). Point to the permitted hosts file (mapred.hosts). Enable the Fair Scheduler (mapred.jobtracker.taskScheduler). Set the number of reduce tasks per job (mapred.reduce.tasks).
slaves
Contains the FQDN of each data node.
masters
Contains the FQDN of the secondary
name node.
hosts
Contains the full name of every permitted Hadoop node. If a machine isn't in this list it can't connect to the supervising nodes.
rack-awareness.py
A script which takes the name or address of a node and returns the name of the switch it's connected to: /bw00, /bw01 or /bw02. The script could be improved. (A sketch of the interface Hadoop expects from such a topology script follows this list.)
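For reference, Hadoop calls the script named by topology.script.file.name with one or more node names or IP addresses and expects one rack path per argument on standard output. The real rack-awareness.py is a locally maintained Python script; the bash sketch below, with an invented node-to-rack mapping, only illustrates that contract.

#!/bin/bash
# Illustrative topology script sketch -- NOT the real rack-awareness.py.
# Hadoop passes one or more hostnames/IP addresses; print one rack per argument.
for node in "$@"; do
    case "$node" in
        scutter0[12]*) echo "/bw00" ;;   # invented mapping, for illustration only
        scutter0[34]*) echo "/bw01" ;;
        *)             echo "/bw02" ;;   # default rack
    esac
done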

All of this is done by LCFG. See the files mentioned above for details.
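To spot-check what the generated configuration actually resolves to on a node, the stock hdfs getconf tool can be used (run as the hadoop user; key names as listed above):

hdfs getconf -confKey fs.default.name    # should print the namenode URI
hdfs getconf -confKey dfs.hosts          # should print the path of the permitted hosts file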

Spanning maps noting the roles of each node are maintained through the following resources:

hadoop.slave
lists the data nodes (used to build the slaves file)
hadoop.hosts
lists the permitted Hadoop nodes (used to build the hosts file)
hadoop.master
lists the secondary name node (used to build the masters file)

Adding a New User to Hadoop

Hadoop users need the secondary role hadoopuser. The primary role module-exc grants hadoopuser automatically, but people not taking the Extreme Computing course need to be given hadoopuser explicitly.

Once they have the capability, Hadoop users (and their HDFS directories) should be created automatically by a crontab script in each node's bin directory. You can list the current users with:

hdfs dfs -ls /user

If you would like to hasten this process, you can create the user and directory manually:

    • ssh namenode.inf.ed.ac.uk
    • nsu hadoop
    • ~/bin/addhadoopuser username
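The real ~/bin/addhadoopuser is a site-local script, but in outline it needs to do something like the following for a new user (a sketch only, with the UUN passed as the first argument):

# run as the hadoop user on the namenode; $1 is the UUN
hdfs dfs -mkdir -p /user/"$1"    # create the user's HDFS home directory
hdfs dfs -chown "$1" /user/"$1"  # hand ownership of it to the user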

Removing a User from Hadoop

Just undo what you did when the user was added to Hadoop.

  1. Delete the user's HDFS directory:
    • ssh namenode.inf.ed.ac.uk
    • nsu hadoop
    • hdfs dfs -rm -r /user/UUN
  2. Remove the secondary role hadoopuser, if the account still has it (and didn't get it automatically by means of a primary role like module-exc).

Starting and Stopping Hadoop

To Start Hadoop
  1. On the namenode, nsu hadoop then run /opt/hadoop/hadoop-2.7.x/sbin/start-dfs.sh. Wait a little while, then check the HDFS health at http://namenode.inf.ed.ac.uk:50070.
  2. On the jobtracker, nsu hadoop then run /opt/hadoop/hadoop-2.7.x/sbin/start-yarn.sh. Wait a little while, then check the YARN status at http://jobtracker.inf.ed.ac.uk:8088.
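Once both have come up, a quick command-line health check can be run as the hadoop user (standard Hadoop 2.x tools; shown as a sketch):

hdfs dfsadmin -report | head -20    # HDFS capacity and number of live datanodes
yarn node -list                     # NodeManagers currently registered with YARN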
To Stop Hadoop
  1. On the jobtracker, nsu hadoop then run /opt/hadoop/hadoop-2.7.x/sbin/stop-yarn.sh.
  2. On the namenode, nsu hadoop then run /opt/hadoop/hadoop-2.7.x/sbin/stop-dfs.sh.

Shutting down all the nodes

In an emergency the whole cluster can be safely shut down by logging into the namenode and running ~hadoop/bin/shutdownhadoop. This should log onto the jobtracker node, shut down yarn, generate a list of active nodes, shut down dfs, and finally log into each node and run poweroff. It is currently untested.
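For orientation, the described sequence amounts to something like the following (a hypothetical sketch only; the real ~hadoop/bin/shutdownhadoop may differ in detail):

#!/bin/bash
# Hypothetical sketch of the described shutdown sequence -- not the real script.
# Assumes HADOOP_PREFIX is set in the hadoop account's environment on each host.
ssh jobtracker.inf.ed.ac.uk '$HADOOP_PREFIX/sbin/stop-yarn.sh'    # stop YARN first
nodes=$(hdfs dfsadmin -report | awk '/^Hostname:/ {print $2}')    # collect datanode hostnames
"$HADOOP_PREFIX"/sbin/stop-dfs.sh                                 # then stop HDFS
for n in $nodes; do
    ssh "$n" poweroff                                             # finally power off each node
done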

Sections beyond this have not been revised

Spare Machines

Not really relevant -- but due to be revised to cover emergency replacement

Spare desktop machines can be added to the cluster.

To add a machine:

  1. Install DICE on the machine.
  2. Add #include <live/bwhadoop_2core.h> to its profile. Reboot the machine once it has its new LCFG profile.
  3. Edit an up-to-date copy of live/list-of-spares-on-hadoop.h :
    1. Add the machine's full hostname to the SPARE_MACHINES_HOSTNAMES list.
    2. Add the machine's IP address to the SPARE_MACHINES_ADDRESSES list.
  4. svn commit your changes.
  5. Wait for the new LCFG profiles to be compiled and downloaded to each Hadoop machine.
  6. If there are no jobs running on the jobtracker node then restart the jobtracker:
    1. Check the jobtracker web page to make sure that there are no jobs running.
    2. Stop Hadoop as described above.
    3. Start Hadoop as described above.
  7. If there are jobs running on the jobtracker:
    1. On the node to be added:
      1. nsu hadoop
      2. cd /opt/hadoop/hadoop-0.20.2/bin
      3. ./hadoop-daemon.sh start datanode
      4. ./hadoop-daemon.sh start tasktracker
    2. Check that your machine is in the namenode's list of live HDFS nodes and in the jobtracker's list of active Map/Reduce nodes (or use the command-line check below).
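As a command-line alternative to the web pages, the datanode list can also be checked as the hadoop user on the namenode (paths as in the unrevised steps above):

cd /opt/hadoop/hadoop-0.20.2/bin
./hadoop dfsadmin -report    # lists every datanode currently reporting, plus capacity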

To remove a machine from the cluster:

  1. Edit live/list-of-spares-on-hadoop.h :
    1. Add the machine's full hostname to the REMOVE_MACHINES_HOSTNAMES list.
    2. Leave it in the other lists for now.
  2. svn commit your changes.
  3. Wait for the new LCFG profiles to be compiled and downloaded to each Hadoop machine.
  4. Refresh the namenode:
    1. ssh namenode.inf.ed.ac.uk
    2. nsu hadoop
    3. /opt/hadoop/hadoop-0.20.2/bin/hadoop dfsadmin -refreshNodes
  5. The namenode HDFS status pages should now show your machines with the status Decommission In Progress and then Decommissioned. Once they have the status Decommissioned they have been removed from HDFS.
  6. To shut down a tasktracker node:
    1. ssh to the node
    2. nsu hadoop
    3. cd /opt/hadoop/hadoop-0.20.2/bin
    4. ./hadoop-daemon.sh stop tasktracker
  7. When there are no jobs running, stop and start Map/Reduce:
    1. ssh jobtracker
    2. nsu hadoop
    3. cd /opt/hadoop/hadoop-0.20.2/bin
    4. ./stop-mapred.sh
    5. ./start-mapred.sh
  8. Once this has been done, edit live/list-of-spares-on-hadoop.h once again:
    1. Remove the machine's hostname from REMOVE_MACHINES_HOSTNAMES and SPARE_MACHINES_HOSTNAMES.
    2. Remove the machine's IP address from SPARE_MACHINES_ADDRESSES.
  9. svn commit.

When the Cluster Breaks

(This section is an aide memoire for the maintainer of the cluster.) When the cluster gets into difficulties - for instance, nodes dropping out of HDFS or Map/Reduce; errors appearing from various nodes when other nodes are processing the same job perfectly well; or other odd or unusual behaviour - here are some general approaches to try:

  • Check the namenode and map/reduce status pages on the web.
  • Login to the namenode and look for relevant error messages in the logs in /disk/scratch/hdfsdata/hadoop/logs. Today's namenode log will always be called /disk/scratch/hdfsdata/hadoop/logs/hadoop-hadoop-namenode-NODENAME.inf.ed.ac.uk.log.
  • Similarly check on the jobtracker host for the jobtracker logs.
  • Look for full disk partitions on any nodes. On Beowulf machines, check that the relationship between the /disk/scratch1 and /disk/scratch2 partitions and the /disk/scratch link is as it should be and make sure that Hadoop is using the intended partition for its /disk/scratch files (e.g. log files). On most nodes, look for big files staying around in Hadoop cache directories. These cache directories shouldn't have any long term residents.
  • To check for HDFS problems, run hadoop fsck for a short report or hadoop fsck -files -blocks -locations for a longer one.
  • When the cluster is shut down, look for stray Hadoop daemons which are still running. Most nodes should run a DataNode and a TaskTracker process while the cluster is up; these should be shut down when the cluster is shut down, but occasionally they stay in existence. This can cause problems later when they attempt to be part of the cluster when it's next started up. Look for and kill any stray Hadoop processes you find while the cluster is down (see the sketch after this list).
  • Check the job tracker ( http://jobtracker.inf.ed.ac.uk:8088 ) and clear out any old jobs (in practice simply restarting the cluster should achieve this and not lose active, valid jobs).
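A quick way to check a node for stray daemons, run as the hadoop user on that node (jps ships with the JDK; the ps filter is just an illustration):

jps                                               # lists Java processes, e.g. DataNode, NodeManager
ps -u hadoop -o pid,etime,args | grep -i hadoop   # anything here while the cluster is down is stray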

If you've lost a disk:

  • just a summary, iain will provide more!
  • admin scripts live in /opt/hadoop/hadoop-2.7.x/sbin/
  • once a node has become inoperable it will be declared dead by the namenode after an unknown period of time; you should be able to see this on the namenode's "Datanodes" page, which is updated every few minutes. HDFS will try to struggle along with reduced storage. Ideally you should be able to restart the individual datanode with:

$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

If the filesystem is down it may come back up in safe mode (e.g. if there are disk failures).

  • Turn safe mode off: hdfs dfsadmin -safemode leave

  • Run fsck on the filesystem: hdfs fsck / and hope that there's not too much data loss, as with a normal fsck.
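For reference, the safe-mode state can be queried before forcing it off, and fsck can be asked for more detail (both standard hdfs subcommands):

hdfs dfsadmin -safemode get               # reports whether the namenode is in safe mode
hdfs fsck / -files -blocks -locations     # detailed per-file block report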

If you've lost a node

If it's just one node then ssh to the relevant node, nsu to hadoop, and stop and restart the node manager with:


$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

If it's the whole cluster then ssh onto the jobtracker and run:

$HADOOP_PREFIX/sbin/stop-yarn.sh
$HADOOP_PREFIX/sbin/start-yarn.sh

It's preferable to restart the individual nodes because if you restart the scheduler you will lose jobs. There are commands for starting/stopping all the various processes at

http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/ClusterSetup.html#Operating_the_Hadoop_Cluster

WATCH OUT: any command ending in "-daemons.sh" will affect all the nodes in the cluster. Drop the final "s" (i.e. "-daemon.sh") to apply the command to the local node only.

DON'TS

DON'T run start-yarn on anything other than jobtracker.inf.ed.ac.uk.
DON'T shut down HDFS without first shutting down yarn.

Further Reading
