Pandemic info for Hadoop

(a work in progress)

See also

For more details see HadoopClusters.

Basic info

  • We have several Hadoop clusters.
  • They're small - only a few nodes each.
  • They run on DICE.
  • LCFG does most of the config.
  • Hadoop is needed by the Extreme Computing (EXC) module.

The Clusters

  1. The exc cluster is the real Hadoop service for users, running on proper machines. To discover its hosts:
    $ profmatch exc-cluster
    
    (profmatch is in /afs/inf.ed.ac.uk/group/cos/utils)
  2. The exctest cluster uses largely the same config as the exc cluster. You can test out new config here before deploying it to the live service. You can let staff on this to test things out, but the nodes are tiny VMs so it can only run tiny test jobs. To discover its hosts:
    $ profmatch exctest-cluster
    
  3. The devel cluster is for computing staff to trash while developing new config. Never let users near it. It uses tiny VMs. To discover its hosts:
    $ profmatch devel-cluster
    

Types of node

Each cluster has an HDFS master, a YARN master and some slaves. To find out which is which, use profmatch again:
$ profmatch hdfs exc-cluster
$ profmatch yarn exctest-cluster
$ profmatch slave devel-cluster
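
As a rough sketch, the three role queries above can be wrapped in a loop that prints every role for one cluster. It relies only on the "profmatch <role> <cluster>" form shown above. The list_roles function name and the PROFMATCH override variable are made up for illustration and are not part of the DICE tools.

```shell
# Sketch: print the hosts for each node role in one cluster.
# PROFMATCH and list_roles are illustrative names (assumptions).
# PROFMATCH defaults to the real tool but can point at another command.
PROFMATCH="${PROFMATCH:-profmatch}"

list_roles() {
    cluster="$1"
    for role in hdfs yarn slave; do
        echo "== $role ($cluster) =="
        # Same form as the commands above: profmatch <role> <cluster>
        "$PROFMATCH" "$role" "$cluster"
    done
}

# Usage:
#   list_roles exc-cluster
```

If profmatch is not on your PATH, set PROFMATCH to its full path under /afs/inf.ed.ac.uk/group/cos/utils before calling the function.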

Configuration and control

  • The LCFG hadoop component makes the configuration files.
  • The LCFG file component makes some directories and symlinks.
  • systemd controls the Hadoop processes.
  • Most of the nitty-gritty is in dice/options/hadoop-cluster-node.h .
  • You can override config using live/hadoop-cluster-node.h .

Which daemons run where

If something's not working, check whether the nodes are running the correct processes. Also check the log files. If something's wrong, the log file will generally end with a spectacular Java crash message.
This node        runs this               using this account  this systemd service            It logs to here                     ps -ef | grep hadoop
The HDFS master  name node               hdfs                hadoop-namenode.service         /disk/scratch/hdfsdata/hadoop/logs  java -Dproc_namenode ...
The YARN master  resource manager        yarn                hadoop-resourcemanager.service  /disk/scratch/yarn/logs             java -Dproc_resourcemanager ...
                 Map Reduce job history  mapred              hadoop-mapred.service           /disk/scratch/mapred/logs           java -Dproc_historyserver ...
Each slave       data node               hdfs                hadoop-datanode.service         /disk/scratch/hdfsdata/hadoop/logs  java -Dproc_datanode ...
                 node manager            yarn                hadoop-nodemanager.service      /disk/scratch/yarn/logs             java -Dproc_nodemanager ...
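
The checks described above can be scripted roughly as follows, run on the node in question. This is a sketch only: the service names, process names and log directories come from the table, but check_daemon and newest_log are hypothetical helper names. It also assumes the most recently modified file in the log directory is the one of interest, since the actual log file names are not given here.

```shell
# Sketch: check one Hadoop daemon on the local node.
# check_daemon and newest_log are illustrative helper names (assumptions).

# Print the path of the most recently modified plain file in a directory.
newest_log() {
    dir="$1"
    # ls -t sorts newest first; -p marks directories with "/" so they can be skipped.
    newest="$(ls -tp "$dir" 2>/dev/null | grep -v '/$' | head -n 1)"
    if [ -n "$newest" ]; then
        printf '%s/%s\n' "$dir" "$newest"
    else
        return 1
    fi
}

# check_daemon <systemd service> <proc name> <log directory>
check_daemon() {
    service="$1"; proc="$2"; logdir="$3"
    if systemctl is-active --quiet "$service"; then
        echo "$service: active"
    else
        echo "$service: NOT active"
    fi
    # The daemons show up in ps as "java -Dproc_<name> ..."
    if pgrep -f "java.*-Dproc_$proc" >/dev/null; then
        echo "proc_$proc: running"
    else
        echo "proc_$proc: not found"
    fi
    # A crash usually shows at the end of the log, so print the tail.
    if log="$(newest_log "$logdir")"; then
        echo "--- last lines of $log ---"
        tail -n 20 "$log"
    else
        echo "no log files found in $logdir"
    fi
}

# Example, on the HDFS master:
#   check_daemon hadoop-namenode.service namenode /disk/scratch/hdfsdata/hadoop/logs
```

Reading the logs may need the account listed in the table (hdfs, yarn or mapred) or root, depending on the directory permissions.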
Topic revision: r4 - 09 May 2019 - 14:13:04 - ChrisCooke