Pandemic info for Hadoop
This is a quick and dirty pandemic guide, presenting a greatly abbreviated version of the information in HadoopClusters. If this page doesn't tell you what you need to know, read HadoopClusters.
Basic info
- We have Hadoop clusters.
- They're small - only a few nodes each.
- They use DICE.
- LCFG does most of the config.
- Hadoop is needed by the Extreme Computing (EXC) module.
- Hadoop is a distributed framework for processing big data.
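If you just want to confirm that a cluster is alive, a quick sanity check from any cluster node might look like this. This is a sketch, assuming the standard Hadoop client commands (`hdfs`, `yarn`) are on the PATH of the account you're using:

```shell
# List the root of the distributed filesystem (HDFS)
hdfs dfs -ls /
# List the worker nodes that YARN currently knows about
yarn node -list
```

If both commands return promptly, the HDFS name node and YARN resource manager are at least answering.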
The Clusters
There are three clusters. They all use the same Hadoop configuration headers.
1. The exc cluster
This is our one real Hadoop service, for the use of users. It's on physical servers in Appleton Tower. To list its nodes:
profmatch hadoop-exc-cluster
To find the power and network connections:
for i in `profmatch hadoop-exc-cluster`; do rfe -xf apdu/$i; done
for i in `profmatch hadoop-exc-cluster`; do rfe -xf atnet/$i; done
To shut it down:
for i in `profmatch hadoop-exc-cluster`; do echo Shutting down $i; ssh $i nsu -c poweroff ; done
2. The exctest cluster
This is for testing out new config before deploying it to the live service. You can let staff on this to test things out, but bear in mind that the nodes are tiny VMs so it can only run tiny test jobs. To list its nodes:
profmatch hadoop-exctest-cluster
To find each VM's KVM host (2 ways):
for i in `profmatch hadoop-exctest-cluster`; do kvmtool --name $i locate; done
for i in `profmatch hadoop-exctest-cluster`; do ii query --host $i --detail | grep host; done
To find each VM's physical site:
for i in `profmatch hadoop-exctest-cluster`; do ii query --host $i; done
To shut it down:
for i in `profmatch hadoop-exctest-cluster`; do echo Shutting down $i; kvmtool --name $i shutdown ; done
3. The devel cluster
This is for computing staff to trash and rebuild as necessary. Never let users near it. Its nodes are more tiny VMs. To list them:
profmatch hadoop-devel-cluster
To find each VM's KVM host (2 ways):
for i in `profmatch hadoop-devel-cluster`; do kvmtool --name $i locate; done
for i in `profmatch hadoop-devel-cluster`; do ii query --host $i --detail | grep host; done
To find each VM's physical site:
for i in `profmatch hadoop-devel-cluster`; do ii query --host $i; done
To shut it down:
for i in `profmatch hadoop-devel-cluster`; do echo Shutting down $i; kvmtool --name $i shutdown ; done
(profmatch is in /afs/inf.ed.ac.uk/group/cos/utils)
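If profmatch isn't already on your PATH, a one-off way to pick it up in a bash session might be:

```shell
# Add the cos utils directory (where profmatch lives) to the PATH
export PATH=$PATH:/afs/inf.ed.ac.uk/group/cos/utils
profmatch hadoop-exc-cluster
```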
Types of node
Each cluster has an HDFS master, a YARN master and some slaves. To find out which is which, use profmatch again:
profmatch hdfs hadoop-exc-cluster
profmatch yarn hadoop-exctest-cluster
profmatch slave hadoop-devel-cluster
Configuration and control
- The LCFG hadoop component makes the configuration files.
- The LCFG file component makes some directories and symlinks.
- systemd controls the Hadoop processes.
- Most of the nitty-gritty is in dice/options/hadoop-cluster-node.h.
- You can override config using live/hadoop-cluster-node.h.
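Since systemd controls the Hadoop processes, day-to-day control is ordinary systemctl. For example, on the HDFS master (as root; the service names are those listed in the table in the Trouble? section):

```shell
# Restart the name node and check that it came back up
systemctl restart hadoop-namenode.service
systemctl status hadoop-namenode.service
```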
Trouble?
If something's not working:
| This node | runs this | using this account | this systemd service | It logs to here | ps -ef (grep hadoop) shows |
| The HDFS master | name node | hdfs | hadoop-namenode.service | /disk/scratch/hdfsdata/hadoop/logs | java -Dproc_namenode ... |
| The YARN master | resource manager | yarn | hadoop-resourcemanager.service | /disk/scratch/yarn/logs | java -Dproc_resourcemanager ... |
| The YARN master | Map Reduce job history | mapred | hadoop-mapred.service | /disk/scratch/mapred/logs | java -Dproc_historyserver ... |
| Each slave | data node | hdfs | hadoop-datanode.service | /disk/scratch/hdfsdata/hadoop/logs | java -Dproc_datanode ... |
| Each slave | node manager | yarn | hadoop-nodemanager.service | /disk/scratch/yarn/logs | java -Dproc_nodemanager ... |
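Putting the table to work, a first pass at diagnosing a misbehaving node might look like this (shown for the HDFS master; swap in the service name and log directory from the relevant row):

```shell
# Is the service running, and what does systemd know about it?
systemctl status hadoop-namenode.service
# Recent systemd journal entries for the service
journalctl -u hadoop-namenode.service -e
# Newest Hadoop log files on this node
ls -lt /disk/scratch/hdfsdata/hadoop/logs | head
# Which Hadoop daemons are actually running?
# ([j]ava stops grep from matching itself)
ps -ef | grep '[j]ava -Dproc'
```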
Further reading