Basic Admin Commands
Generally GPFS is fairly reliable, and the only real failure mode is a hardware problem on one or more of the disks. This causes the disk to be marked as failed and may lock up the filesystem until the problem is fixed. The fix might simply be to remove the failing disk from the filesystem: if the filesystem can be brought back up temporarily, it ought to be possible to remove the disk cleanly without losing any data, assuming there is enough free space on the reduced filesystem to hold the data that was on the removed disk.
Generally you might expect support to attempt a fix by unmounting and remounting the filesystem; they are not expected to do much more than this.
Authenticating as the administrator
Most of the GPFS mm commands require you to be logged in as root and able to ssh without a password to all the GPFS nodes. Log into one of the cluster's admin hosts (currently just the scheduler host, illustrious), nsu to root and initialise the GPFS environment with initgpfs:
[illustrious]iainr: nsu
[illustrious]root: initgpfs
Agent pid 31424
Enter passphrase for /root/.ssh/id_rsa:
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
[illustrious]root:
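This sets up passwordless RSA-based authentication throughout the GPFS cluster: as the output shows, it starts an ssh agent and adds root's key to it after prompting for the passphrase. Provided you use the -o.inf.ed.ac.uk addresses (e.g. sannox-o.inf.ed.ac.uk), you will be able to ssh to, or run commands on, all the nodes.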
Cluster Status
Basic information about the cluster can be found using the mmlscluster command, which returns information about the cluster itself and about the individual nodes, e.g.:
[illustrious]root: mmlscluster
GPFS cluster information
====================
GPFS cluster name: illustrious-o.inf.ed.ac.uk
GPFS cluster id: 9355967080204467299
GPFS UID domain: illustrious-o.inf.ed.ac.uk
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
GPFS cluster configuration servers:
Primary server: illustrious-o.inf.ed.ac.uk
Secondary server: sannox-o.inf.ed.ac.uk
 Node  Daemon node name             IP address       Admin node name              Designation
    -  illustrious-o.inf.ed.ac.uk   129.215.18.125   illustrious-o.inf.ed.ac.uk   quorum-manager
   22  bw1425n17-o.inf.ed.ac.uk     129.215.18.56    bw1425n17-o.inf.ed.ac.uk
   23  bw1425n18-o.inf.ed.ac.uk     129.215.18.57    bw1425n18-o.inf.ed.ac.uk
   24  bw1425n19-o.inf.ed.ac.uk     129.215.18.58    bw1425n19-o.inf.ed.ac.uk
   25  bw1425n20-o.inf.ed.ac.uk     129.215.18.59    bw1425n20-o.inf.ed.ac.uk
   26  bw1425n21-o.inf.ed.ac.uk     129.215.18.60    bw1425n21-o.inf.ed.ac.uk
  ...
  115  hcrc1425n32-o.inf.ed.ac.uk   129.215.18.32    hcrc1425n32-o.inf.ed.ac.uk
  116  hcrc1425n33-o.inf.ed.ac.uk   129.215.18.33    hcrc1425n33-o.inf.ed.ac.uk
  117  hcrc1425n34-o.inf.ed.ac.uk   129.215.18.34    hcrc1425n34-o.inf.ed.ac.uk
  118  sannox-o.inf.ed.ac.uk        129.215.18.126   sannox-o.inf.ed.ac.uk        quorum-manager
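The node table is also a convenient source of node names for the -N NodeFile options of mmshutdown and mmstartup described below. A minimal shell sketch, assuming the layout shown above (each node row begins with its numeric node number and the admin node name is the fourth column); the function name gpfs_node_list is hypothetical and not part of GPFS:

```shell
# Hypothetical helper, not a GPFS command: reads mmlscluster output on stdin
# and prints the admin node name (column 4) of each node row, i.e. each line
# whose first field is a numeric node number.
# An optional extended regular expression restricts output to matching names.
gpfs_node_list() {
    awk -v pat="${1:-.}" '$1 ~ /^[0-9]+$/ && NF >= 4 && $4 ~ pat { print $4 }'
}
```

For example, mmlscluster | gpfs_node_list bw1425 > /tmp/bw1425.nodes would write the bw1425 nodes one per line to a file (the path is only an illustration), which, assuming NodeFile takes one node name per line, could then be passed as mmshutdown -N /tmp/bw1425.nodes.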
Unmounting the filesystem
mmumount is GPFS's variant of the standard umount command:
[illustrious]root: mmumount /gpfs
Wed Apr 28 14:19:59 BST 2010: mmumount: Unmounting file systems ...
All the cluster nodes should have fstab entries for the filesystem. Running mmumount -a /gpfs would unmount the filesystem on all nodes in the cluster.
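Before relying on mmmount or mmumount -a, it may be worth confirming that a node really has an fstab entry for the mount point. A minimal sketch, assuming a standard whitespace-separated fstab in which the mount point is the second field; has_fstab_entry is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the fstab file ($2, default /etc/fstab)
# has an uncommented entry whose mount point (field 2) is exactly $1
# (default /gpfs). Fail otherwise.
has_fstab_entry() {
    awk -v mp="${1:-/gpfs}" '
        $1 !~ /^#/ && $2 == mp { found = 1 }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}
```

Run has_fstab_entry /gpfs on the local node, or ssh bw1425n17-o.inf.ed.ac.uk cat /etc/fstab | has_fstab_entry /gpfs - to check a remote node over the passwordless ssh set up by initgpfs (awk reads standard input when the file name is -).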
Mounting the filesystem
Again, this works much like the standard Unix mount command:
[illustrious]root: mmmount /gpfs
Wed Apr 28 14:20:04 BST 2010: mmmount: Mounting file systems ...
Shutting the filesystem down
You can use the mmshutdown command to unmount the filesystem and shut down GPFS on one or more of the cluster nodes. The standard options are:
mmshutdown
shut down the filesystem on the local machine.
mmshutdown -a
shut down the filesystem on all the cluster nodes.
mmshutdown -N Node[,Node...] | NodeFile
shut down the filesystem on the listed nodes (or on the nodes listed in NodeFile).
Starting the filesystem
The filesystem can be started on one or more of the cluster nodes by running mmstartup. The standard options are:
mmstartup
start up the filesystem on the local machine.
mmstartup -a
start up the filesystem on all the cluster nodes.
mmstartup -N Node[,Node...] | NodeFile
start up the filesystem on the listed nodes (or on the nodes listed in NodeFile).
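Restarting GPFS on a handful of nodes is just the two -N forms above run one after the other. A minimal sketch; gpfs_restart_nodes, _gpfs_run and the DRY_RUN variable are hypothetical conveniences rather than GPFS features, and the node argument is passed to -N unchanged, so it may be either a comma-separated node list or a NodeFile path:

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
_gpfs_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: shut GPFS down on the given nodes, then start it again.
# $1 is handed to -N as-is: Node[,Node...] or a NodeFile path.
gpfs_restart_nodes() {
    if [ -z "${1:-}" ]; then
        echo "usage: gpfs_restart_nodes Node[,Node...] | NodeFile" >&2
        return 2
    fi
    # Only start up again if the shutdown succeeded.
    _gpfs_run mmshutdown -N "$1" && _gpfs_run mmstartup -N "$1"
}
```

DRY_RUN=1 gpfs_restart_nodes bw1425n17-o.inf.ed.ac.uk,bw1425n18-o.inf.ed.ac.uk prints the two commands without running them; leave DRY_RUN unset to actually run them as root after initgpfs.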
Listing network shared disks
To list the network shared disks (NSDs) in the cluster and see which filesystem each one is associated with, run mmlsnsd. Again, you'll need to have run initgpfs first to set up access to the cluster.
[illustrious]root: mmlsnsd
File system Disk name NSD servers
gpfsdev gpfs69nsd bw650n01-o.inf.ed.ac.uk
gpfsdev gpfs71nsd klemperer-o.inf.ed.ac.uk
gpfsdev gpfs72nsd haitink-o.inf.ed.ac.uk
(free disk) gpfs73nsd toscanini-o.inf.ed.ac.uk
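Picking out the disks that are not yet part of any filesystem, for example as candidates for mmadddisk, is a one-line filter on this output. A sketch assuming the column layout shown above, where an unassigned disk's File system column reads (free disk); gpfs_free_nsds is a hypothetical name:

```shell
# Hypothetical helper: read mmlsnsd output on stdin and print the disk name of
# every NSD whose "File system" column is "(free disk)". awk splits that
# column into the two fields "(free" and "disk)", so the disk name is field 3.
gpfs_free_nsds() {
    awk '$1 == "(free" && $2 == "disk)" { print $3 }'
}
```

For the listing above, mmlsnsd | gpfs_free_nsds would print gpfs73nsd.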
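More information can be obtained with the -m and -X flags: -m also shows the local device backing each NSD on its server node, and -X additionally shows the device type, e.g.: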
[illustrious]root: mmlsnsd -m
Disk name NSD volume ID Device Node name Remarks
gpfs69nsd 81D712444B154D31 /dev/hdc bw650n01-o.inf.ed.ac.uk server node
gpfs71nsd 81D712464B154D34 /dev/hdc klemperer-o.inf.ed.ac.uk server node
gpfs72nsd 81D712494B154D34 /dev/sdb haitink-o.inf.ed.ac.uk server node
gpfs73nsd 81D712454B154DEC /dev/hdc toscanini-o.inf.ed.ac.uk server node
[illustrious]root: mmlsnsd -X
Disk name NSD volume ID Device Devtype Node name Remarks
gpfs69nsd 81D712444B154D31 /dev/hdc generic bw650n01-o.inf.ed.ac.uk server node
gpfs71nsd 81D712464B154D34 /dev/hdc generic klemperer-o.inf.ed.ac.uk server node
gpfs72nsd 81D712494B154D34 /dev/sdb generic haitink-o.inf.ed.ac.uk server node
gpfs73nsd 81D712454B154DEC /dev/hdc generic toscanini-o.inf.ed.ac.uk server node
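When a disk starts failing, the -m listing tells you which server and local device to look at. A sketch that looks up a single NSD, assuming the mmlsnsd -m column order shown above (Disk name, NSD volume ID, Device, Node name, Remarks); gpfs_nsd_device is a hypothetical name:

```shell
# Hypothetical helper: read "mmlsnsd -m" output on stdin and print
# "Device Node" for each row belonging to the named NSD.
# Exits non-zero if the NSD does not appear at all.
# Note: expects -m output; -X adds a Devtype column, shifting Node name.
gpfs_nsd_device() {
    awk -v disk="$1" '
        $1 == disk { print $3, $4; found = 1 }
        END { exit !found }
    '
}
```

For the listing above, mmlsnsd -m | gpfs_nsd_device gpfs72nsd would print /dev/sdb haitink-o.inf.ed.ac.uk.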
Adding disks to the filesystem
Network shared disks can be added to an existing filesystem with mmadddisk while the filesystem is still in use. The command takes the filesystem device name (here gpfsdev) and a disk descriptor:
[illustrious]root: mmadddisk gpfsdev gpfs73nsd:::descOnly:4119::
The following disks of gpfsdev will be formatted on node illustrious.inf.ed.ac.uk:
gpfs73nsd: size 244198584 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
Warning: No xauth data; using fake authentication data for X11 forwarding.
66 % complete on Tue Apr 27 15:20:20 2010
100 % complete on Tue Apr 27 15:20:22 2010
Completed adding disks to file system gpfsdev.
mmadddisk: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
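The disk descriptor has the form DiskName:::DiskUsage:FailureGroup::StoragePool: (the example above adds gpfs73nsd as descOnly in failure group 4119 and leaves the storage pool unspecified), where DiskUsage is one of: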
dataAndMetadata
Indicates that the disk contains both data and metadata. This is the default for disks in the system pool.
dataOnly
Indicates that the disk contains data and does not contain metadata.
metadataOnly
Indicates that the disk contains metadata and does not contain data.
descOnly
Indicates that the disk contains no data and no file metadata. Such a disk is used solely to keep a copy of the file system descriptor, and can be used as a third failure group in certain disaster recovery configurations. For more information, see General Parallel File System: Advanced Administration and search on Synchronous mirroring utilizing GPFS replication.
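Typing descriptors by hand makes it easy to miscount the colons. A minimal sketch that assembles one from its parts and rejects an unknown DiskUsage value; gpfs_disk_desc is a hypothetical name, and it omits the trailing StoragePool fields when no pool is given, as the gpfs73nsd example above does:

```shell
# Hypothetical helper: build an mmadddisk disk descriptor of the form
#   DiskName:::DiskUsage:FailureGroup::StoragePool:
# Usage: gpfs_disk_desc DiskName DiskUsage FailureGroup [StoragePool]
gpfs_disk_desc() {
    if [ $# -lt 3 ] || [ $# -gt 4 ]; then
        echo "usage: gpfs_disk_desc DiskName DiskUsage FailureGroup [StoragePool]" >&2
        return 2
    fi
    # Accept only the DiskUsage values listed above.
    case $2 in
        dataAndMetadata|dataOnly|metadataOnly|descOnly) ;;
        *) echo "gpfs_disk_desc: unknown DiskUsage '$2'" >&2; return 1 ;;
    esac
    if [ -n "${4:-}" ]; then
        printf '%s:::%s:%s::%s:\n' "$1" "$2" "$3" "$4"
    else
        # No pool given: same shape as gpfs73nsd:::descOnly:4119::
        printf '%s:::%s:%s::\n' "$1" "$2" "$3"
    fi
}
```

For example, mmadddisk gpfsdev "$(gpfs_disk_desc gpfs73nsd descOnly 4119)" is equivalent to the mmadddisk command shown above.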
-- IainRae - 17 Jul 2014
Topic revision: r1 - 17 Jul 2014 - 14:24:45 - IainRae