Basic Admin Commands

GPFS is generally fairly reliable. The only real failure mode is a hardware problem on one or more of the disks, which will fail the disk and may lock up the filesystem pending a fix. The fix might be a matter of removing the failing disk from the filesystem. If the filesystem can be brought back up temporarily, it ought to be possible to remove the disk cleanly without losing any data (assuming there is enough free space on the now-reduced filesystem).

You might expect support to take a shot at fixing things by unmounting and remounting the filesystem; they are not expected to do much more than this.

Authenticating as the administrator

Most of the GPFS mm commands require you to be logged in as root and able to ssh without a password to all the GPFS nodes. Log into one of the cluster's admin hosts (currently just the scheduler host, illustrious), nsu to root, and initialise the GPFS environment with initgpfs:

[illustrious]iainr: nsu
[illustrious]root: initgpfs
Agent pid 31424
Enter passphrase for /root/.ssh/id_rsa:
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
[illustrious]root:

This sets up passwordless RSA-based authentication throughout the GPFS cluster. Provided you use the -o.inf.ed.ac.uk based addressing, you will be able to ssh to, or run commands on, all nodes.

Cluster Status

Basic information about the cluster can be found using the mmlscluster command, which returns information about the cluster itself and the individual nodes, e.g.

[illustrious]root: mmlscluster

GPFS cluster information
====================
  GPFS cluster name:         illustrious-o.inf.ed.ac.uk
  GPFS cluster id:           9355967080204467299
  GPFS UID domain:           illustrious-o.inf.ed.ac.uk
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
  Primary server:    illustrious-o.inf.ed.ac.uk
  Secondary server:  sannox-o.inf.ed.ac.uk

Node  Daemon node name            IP address      Admin node name             Designation
   1  illustrious-o.inf.ed.ac.uk  129.215.18.125  illustrious-o.inf.ed.ac.uk  quorum-manager
  22  bw1425n17-o.inf.ed.ac.uk    129.215.18.56   bw1425n17-o.inf.ed.ac.uk
  23  bw1425n18-o.inf.ed.ac.uk    129.215.18.57   bw1425n18-o.inf.ed.ac.uk
  24  bw1425n19-o.inf.ed.ac.uk    129.215.18.58   bw1425n19-o.inf.ed.ac.uk
  25  bw1425n20-o.inf.ed.ac.uk    129.215.18.59   bw1425n20-o.inf.ed.ac.uk
  26  bw1425n21-o.inf.ed.ac.uk    129.215.18.60   bw1425n21-o.inf.ed.ac.uk
...
 115  hcrc1425n32-o.inf.ed.ac.uk  129.215.18.32   hcrc1425n32-o.inf.ed.ac.uk
 116  hcrc1425n33-o.inf.ed.ac.uk  129.215.18.33   hcrc1425n33-o.inf.ed.ac.uk
 117  hcrc1425n34-o.inf.ed.ac.uk  129.215.18.34   hcrc1425n34-o.inf.ed.ac.uk
 118  sannox-o.inf.ed.ac.uk       129.215.18.126  sannox-o.inf.ed.ac.uk       quorum-manager
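If you only want to know which nodes hold a quorum designation, the node table can be filtered with a small helper. This is a rough sketch rather than a site tool: the function name quorum_nodes is made up for illustration, and it assumes each node row starts with a numeric node number and carries its designation, when present, as the fifth whitespace-separated field, as in the output above.

```shell
# Hypothetical helper (not part of GPFS): print the daemon node name of every
# node whose designation mentions "quorum".
# Reads `mmlscluster` output on stdin. Assumes node rows begin with a numeric
# node number and that the designation is the fifth field.
quorum_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /quorum/ { print $2 }'
}

# Typical use on an admin host, after running initgpfs:
#   mmlscluster | quorum_nodes
```

Header lines and nodes without a designation are skipped because their first field is not a number or their fifth field is empty.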

Unmounting the filesystem

This is a variant on the standard umount command.

[illustrious]root: mmumount /gpfs
Wed Apr 28 14:19:59 BST 2010: mmumount: Unmounting file systems ...

All the cluster nodes should have fstab entries for the filesystem. Running mmumount -a /gpfs would unmount the filesystem on all nodes in the cluster.

Mounting the filesystem

Again, this is much like the standard Unix command.

[illustrious]root: mmmount /gpfs
Wed Apr 28 14:20:04 BST 2010: mmmount: Mounting file systems ...

Shutting the filesystem down

You can use the mmshutdown command to unmount and shut down the filesystem on one or more of the cluster nodes. The standard options are:

mmshutdown
    shut down the filesystem on the local machine
mmshutdown -a
    shut down the filesystem on all the cluster nodes
mmshutdown -N Node[,Node...] | NodeFile
    shut down the filesystem on the listed nodes (or the nodes contained in the file)

Starting the filesystem

The filesystem can be started on one or more of the cluster nodes by running mmstartup. The standard options are:

mmstartup
    start up the filesystem on the local machine
mmstartup -a
    start up the filesystem on all the cluster nodes
mmstartup -N Node[,Node...] | NodeFile
    start up the filesystem on the listed nodes (or the nodes contained in the file)
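When restarting GPFS on a handful of nodes, it is easy to get the comma-separated -N list wrong. The sketch below only prints the shutdown and startup commands so they can be checked before anything is run. The function name gpfs_restart_cmds is made up for illustration, and pairing the two commands as a "restart" is an assumption about how you would want to use them, not something GPFS provides.

```shell
# Hypothetical helper (not part of GPFS): print, but do not run, the commands
# to shut down and start up GPFS on the given nodes.
# With no arguments it prints the local-machine forms.
gpfs_restart_cmds() {
    if [ "$#" -eq 0 ]; then
        echo "mmshutdown"
        echo "mmstartup"
        return 0
    fi
    # Join the arguments with commas, since -N expects Node[,Node...].
    nodes=$(IFS=,; echo "$*")
    echo "mmshutdown -N $nodes"
    echo "mmstartup -N $nodes"
}

# Example:
#   gpfs_restart_cmds bw1425n17-o.inf.ed.ac.uk bw1425n18-o.inf.ed.ac.uk
```

Printing rather than executing keeps the helper safe to try on any machine; paste the output into a root shell on the admin host once it looks right.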

Listing network shared disks

To list the network shared disks in the cluster and see which filesystem they are associated with, run mmlsnsd. Again, you'll need to have run the initgpfs command to set up access to the cluster.

[illustrious]root: mmlsnsd

File system   Disk name   NSD servers
gpfsdev       gpfs69nsd   bw650n01-o.inf.ed.ac.uk
gpfsdev       gpfs71nsd   klemperer-o.inf.ed.ac.uk
gpfsdev       gpfs72nsd   haitink-o.inf.ed.ac.uk
(free disk)   gpfs73nsd   toscanini-o.inf.ed.ac.uk
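Disks that are not yet part of any filesystem show up with "(free disk)" in the first column. A small filter can pick these out, for instance when looking for a disk to add. This is only a sketch: the function name free_nsds is made up for illustration, and it assumes the listing marks unassigned disks exactly as "(free disk)" followed by the disk name, as in the output above.

```shell
# Hypothetical helper (not part of GPFS): print the names of NSDs that
# `mmlsnsd` reports as "(free disk)".
# Reads `mmlsnsd` output on stdin. "(free disk)" splits into two
# whitespace-separated fields, so the disk name is the third field.
free_nsds() {
    awk '$1 == "(free" && $2 == "disk)" { print $3 }'
}

# Typical use on an admin host, after running initgpfs:
#   mmlsnsd | free_nsds
```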

More information can be obtained by using the -m and -X flags, e.g.

[illustrious]root: mmlsnsd -m

Disk name   NSD volume ID      Device     Node name                  Remarks
gpfs69nsd   81D712444B154D31   /dev/hdc   bw650n01-o.inf.ed.ac.uk    server node
gpfs71nsd   81D712464B154D34   /dev/hdc   klemperer-o.inf.ed.ac.uk   server node
gpfs72nsd   81D712494B154D34   /dev/sdb   haitink-o.inf.ed.ac.uk     server node
gpfs73nsd   81D712454B154DEC   /dev/hdc   toscanini-o.inf.ed.ac.uk   server node

[illustrious]root: mmlsnsd -X

Disk name   NSD volume ID      Device     Devtype   Node name                  Remarks
gpfs69nsd   81D712444B154D31   /dev/hdc   generic   bw650n01-o.inf.ed.ac.uk    server node
gpfs71nsd   81D712464B154D34   /dev/hdc   generic   klemperer-o.inf.ed.ac.uk   server node
gpfs72nsd   81D712494B154D34   /dev/sdb   generic   haitink-o.inf.ed.ac.uk     server node
gpfs73nsd   81D712454B154DEC   /dev/hdc   generic   toscanini-o.inf.ed.ac.uk   server node

Adding disks to the filesystem

It is possible to add network shared disks to an existing filesystem whilst the filesystem is still being used.

mmadddisk gpfsdev gpfs73nsd:::descOnly:4119::

The following disks of gpfsdev will be formatted on node illustrious.inf.ed.ac.uk:
    gpfs73nsd: size 244198584 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
Warning: No xauth data; using fake authentication data for X11 forwarding.
  66 % complete on Tue Apr 27 15:20:20 2010
 100 % complete on Tue Apr 27 15:20:22 2010
Completed adding disks to file system gpfsdev.
mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.

The disk description has the form DiskName:::DiskUsage:FailureGroup::StoragePool: where DiskUsage is one of:

dataAndMetadata
    Indicates that the disk contains both data and metadata. This is the default for disks in the system pool.
dataOnly
    Indicates that the disk contains data and does not contain metadata.
metadataOnly
    Indicates that the disk contains metadata and does not contain data.
descOnly
    Indicates that the disk contains no data and no file metadata. Such a disk is used solely to keep a copy of the file system descriptor, and can be used as a third failure group in certain disaster recovery configurations. For more information, see General Parallel File System: Advanced Administration and search on Synchronous mirroring utilizing GPFS replication.
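The run of colons in a disk description is easy to miscount by hand, so it can help to build the string from its parts. The following is a minimal sketch, not a site tool: the function name make_disk_desc is made up for illustration. It assumes that when no storage pool is given, the pool field and its trailing colon can simply be left off, which matches the gpfs73nsd:::descOnly:4119:: description used in the mmadddisk example above.

```shell
# Hypothetical helper (not part of GPFS): build a disk description of the form
#   DiskName:::DiskUsage:FailureGroup::StoragePool:
# Arguments: disk name, disk usage, failure group, optional storage pool.
# Assumption: with no storage pool, the pool field and its trailing colon are
# omitted, as in the mmadddisk example in this section.
make_disk_desc() {
    name=$1 usage=$2 group=$3 pool=${4:-}
    # Accept only the four DiskUsage values listed above.
    case "$usage" in
        dataAndMetadata|dataOnly|metadataOnly|descOnly) ;;
        *) echo "unknown DiskUsage: $usage" >&2; return 1 ;;
    esac
    if [ -n "$pool" ]; then
        echo "${name}:::${usage}:${group}::${pool}:"
    else
        echo "${name}:::${usage}:${group}::"
    fi
}

# Example, reproducing the description used earlier:
#   make_disk_desc gpfs73nsd descOnly 4119
```

The helper only produces the string; check it against the mmadddisk documentation for your GPFS version before passing it to the command.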

-- IainRae - 17 Jul 2014

Topic revision: r1 - 17 Jul 2014 - 14:24:45 - IainRae
 