HP Proliant RAID Config for AFS Fileservers

Background

At a meeting convened to discuss future AFS storage needs, it was decided to populate the recently purchased HardwareHPDL180G6 servers with local disks in their 6 free drive bays. Local disk for AFS Fileservers seems to be the way to go for better performance; currently we are mostly using SAN-mounted RAID 5 volumes.

The accepted wisdom is that the most performant RAID level that also provides redundancy is RAID 10 (1+0), see http://www.acnc.com/04_01_10.html . So it was decided to purchase six 500GB SAS drives for each server and configure them as 1.5TB of RAID 10 space for use as AFS /vicep partitions.
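
For reference, the usable capacity works out as follows (RAID 10 stripes across mirrored pairs, so only half the raw capacity is usable):

   6 drives x 500GB = 3TB raw
   3TB / 2 (mirroring) = 1.5TB usable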

Last minute reconsiderations

Once the disks were bought and installed, we realised we were likely to split the 1.5TB volume into 3x500GB /vicep's. One downside of RAID 10 is that if the wrong two disks fail at the same time, you lose the whole array. In that case all 3 /vicep's would be down while the disks were replaced, and once they were replaced you would have to recover all the data in the array from backups.

So we contemplated configuring the 6 disks as three individual RAID1 arrays, each array mounted as a /vicepX. This would give us the same 500GB /vicep's and the same redundancy, though we'd lose the "striping" advantage of RAID 0. On the other hand we'd have 3 devices handling IO requests rather than 1, so it looked like it would all balance out. The big advantage of 3xRAID1 over 1xRAID10 (then split into 3) is that if the wrong two drives fail at the same time, taking out a single mirror pair (and hence one /vicepX), the other two pairs continue to function: only the affected /vicep's data is lost, not all 3.
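
For the record, the 3xRAID1 layout would have been created with something along these lines. This is only a sketch: the port:box:bay drive addresses below are placeholders, and the real ones would need to be taken from the output of 'hpacucli ctrl slot=1 physicaldrive all show'.

   /usr/sbin/hpacucli ctrl slot=1 create type=ld drives=1I:1:1,1I:1:2 raid=1
   /usr/sbin/hpacucli ctrl slot=1 create type=ld drives=1I:1:3,1I:1:4 raid=1
   /usr/sbin/hpacucli ctrl slot=1 create type=ld drives=1I:1:5,1I:1:6 raid=1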

The 3xRAID1 seemed to be the better choice.

Benchmarking

However, we then did some simple benchmarking to see whether our assumptions about performance held up. We used 'dbench', as it is already available as an RPM on our system.

On one machine, gorgon, we created 3xRAID1 volumes, each with a single partition containing an ext2 file system, and mounted them. On the other, minotaur, we created a single RAID10 volume and split it into 3 partitions. Each of those partitions also had an ext2 file system created on it and was then mounted.
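
The filesystems were set up with the usual tools, roughly as follows for each /vicep. This is a sketch only: the /dev/cciss/c0dXpY device names assume the standard cciss driver naming for Smart Array logical drives and should be checked on the machine itself.

   # after partitioning the logical drive (e.g. with fdisk)
   mkfs.ext2 /dev/cciss/c0d1p1
   mkdir /vicepa
   mount /dev/cciss/c0d1p1 /vicepa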

Essentially the plan was to exercise all three /vicep's (on each machine) at the same time to see how the file IO coped.

Three shells were started on each machine, and in each shell the following command was run, where /vicepX was /vicepa, /vicepb and /vicepc in the corresponding shells:

   dbench -t 60 -D /vicepX 4
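
An equivalent way to drive all three /vicep's at once from a single shell would be something along these lines (a sketch, not how the test was actually run here):

   for v in a b c; do
      dbench -t 60 -D /vicep$v 4 > /tmp/dbench-$v.log 2>&1 &
   done
   wait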

Initially both machines showed nearly identical results, with each shell giving around 400MB/s, e.g.:

  Throughput 370.216 MB/sec 4 procs

So it looked like there wasn't any marked performance difference (in this simple test).

The test was then repeated with the synchronous flags turned on, i.e.:

   dbench -s -S -t 60 -D /vicepa 4

The 3 RAID10 partitions averaged about 277MB/s:

  Throughput 277.883 MB/sec (sync open) (sync dirs) 4 procs
  Throughput 275.956 MB/sec (sync open) (sync dirs) 4 procs
  Throughput 278.19 MB/sec (sync open) (sync dirs) 4 procs

but the 3xRAID1 volumes varied a bit more:

  Throughput 302.393 MB/sec (sync open) (sync dirs) 4 procs
  Throughput 303.256 MB/sec (sync open) (sync dirs) 4 procs
  Throughput 222.365 MB/sec (sync open) (sync dirs) 4 procs

Repeating the runs a few times showed that sometimes, in one shell on the 3xRAID1 machine, dbench would stop transferring any data to disk for sizeable chunks of time! While a process was in this state, commands like 'mkdir' run on the affected /vicepX from another interactive shell on gorgon would also hang.

Given this behaviour, choosing the 3xRAID1 layout does not look like such a good idea (at least on this HP hardware), despite the extra resilience it gives us against the "wrong two" disks failing.

Though this was only a simple test of the performance of the two possible disk configurations, it did seem to highlight a potential problem with the 3xRAID1 option.

Conclusion

As both the 3xRAID1 and RAID10 setups are redundant against a single disk failure, and can in fact survive multiple disk failures (if the right disks fail), the decision between them comes down to performance. In this case RAID10 had the edge in our simple tests, so we decided to stick with RAID10.

This command was used to create the RAID10 array:

   /usr/sbin/hpacucli ctrl slot=1 create type=ld drives=all raid=1+0

This assumes that "drives=all" accounts for the 6 free disks.
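
If in doubt about which drives "drives=all" will pick up, the physical drives and the resulting logical drive can be listed before and after creation with, for example (assuming the controller is in slot 1, as above):

   /usr/sbin/hpacucli ctrl slot=1 physicaldrive all show
   /usr/sbin/hpacucli ctrl slot=1 logicaldrive all show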

-- NeilBrown - 31 Mar 2011
