I've been trying to really figure out what my IOPS are on my DB server array and see if it's just too much. The array is four 72.6gb 15k rpm drives in RAID 5. To calculate IOPS for RAID 5 the following formula is used: (reads + (4 * Writes)) / Number of disks = total IOPS
. The formula is from MSDN. I also want to calculate the Avg Queue Length but I'm not sure where they are getting the formula from, but i think it reads on that page as avg que length/number of disks = actual queue
.
To populate that formula I used the perfmon to gather the needed information. I came up with this, under normal production load: (873.982 + (4 * 28.999)) / 4 = 247.495
. Also the disk queue lengh of 14.454/4 = 3.614
.
So to the question, am I wrong in thinking this array has a very high disk IO?
Edit
I got the chance to review it again this morning under normal/high load. This time with even bigger numbers and IOPS in excess of 600 for about 5 minutes then it died down again. But I also took a look at the Avg sec/Transfer
, %Disk Time
, and %Idle Time
. These number were taken when the reads/writes per sec were only 332.997/17.999 respectively.
%Disk Time: 219.436
%Idle Time: 0.300
Avg Disk Queue Length: 2.194
Avg Disk sec/Transfer: 0.006
Pages/sec: 2927.802
% Processor Time: 21.877
Edit (again)
Looks like I have that issue solved. Thanks for the help. Also for a pretty slick parser I found this: http://pal.codeplex.com/ It works pretty well for breaking down the data into something usable.
-
I've been doing a bit of research in this area lately myself. There's a great calculator here that people on this site have pointed me at. Throwing some basic numbers into it with a 97:3 read to write ratio and not factoring in cache hits, it looks like your array should be able to mechanically deliver about 675 IOPS at 4k. You're doing 902 transfers per second which would be high, as your queue length per disk also indicates.
You might also measure sec/Transfer and %Disk time. I've found %Disk time to be rather odd with RAID arrays, and I've found it more accurate to measure %Idle time and then use the formula 100 - %Idle to calculate busy time. I bet you'll find that your disk is very busy and you'll see a lot of 20ms+ transfer times. IMO these numbers are more clear cut if you want to conclude that you've got a disk problem.
Here's a great article that gets referenced quite a bit. It's dated but relevant, and is especially useful for explaining why %Disk time and Avg. Queue Length can be difficult to interpret.
Tim Meers : I've taken a look at that site before as well, it's nice, but appears to use higher spec'd drives as defaults in the dropdowns. This is an older drive array, not OLD by anymeans but not up to some of the cache levels of todays uuber fast drives. When I get soem time later or monday I'll take a look at the `sec/Transfer` and `%Disk time` and see if I can nail it down a little more. I'll be sure to update the question with my findings.Boden : You don't have to use the example disks in wmarow's calculator. You can grab the necessary parameters from the spec sheet for your drive.Tim Meers : That's part of the issue too, I don't have all the specs. I talked with my host, and they didn't have all the details. Or maybe they did and just didn't want to share. I'm just trying to build a case for hardware upgrades, I mean it is the new year after all ;)Boden : That's ok. Assume it's something like an old U320 15K 3.5" drive. It should have somewhere between 3.5 and 4ms seek latency. Grab the specs from something like a ST373454LC or just use 3.5ms to make a conservative estimate. Even if you assume a very fast 3.3, you're not going to get 900 IOPS. Regardless, looking at busy time, sec/Transfer at the 90th or 95th percentile, and your disk queue will make the argument for you. Determining the capability of the disks themselves will just help you explain why there might be a problem.Tim Meers : You're right looking at the `%Disk time` is a little odd, thanks for the pointers on what to look at. I've updated the questions with this mornings stats.Boden : Capture your performance stats in csv format over the course of a few hours or an entire business day. Open em up in Excel, and then calculate the 95th percentile for sec/Transfer. I'm guessing you'll see a pretty high number. You don't want to look at the average for this number, but how high it gets when things are busy. I believe anything over 20ms is cause for concern, and multi-second spikes over 50ms are pretty bad.From Boden
0 comments:
Post a Comment