Wednesday, January 26, 2011

How do I tell which processes are writing heavily to disk in CentOS 5?

Our server started getting slow, so I ran iostat on it.

iostat -dx 5

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    89.60 108.40  5.60   880.00   763.20    14.41     2.61   22.87   8.70  99.20
sdb               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

So I see that the one disk sda is totally saturated. How do I find which exact processes are causing this? (or is it swapping to that disk?)
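For anyone who wants to watch for this condition rather than eyeball it, here is a minimal sketch that picks the saturated devices out of `iostat -dx` output. The function name `busy_devices` and the 90% default threshold are assumptions made for illustration; the column is located by its `%util` header rather than by position.

```shell
# busy_devices [THRESHOLD]
# Reads `iostat -dx` output on stdin and prints "device %util" for every
# device at or above THRESHOLD (default 90, an arbitrary choice).
busy_devices() {
    threshold=${1:-90}
    awk -v t="$threshold" '
        # Header row: find which column holds %util.
        $1 == "Device:" || $1 == "Device" {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data rows: compare the %util column numerically.
        col && NF >= col && $col + 0 >= t { print $1, $col }
    '
}

# Usage (not run here): iostat -dx 5 2 | busy_devices 90
```

Fed the two device lines from the question, this prints only the sda row.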

  • collectl may be what you are after: it reports I/O statistics by process, among other things. Use collectl --top io for a top-like listing sorted by I/O usage, or collectl -sZ for collectl's native output for the process subsystem. Adding the --procopts t switch will show threads too.

    As Richard Salts mentioned, iotop will give you a top-like interface with more detailed I/O stats; it runs in a terminal and needs Python. In either case though, both programs rely on per-process I/O accounting in the kernel (2.6.20 or later is a safe bet), so if your kernel doesn't support it, neither program will work.

    Artem : As you anticipated, sadly, I get "Error: you cannot use --top and IO options with this kernel type 'collectl -h' for help" I am on CentOS 5. What are my options in this case?
    From darvids0n
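A quick way to check up front whether the per-process tools have a chance of working is to look for the kernel's per-process I/O counters. This is only a sketch: the function names are made up for illustration, the version strings in the usage lines are examples, and the presence of /proc/PID/io is taken here as the sign that I/O accounting is compiled in.

```shell
# has_io_accounting [PID]
# Succeeds if /proc/PID/io (default: the current process) is readable.
has_io_accounting() {
    [ -r "/proc/${1:-self}/io" ]
}

# kernel_at_least WANT [HAVE]
# Succeeds if version HAVE (default: `uname -r`) is >= WANT, comparing
# the first three dot-separated numbers only.
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    have=${have%%-*}    # drop a distro suffix such as "-194.el5"
    printf '%s\n%s\n' "$want" "$have" | awk -F. '
        NR == 1 { for (i = 1; i <= 3; i++) want[i] = $i + 0; next }
        {
            for (i = 1; i <= 3; i++) {
                if ($i + 0 > want[i]) exit 0
                if ($i + 0 < want[i]) exit 1
            }
            exit 0
        }'
}

# Usage (not run here):
#   kernel_at_least 2.6.20 && has_io_accounting && echo "iotop should work"
```

A vendor kernel may backport features, so the /proc check is the more direct of the two tests.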
  • Would be nice to know what distro you're on, but here goes:
    You can see which disk your swap partition is on by checking for "Linux swap / Solaris" in the output of "fdisk -l /dev/sda". That will show you whether there is a swap partition on that disk.

    Then, you can watch swap usage with vmstat to see if your server is doing a lot of swapping.

    Artem : I am on CentOS 5.
    Sweet : Ha, oops. That's in the subject, my bad. sysstat is in the repos and includes the "iostat" tool that others have referred to.
    From Sweet
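To put a number on "a lot of swapping", the si and so columns of vmstat are the ones to watch. The sketch below averages them over a run; the function name `swap_rates` is hypothetical, and it skips the first data row because vmstat's first report is an average since boot.

```shell
# swap_rates
# Reads `vmstat INTERVAL COUNT` output on stdin and prints the average
# si (swap-in) and so (swap-out) values, ignoring the first sample.
swap_rates() {
    awk '
        # Column header row: remember where si and so are.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows are all-numeric; the first one is the since-boot average.
        si && $1 ~ /^[0-9]+$/ {
            n++
            if (n == 1) next
            in_sum += $si; out_sum += $so; m++
        }
        END { if (m) printf "si=%.1f so=%.1f\n", in_sum / m, out_sum / m }
    '
}

# Usage (not run here): vmstat 5 6 | swap_rates
```

Values that stay at or near zero suggest the disk load is not coming from swap.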
  • I also like iotop

  • So sadly none of the per-process I/O tools (iotop, collectl --top) work on CentOS 5. But I was able to find the culprit process by using:

    ps auxf | grep ' D'

    This shows the processes in uninterruptible sleep (state D), which usually means they are blocked waiting on I/O, so they are likely to be the processes doing a lot of I/O.

    This was thanks to this ServerFault answer: http://serverfault.com/questions/155882/wa-waiting-for-i-o-from-top-command-is-big

    Also, for those wondering if the I/O is slow because of swapping, take a look at the memory summary in your top output and see what free + cached adds up to. If that sum is small, the machine is short on memory and may well be swapping. Or better, use htop, which shows this in a less confusing way.

    From Artem
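A single ps snapshot can miss a process that only blocks briefly, so one possible refinement is to take several snapshots and count how often each process turns up in state D. This is a sketch, not a tested recipe for CentOS 5: the function names are hypothetical, and it assumes a procps-style ps that accepts `-eo stat=,pid=,comm=`.

```shell
# count_d_state
# Reads "STAT PID COMMAND" lines on stdin and prints "count pid command"
# for entries whose state starts with D, most frequent first.
# Only the first word of the command name is kept.
count_d_state() {
    awk '$1 ~ /^D/ { seen[$2 " " $3]++ }
         END { for (k in seen) print seen[k], k }' | sort -rn
}

# sample_d_state [N]
# Takes N snapshots (default 10), one second apart, and tallies them.
sample_d_state() {
    n=${1:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        ps -eo stat=,pid=,comm=
        sleep 1
        i=$((i + 1))
    done | count_d_state
}

# Usage (not run here): sample_d_state 30
```

Processes that show up in most of the samples are the ones worth investigating first.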
  • One option that might work for you, if the disk is only getting saturated in bursts, is to use collectl to record disk and process stats. Then look at the data to see when the disk is being saturated, and run 'collectl -sZ -p filename' to play back your collected process data and see which processes are in the RUN state during those times. Might work, might not... -mark

    From Mark Seger
  • Try the btrace command (or blktrace).
