Wrangling NFS Load with SystemTap
About a month ago, we saw a peculiar spike in NFS operations on a subsystem of our clusters, and the majority of these operations were NFSD getattr requests. We’ve used various tools and methods in the past to troubleshoot our NFS issues, but the tool that gave us exactly what we wanted in this case was SystemTap.
In an environment where requests come from multiple clients and thousands of users, it’s hard to pinpoint what specifically is causing all these getattrs, but they all share a common denominator: a small number of NFS servers. The graph below illustrates the high getattr rate, and then the drop-off after we identified and fixed the issue.
SystemTap is similar to DTrace, but for Linux. The beauty of SystemTap is that it lets us pull data and massage it at the moment our target kernel functions are called. Getting it installed and working turned out to be less than straightforward on Debian Lenny. Here are a few things to look out for when setting up SystemTap on Debian Lenny:
- SystemTap depends on a kernel with debugging symbols, modules with debugging symbols, and the sources needed to build modules. Debian Lenny doesn’t provide a -dbg kernel package, so I pulled one from lenny-backports.
- The -dbg kernel package installs the modules with debugging symbols under the wrong names and paths. See step 4 of the Systemtap On Ubuntu wiki article.
- The version of SystemTap that ships with Debian Lenny is from the stone age, and when I tried to run SystemTap scripts with it there were all kinds of problems.
- With Lenny and the backports kernel, the latest version of SystemTap has issues. I tried a bunch of different versions, and SystemTap 1.2 is the latest one that works for us. However, building SystemTap 1.2 requires a newer version of elfutils than what comes with Lenny, and the latest version of elfutils doesn’t work with SystemTap 1.2, so we fall back to elfutils 0.151. Below are the steps I used to build SystemTap:
Once we have SystemTap built, we need to write a SystemTap script to get the data we want. There are plenty of examples available that identify files and client IPs from various NFS operations, but I was hard pressed to find one that pulled this info from getattr requests. Everything I could find would really only return NFS filehandles, which are not easy to translate into file names with Linux’s NFSD implementation. NFSD debug logging wasn’t very fruitful either: it too only gave us NFS filehandles, and on one system where we tried it, NFSD debug caused the box to hang, which doesn’t help us at all. Here’s an example of the kind of info we get from NFSD debug:
Even with SystemTap running handlers just on nfsd3_proc_getattr, all I could really get was the NFS filehandle again. So I used a combination of global variables, thread IDs and functions I knew would be called during an NFSv3 getattr request to pull the data I wanted. Below is the script I used to pull the file names and client IPs from the getattr requests hitting the NFS servers. Piping the output through sort and uniq gave me the top getattr requests, which we used to track down the root cause of these high getattrs.
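For the post-processing step, a rough Python equivalent of piping through sort and uniq might look like the following. This is only a sketch: the function name `top_requests`, the line format (a client IP followed by a path) and the sample values are all assumptions made for illustration, not real output from the script.

```python
from collections import Counter


def top_requests(lines, limit=10):
    """Count identical lines, roughly like `sort | uniq -c | sort -rn | head`."""
    # Strip whitespace and skip blank lines before counting.
    counts = Counter(line.strip() for line in lines if line.strip())
    # most_common() returns (line, count) pairs, highest count first.
    return counts.most_common(limit)


# Hypothetical input lines; the real script's output format may differ.
sample = [
    "10.0.0.5 /export/home/alice/index.html",
    "10.0.0.7 /export/home/bob/app.log",
    "10.0.0.5 /export/home/alice/index.html",
    "10.0.0.5 /export/home/alice/index.html",
    "10.0.0.7 /export/home/bob/app.log",
    "10.0.0.9 /export/home/carol/data.db",
]

if __name__ == "__main__":
    for line, count in top_requests(sample, limit=3):
        print(f"{count:7d} {line}")
```

Because `top_requests` accepts any iterable of lines, it could also be fed `sys.stdin` when the script's output is piped into it.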
A brief explanation of what we’re doing here: we know that an NFS getattr requires a VFS getattr, so we use the ‘vfs_getattr’ kernel function to supply the mount point, parent path and file name. We get the client IP from ‘nfsd.dispatch’, a SystemTap probe that triggers whenever any NFSD operation occurs. We then tie it all together with the ‘nfsd3_proc_getattr’ function from the NFSD module: there we use the ‘nfsd_getattr_hit’ global array we defined to mark the thread ID, so that the ‘vfs_getattr’ handler knows to print the file name only if the getattr came from the same thread that hit ‘nfsd3_proc_getattr’. Below is some sample output of the script in action:
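The thread-matching logic described above can also be modeled outside the kernel. The following is a toy Python simulation, not the SystemTap script and not its output: it replays a hand-written list of probe events and keeps only the paths seen on threads that first passed through the getattr handler. The event tuples, thread IDs, IPs and paths are hypothetical, and clearing the flag after a match or on a new dispatch is an assumption of this sketch rather than something stated about the original script.

```python
def correlate(events):
    """Replay (probe, tid, payload) events and return (client_ip, path) pairs."""
    client_by_tid = {}   # client IP remembered per thread by the dispatch probe
    getattr_hit = set()  # plays the role of the nfsd_getattr_hit array
    results = []
    for probe, tid, payload in events:
        if probe == "nfsd.dispatch":
            client_by_tid[tid] = payload
            getattr_hit.discard(tid)  # assumption: a new request clears a stale flag
        elif probe == "nfsd3_proc_getattr":
            getattr_hit.add(tid)      # mark this thread as serving an NFS getattr
        elif probe == "vfs_getattr" and tid in getattr_hit:
            results.append((client_by_tid.get(tid, "unknown"), payload))
            getattr_hit.discard(tid)  # assumption: report once per request
    return results


# Hypothetical event stream for illustration only.
events = [
    ("nfsd.dispatch", 101, "10.0.0.5"),
    ("nfsd3_proc_getattr", 101, None),
    ("vfs_getattr", 202, "/var/log/syslog"),                # local stat, unflagged thread
    ("vfs_getattr", 101, "/export/home/alice/index.html"),  # matches the flagged thread
    ("nfsd.dispatch", 101, "10.0.0.7"),                     # same thread, another operation
    ("vfs_getattr", 101, "/export/home/bob/app.log"),       # not via the getattr handler
]

if __name__ == "__main__":
    for client_ip, path in correlate(events):
        print(client_ip, path)
```

Keying everything on the thread ID is what lets three independent probe points behave like one request-scoped trace, since a single NFSD thread handles one request at a time.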
SystemTap proved vital for us in this instance. We hope that in the future someone looking to pull file names and client IPs from NFSD getattrs will be able to use this information as a resource.
Special thanks to the author, Michael Hsu, Sr. Systems Engineer at (mt) Media Temple.