Sample Real-World Use of SystemTap

SystemTap has sometimes been plagued by "solution in search of a problem" complaints, so it was interesting to run across an example of SystemTap being used to solve a real-world problem. A developer discovered an OOM (Out Of Memory) condition in the upstream kernel. While trying to gather more information about the issue, he found that the kernel refused to boot once additional debug printk() calls were added to quicklist.c. Being familiar with SystemTap, the developer used the following script instead:

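# Fires each time execution reaches line 56 of mm/quicklist.c (inside quicklist_trim())
probe kernel.statement("quicklist_trim@mm/quicklist.c:56")
{
        # $q and $min_pages are quicklist_trim()'s own variables, resolved via
        # kernel debuginfo; execname() returns the name of the current process
        printf(" q->nr_pages is %d, min_pages is %d ----> %s\n",
               $q->nr_pages, $min_pages, execname());
}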

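The SystemTap scripting language takes a little getting used to, but the intent here is clear. Each time execution reaches line 56 of mm/quicklist.c (inside quicklist_trim()), a message is printed that shows the quicklist's page count (q->nr_pages), the min_pages argument, and the name of the currently running process.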

This sort of usage is interesting for two reasons:

  1. Kernel developers understand that some places in the kernel cannot be debugged with the printk() method. Kprobes, made much easier to use by SystemTap, provide a way to extract debug data from many of those locations, and this is an example of that in action.
  2. When using SystemTap to gather debug data, the system doesn't need to be rebooted with a new kernel. In this particular scenario that didn't matter much, because the problem was discovered by a developer who could modify his own kernel and reboot the system as needed. If an issue is discovered on a production system, however, this allows root cause analysis to proceed without taking the system out of production.

4 comments

  1. Thank you, Frank; I’d like to say that was a copy/paste error, but it was actually ignorance on my part. 🙂 I edited the article to correct the problematic line.

  2. Nice. This could be especially useful for anybody who wants to debug a distro kernel, which can sometimes be hard to rebuild exactly like the one already booted.

  3. Pingback: Mike Strosaker points out a sample real world use of SystemTAP

