SystemTap has sometimes been plagued by “solution in search of a problem” complaints, so it was interesting to run across an example of SystemTap being used to solve a real-world problem. A developer discovered an OOM (Out Of Memory) condition in the upstream kernel. While trying to gather more information about the issue, he found that the kernel refused to boot once additional debug printk() calls were added to quicklist.c. Being familiar with SystemTap, the developer used the following script instead:
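# Fire each time execution reaches line 56 of mm/quicklist.c (inside quicklist_trim())
probe kernel.statement("quicklist_trim@mm/quicklist.c:56")
{
    # Print the quicklist's page count, the min_pages threshold, and the current process name
    printf(" q->nr_pages is %d, min_pages is %d ----> %s\n",
           $q->nr_pages, $min_pages, execname());
}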
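The SystemTap scripting language takes a little getting used to, but the intent here is clear: each time execution reaches line 56 of mm/quicklist.c (inside quicklist_trim()), a message is printed showing some kernel data: the number of pages on the quicklist (q->nr_pages), the min_pages threshold, and the name of the current process.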
This sort of usage is interesting for two reasons:
- Kernel developers know that some places in the kernel cannot be debugged with printk(); kprobes (made much easier to use by SystemTap) offer a way to extract debug data from many of those locations, and this is an example of that in action.
- When using SystemTap to gather debug data, the system doesn’t need to be rebooted with a new kernel. In this particular scenario, that didn’t matter much, as the problem was discovered by a developer who could modify his own kernel and reboot the system as needed. However, if an issue is discovered on a production system, SystemTap allows root-cause analysis to proceed without taking the system out of production.
Friday, 4 Jan 2008 at 12:08 am
(The penultimate line should be:)
… $q->nr_pages, $min_pages, execname() …
Friday, 4 Jan 2008 at 12:04 pm
Thank you, Frank; I’d like to say that was a copy/paste error, but it was actually ignorance on my part.
I edited the article to correct the problematic line.
Friday, 4 Jan 2008 at 4:44 pm
Nice. This could be especially useful for anybody who wants to debug a distro kernel, which can sometimes be hard to rebuild exactly like the one already booted.
Monday, 7 Jan 2008 at 9:50 am
[...] noticed an interesting blog post from IBMer, Mike Strosaker over on his blog Zombie Process regarding SystemTAP. Check out his post for all the [...]