And so, with introductions out of the way, I thought I’d just leap in and explain a little bit about PCI error recovery in Linux, and how it works on System p and System i (formerly known as pSeries and iSeries).
IBM’s POWER-based systems have a feature called EEH (extended I/O error handling). Each PCI bridge has a bit of hardware associated with it that detects abnormal conditions, like parity errors and wild DMA accesses. When such an error is detected, the misbehaving device’s bus is frozen, meaning that the device becomes inaccessible by the operating system. Any writes to the device will be dropped, and any reads will result in all 1’s (0xFFF…). By itself, that’s only marginally useful; at least the partition doesn’t crash as a result of the error and, probably more importantly, at least no data corruption occurs due to the fault. The real advantage to EEH is that the system firmware provides interfaces that can be called to reset the bus and bring the device back online, without the need to restart the operating system.
In Linux, any time a read results in all 1’s, a firmware invocation is made by the kernel to determine if the device actually meant to return that value or if the bus was frozen due to a detected failure. (Empirical measurements show that devices almost never mean to return all 1’s, so there are not many false positives.) If the bus is indeed frozen, notification and recovery routines in the device driver are automatically invoked by the kernel to indicate that the device has experienced an error and to indicate when the bus is unfrozen (so that the driver can re-initialize the device).
If the device driver is not “PCI error recovery enabled” (i.e. does not provide routines to perform recovery), the kernel will attempt to perform a hotplug operation. The hotplug operation is typically successful, but recovery is much slower than it would be if the device driver were PCI error recovery enabled.
Yes, all of the support for PCI error recovery in Linux is upstream at this time. One of my co-workers, Linas Vepstas, wrote the lion’s share of the code. Several device drivers are instrumented with recovery routines, including e1000, ixgb, and ipr; those drivers serve as good examples so that others can also be instrumented. For more details, see Documentation/pci-error-recovery.txt in the kernel source.