PCI error recovery: HP versus Linux on POWER

I wrote a post discussing PCI error recovery (via EEH) on Linux on POWER a few months ago [1], but I did not take the opportunity to compare it to other PCI error recovery methods at the time.  I’ve found some documentation on HP’s PCI error recovery since then, so I thought I’d post this article as a follow-on.

PCI error recovery on HP systems requires the installation of a feature called PCI Advanced Error Handling.  Notably, this feature is only available for HP-UX; recovery from PCI errors cannot be done with Linux on HP systems.  Installing the feature results in the PCI slots shifting to a “soft fail” mode. If a PCI error occurs on a slot in that soft fail mode, the slot will be frozen from performing any other I/O. However, recovery from this frozen state is not automatic; it must be effected by hand (using the olrad command; I believe OLRAD is an acronym for On-Line Repair/Add/Delete) [2].  Conversely, PCI error recovery on Linux on POWER is seamless, and requires no user intervention:  the frozen slot is detected on the next read operation, and the device is immediately reinitialized and made available for use.

Interestingly, there are two other limitations of PCI Error Handling on HP-UX.  If there is only a single path configured to storage devices, failover features like HP’s Serviceguard may not detect the loss of connectivity, which is necessary for them to perform a failover operation [3].  This is not an issue with PCI failures on Linux on POWER, because the device will be reinitialized immediately, with no need for a failover in order to wait for an administrative repair action to occur.  Secondly, if a new PCI adapter is added to the system, it will be initially set to the “hard fail” mode until it can be established that the driver is capable of handling the “soft fail” mode.  A machine check would occur if a PCI error occurred during this window, resulting in a system crash [3].  Such a gap does not exist in the Linux on POWER PCI error recovery implementation.

Hopefully I’ve been able to showcase the superior aspects of the EEH capabilities provided by the POWER platform for PCI error recovery; the fact that these capabilities are taken advantage of by both AIX and Linux makes the picture even better for POWER.

References:
[1] https://zombieprocess.wordpress.com/2007/09/16/pci-error-recovery-in-linux/
[2] http://h71028.www7.hp.com/ERC/downloads/c00767235.pdf
[3] http://docs.hp.com/en/5991-5308/index.html

Advertisements

Book Burning: It Might Be Okay This Time

Vladimir Nabokov died in 1977, having produced perhaps the greatest corpus of the 20th century. But there is one piece yet unread, one piece that enlightens only the interior of a vault in a Swiss bank. That piece, unfinished, is entitled The Original of Laura. Nabokov left his son, Dmitri Nabokov, with explicit instructions to burn the unfinished manuscript upon his death, but Dmitri has been understandably hesitant to fulfill his father’s wish.

Though my vote counts for nothing, I will cast it anyways: destroy the manuscript. Sure, like so many others, I burn with curiosity at the contents of The Original of Laura. Especially when I read the following:

And yet Dmitri had himself fueled our desire to possess Laura with some of his comments, as when he called it the “most concentrated distillation of [my father’s] creativity” and a “totally radical book.” Who would not wish to get even a sketchy glimpse of the omega point of Nabokov’s artistic evolution?

I think, though, that I offer a reason perhaps yet unconsidered for burning the manuscript: The Chilling Effect. Like so many authors, Vladimir Nabokov was paralyzed by the perception of imperfection. Realizing that the publication of a work exposes the author to public scrutiny, and possibly scorn, it is reasonable to allow an author to spend time in experimentation, frustration, and feverish reflection, with the guarantee that he or she will have the flexibility to tear apart failed experiments, eviscerate clichés, and finely machine metaphors. We have clearly moved towards a wiki/blog/forum world, with the consequential syndrome of so many words without thought, but it would be a foolhardy man indeed who would ink anything of complexity without the opportunity for rumination and self-editing. If every author wondered whether the words he is writing would be read, raw and incomplete, if he were to die just after writing them, perhaps the words would never get written.

We’ve been down this path before. Franz Kafka did not finish The Castle, and instructed his friend, Max Brod, to destroy all of his unfinished manuscripts upon his death. Instead, Brod chose to publish The Castle after heavily editing it (for acceptance by a publisher). The book ends, literally, in the middle of a sentence, which arguably works well given the themes of the book. However, I can’t help but wonder which of the words were Kafka’s and which were Brod’s.

One blogger suggests that, because Nabokov at one time was considering the destruction of the manuscript that would become Lolita, posterity would be well-served by the publication of The Original of Laura. The difference here is that Nabokov had the opportunity to recognize the flaws in what would become Lolita and decide that they were not insurmountable. Did he have such an opportunity with The Original of Laura?

Let purifying fire immolate the chilling effect. We’ve already been given more than we have the right to ask from Vladimir Nabokov; there is no need to neglect this one painfully simple wish.

Linux on POWER Odds and Ends: Service Utility Roundup

There are a number of small utilities for Linux on POWER that can come in useful for servicing are configuring your system. Here are a few utilities from the powerpc-utils-papr package that you may find useful.

The set_poweron_time utility can be used to specify a time in the future when the system or partition should be powered on, if it happens to be off at that time. For example, if a partition should be automatically started 12 hours and 10 minutes from now, run the following command: set_poweron_time -d h12m10. If the partition is off when that time expires, it will restart.

The bootlist command is used to modify the order of boot devices from the command line. Boot lists on POWER are stored as Open Firmware device names, but bootlist allows you to specify logical device names (like “sda” or “eth0”) if you choose; the ofpathname utility is used by bootlist to convert between OF device names and logical device names (between “/vdevice/v-scsi@30000002/disk@8100000000000000” and “sda”, for example).

usysident is a tool for manipulating identification LEDs. These LEDs are used to help locate FRUs (field replaceable units), to ensure that the correct part is being replaced. LEDs are specified by their location code or logical device name, and can be in one of two states: either “normal” (off) or “identify” (blinking amber LED). Run usysident without any parameters to view the available LEDs; to flash the LED on eth0: usysident -d eth0 -s identify.

A related utility is usysattn; it’s used to turn off the system attention indicator, or to view the current state of that LED. The LED usually looks like an amber exclamation point located on the operator’s panel, as in the image below from a p520.

POWER5 op panelOn a partitioned system, though, the system attention indicator will be illuminated if any of the partitions have activated it. This is because the system attention indicator determines whether any of the partitions require attention. Refer to the Service Focal Point on the HMC or IVM to determine who is asking for attention.

serv_config is a very useful utility for modifying serviceability parameters. I talked a little about it in an earlier post, so refer to that entry for more details.

The uesensor command can be used to view the values of various thermal, voltage, and fan speed sensors on the system. Unfortunately, these sensors are only exposed on POWER4 systems and some blades; more recent systems will instead send an EPOW (environmental or power warning) event if any of the sensors are in danger of shifting out of the normal operating range. EPOW events are exposed in servicelog.

All of these commands have man pages; take a look there if you need more details.

POWER Reference Codes

One nice advantage provided by POWER systems is the availability of structured and well-defined reference codes. Besides indicating errors or conditions that otherwise require attention, these codes are also used to indicate the progress of boots or dumps. If your system failed to boot for some reason, the last reference code on the operator’s panel (op panel) would provide a good clue as to what the system was doing just before the failure.

On Linux, besides appearing on the op panel, these reference codes are also found in events that are surfaced in servicelog. While servicelog contains a lot of details that are useful for servicing errors, more information can always be obtained by looking up the reference code.

There are a few kinds of reference codes; the key for decoding these refcodes is the IBM Hardware InfoCenter. I’ll briefly explain the three different types of reference codes (SRCs, SRNs, and menugoals) before showing how they are displayed in servicelog.

System Reference Codes

SRCs are sequences of alphanumeric characters (usually 8 — just enough to fit snugly on the display of the operator’s panel — but sometimes 6). They were first introduced on POWER5 systems, and exist on both System p and System i (formerly pSeries and iSeries). SRCs are documented in InfoCenter: “Service provider information”/”Reference codes”/”Using system reference codes”.

An example of an SRC used as a progress code is C7004091; that refcode indicates that the partition is in a standby state, and is waiting to be manually activated. If the partition is set to be activated automatically, the partition will not stop at this SRC, but will continue to the Open Firmware boot phase.

Linux does not generate SRCs as progress codes, but will generate some as error codes. Additionally, if you have a POWER5 or POWER6 system, events with SRCs may be written to servicelog to indicate platform-level errors.

Service Request Numbers

SRNs are an older formatting method for progress or error codes. They are generated by diagnostics in AIX, and by the firmware on POWER4 (and earlier) systems. If the progress/error code has 5 digits, or has a ‘-‘ character somewhere in it, it is an SRN. These are documented in InfoCenter: “Service provider information”/”Reference codes”/”Using service request numbers”.

As an example, the SRN 747-223 indicates that there was a “miscompare during the write/read of the memory I/O register.” Many SRNs point to a repair procedure called a MAP; in this case, the SRN points to MAP 0050, “SCSI bus problems”, which provides procedures for analyzing and repairing the problem.

Linux does not generate SRNs, but you may still see SRNs generated by older POWER platforms. They may also be generated if you boot the eServer Standalone Diagnostics CD to run device diagnostics.

Menugoals

Menugoals are reference codes that begin with a ‘#’ character. They are generated by diagnostics, and indicate procedures that can be performed by a system admin rather than by a trained service representative. Menugoals don’t typically indicate errors, but instead convey additional information about the state of the device being diagnosed. As an example, a menugoal might indicate that a tape drive requires cleaning.

Reference Codes in servicelog

Each event in servicelog has a refcode field, which will always contain a reference code (either an SRC, an SRN, or a menugoal). Here is a sample event from servicelog indicating a platform error reported by a POWER system:

PPC64 Platform Event:
Servicelog ID:      64
Event Timestamp:    Fri Dec 10 21:37:05 2004
Log Timestamp:      Wed Apr 18 00:19:12 2007
Severity:           4 (WARNING)
Version:            2
Serviceable Event:  Yes
Event Repaired:     No
Reference Code:     B125E500
Action Flags:       a800
Event Type:         224 - Platform Event
Kernel ID:          1000
Platform ID:        50929493
Creator ID:         E - Service Processor
Subsystem ID:       25 - Memory subsystem including external cache
RTAS Severity:      41 - Unrecoverable Error, bypassed with degraded performance
Event Subtype:      00 - Not applicable
Machine Type/Model: 9118-575
Machine Serial:     0SQIH47

Extended Reference Codes:
2: 030000f0  3: 28f00110  4: c13920ff  5: c1000000
6: 00811630  7: 00000001  8: 00d6000d  9: 00000000

Description:
Memory subsystem including external cache Informational (non-error) Event.
Refer to the system service documentation for more information.

<< Callout 1 >>
Priority            M
Type                16
Repair Event Key:   0
Procedure Id:       n/a
Location:           U787D.001.0481682-P2
FRU:                80P4180
Serial:             YH3016129997
CCIN:               260D

The error description provides some details concerning the failure, and the FRU callout indicates which part to repair in order to fix the problem. The refcode field contains an SRC, B125E500; looking that SRC up in InfoCenter shows the following details:

  • B1 indicates it was reported by the service processor
  • 25 indicates that it is an “external cache event or error reported by the service processor”
  • E500 indicates that it is a result of processor runtime diagnostics (PRD)

In addition to that, the InfoCenter entry for B125E500 indicates that this event is the result of a hardware failure. The FRU callout indicates which piece of hardware should be replaced to resolve the error.