System Firmware on pSeries and System p

Most PowerPC-based systems since POWER4, including blades (JS20 and later), have dual firmware banks. The intent is to let an administrator update the system’s firmware in one bank while preserving the previous firmware image in the other. Should there be a problem with the firmware update process, or with the new image for any reason, the administrator can revert to the older, known-safe image.

Updating Firmware with Dual Banks

During normal operation, the system is booted off of the “temporary” bank (sometimes called the temp side or t-side), and the contents of the temporary side are the same as those of the “permanent” bank (a.k.a. the perm side or p-side).

When a firmware update occurs, the new image is copied into the t-side. If the t-side image is different from the p-side image, the t-side will be copied to the p-side before the t-side is overwritten (i.e., the current production image will be backed up to the permanent bank). The system will then attempt to reboot onto the new image on the t-side. If the boot succeeds and the new image works well, it can be “committed” to the p-side. If the system does not boot on the t-side (due to a corrupted image, for example), it will automatically boot onto the p-side. At that point, the t-side image can be “rejected” by overwriting it with the known-safe p-side image. The system should then be booted off of the t-side again.

Given those properties, it has always seemed to me that the sides are misnamed; I find it useful to think of the temporary side as the production side, and the permanent side as the backup side.

  • temporary == production
  • permanent == backup (older, known-safe image)
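The update, commit and reject rules above form a small state machine. The sketch below is a toy model built only from that description. The class and method names are hypothetical, it does not talk to real firmware, and the `boots_ok` flag simply stands in for "the new image booted successfully".

```python
from dataclasses import dataclass


@dataclass
class DualBankFirmware:
    """Toy model of the t-side/p-side rules; it does not touch real firmware."""
    temp: str          # t-side image (production)
    perm: str          # p-side image (backup, known-safe)
    booted: str = "t"  # side the system is currently running from

    def update(self, new_image: str, boots_ok: bool = True) -> None:
        # If the sides differ, back up the current production image first.
        if self.temp != self.perm:
            self.perm = self.temp
        self.temp = new_image
        # A failed boot on the t-side falls back to the p-side automatically.
        self.booted = "t" if boots_ok else "p"

    def commit(self) -> None:
        # Copy the trusted t-side image over the p-side.
        if self.booted != "t":
            raise RuntimeError("commit requires booting from the t-side")
        self.perm = self.temp

    def reject(self) -> None:
        # Overwrite the bad t-side image with the p-side image, then model
        # the follow-up reboot onto the t-side.
        if self.booted != "p":
            raise RuntimeError("reject requires booting from the p-side")
        self.temp = self.perm
        self.booted = "t"


fw = DualBankFirmware(temp="SF220_051", perm="SF220_051")
fw.update("SF240_320", boots_ok=False)  # new image fails to boot
fw.reject()
print(fw)  # DualBankFirmware(temp='SF220_051', perm='SF220_051', booted='t')
```

The model makes the "temporary == production, permanent == backup" view concrete. The p-side only ever changes by receiving a copy of the t-side, either automatically before an update or explicitly on commit.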

Viewing the Current Firmware Levels

The current firmware levels can be viewed by running /usr/sbin/lsmcode -A (lsmcode is part of the lsvpd open source package):

# lsmcode -A
sys0!system:SF240_320 (t) SF220_051 (p) SF240_320 (t)
service:

The above output indicates that the temporary bank contains the firmware level SF240_320, and the permanent bank contains SF220_051. The third entry on the sys0!system: line indicates that the system is currently booted off of the temporary side. Before a new firmware update operation can be attempted, the t-side image should be committed, overwriting the older p-side image.
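If you want to check these levels from a script, the system line can be parsed with a regular expression. This is a minimal sketch that assumes the `lsmcode -A` system line always lists the t-side level, the p-side level, and the booted level in that order, each followed by `(t)` or `(p)`, as in the sample above. Other lsvpd or firmware versions may format the line differently. The function name and the `sides_differ` key are hypothetical.

```python
import re

# Assumes the format of the sample output above:
#   sys0!system:<t-level> (t) <p-level> (p) <booted-level> (t|p)
_SYSTEM_LINE = re.compile(
    r"!system:\s*(\S+)\s+\(t\)\s+(\S+)\s+\(p\)\s+(\S+)\s+\((t|p)\)"
)


def parse_lsmcode(output: str) -> dict:
    """Extract firmware levels and the booted side from `lsmcode -A` output."""
    for line in output.splitlines():
        match = _SYSTEM_LINE.search(line)
        if match:
            temp, perm, _booted_level, booted_side = match.groups()
            return {
                "temp": temp,
                "perm": perm,
                "booted": booted_side,
                # Differing sides suggest the t-side has not been committed yet.
                "sides_differ": temp != perm,
            }
    raise ValueError("no sys0!system: line found in lsmcode output")


sample = "sys0!system:SF240_320 (t) SF220_051 (p) SF240_320 (t)\nservice:\n"
print(parse_lsmcode(sample))
```

A `sides_differ` value of True while booted on the t-side is the situation described above, where a commit is expected before the next update.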

I wrote about the serv_config command-line utility in my previous post; this command is the easiest method to determine whether your system is currently operating off of the temporary or permanent side. Run /usr/sbin/serv_config -e sp-current-flash-image. If it prints 0, the p-side is booted; if it prints 1, the t-side is booted. Both the lsmcode and serv_config commands will work on any Linux partition on the system.
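A script can wrap the serv_config check as follows. This is a sketch under two assumptions: `serv_config` lives at `/usr/sbin/serv_config` and is run as root on a POWER Linux partition, and it prints the bare digit described above. As a precaution, the helper also accepts a `name=value` form. The function names are hypothetical.

```python
import subprocess

# Value printed by serv_config -> booted side, per the description above.
_SIDES = {"0": "p", "1": "t"}


def side_from_flash_image(value: str) -> str:
    """Map sp-current-flash-image output to 'p' or 't'."""
    # Assumption: tolerate a "name=value" form as well as a bare digit.
    digit = value.strip().rsplit("=", 1)[-1]
    if digit not in _SIDES:
        raise ValueError(f"unexpected sp-current-flash-image value: {value!r}")
    return _SIDES[digit]


def booted_side() -> str:
    """Query the running partition (needs root on a POWER Linux system)."""
    result = subprocess.run(
        ["/usr/sbin/serv_config", "-e", "sp-current-flash-image"],
        capture_output=True, text=True, check=True,
    )
    return side_from_flash_image(result.stdout)
```

The parsing is kept separate from the subprocess call so the mapping can be checked without access to a POWER system.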

Managing System Firmware

The update_flash command (part of the powerpc-utils-papr open source package) provides the ability to manage system firmware from the Linux command line on POWER systems. There are restrictions on which partitions can be used to update system firmware:

  • If the system is managed by an HMC, firmware should be updated from there (via the Licensed Internal Code management screens). In some cases, firmware updates performed via the HMC will be concurrent (meaning that the system and the partitions do not need to be restarted in order to recognize and begin using the updated firmware level).
  • If the system is partitioned, only the partition that has been granted service authority can perform firmware updates.
  • If the system is neither partitioned nor HMC-managed, the update_flash command is the only method for updating system firmware.

New firmware images can be downloaded from http://techsupport.services.ibm.com/server/mdownload/. The following operations can be performed with the update_flash command:

  • Validate that the image stored in a file appears to be uncorrupted: update_flash -v -f <filename>
  • Perform an update with the image stored in a file: update_flash -f <filename>
  • Commit the t-side to the p-side (when it has been determined that the production image is safe): update_flash -c
  • Reject the current image on the t-side (overwrite it with the image in the p-side, because the t-side image is unsafe): update_flash -r

The first three commands should only be used when the system is booted on the t-side; the last can only be used when the system is booted on the p-side.
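A wrapper script could enforce the booted-side rule before it ever invokes update_flash. The sketch below uses only the flags listed above (`-v -f`, `-f`, `-c`, `-r`). The operation names and the function are hypothetical labels. It only builds the argument list, so you would still need to pass the result to `subprocess.run` as root on the partition with service authority.

```python
from typing import List, Optional

# Side each operation requires, per the rule stated above.
REQUIRED_SIDE = {"validate": "t", "update": "t", "commit": "t", "reject": "p"}
# update_flash flags for each operation, as listed above.
FLAGS = {"validate": ["-v", "-f"], "update": ["-f"],
         "commit": ["-c"], "reject": ["-r"]}


def update_flash_argv(operation: str, booted: str,
                      image: Optional[str] = None) -> List[str]:
    """Return an update_flash command line, refusing operations on the wrong side."""
    if operation not in REQUIRED_SIDE:
        raise ValueError(f"unknown operation: {operation}")
    needed = REQUIRED_SIDE[operation]
    if booted != needed:
        raise ValueError(f"{operation} requires booting from the {needed}-side, "
                         f"but the system is on the {booted}-side")
    if operation in ("validate", "update"):
        if not image:
            raise ValueError(f"{operation} needs an image file")
        return ["update_flash", *FLAGS[operation], image]
    return ["update_flash", *FLAGS[operation]]
```

For example, `update_flash_argv("commit", booted="t")` yields `["update_flash", "-c"]`, while asking to reject while booted on the t-side raises an error instead of running the command.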


HMCs and system parameters

Determining the HMC(s) that Manage a Partition

A Linux on POWER question I frequently hear: How can I find out the hostname or IP address of the HMC that is managing the partition I’m currently using?

A partition can be managed by up to 16 HMCs at a time (I usually only see 1 or 2 managing a particular system, but the option for more is there). The HMC connection information is stored by the platform, and exposed to the partitions as system parameters. These system parameters can be viewed (and sometimes modified) by using the serv_config command (serv_config is “serviceability configuration”, not “server configuration” or some such). As root, run:

/usr/sbin/serv_config -e hmc0

through

/usr/sbin/serv_config -e hmc15

to check each of the possible HMC slots. Many of the slots will be empty, returning no information. However, each slot which currently contains HMC connection information will return a string like the following:

HmcStat=1;HscName=eserver xSeries 335 -[7310CR2]-*105FB7A;
HscHostName=hmc-hostname.ibm.com;HscIPAddr=1.1.1.1;HscAddIPs=1.1.1.1;
CredID=;RMCKey=;RMCKeyLength=8;

HmcStat=1 indicates that an HMC is currently active (if HmcStat has any other value, the partition is not currently using, or in contact with, the HMC in that slot). Note that HSC is an older term for HMC: “Hardware Service Console” instead of “Hardware Management Console”. Any time you see HSC, think HMC.
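The slot-by-slot check can be scripted. This sketch assumes each slot's output uses the `Key=Value;` format shown above, possibly wrapped across lines, and that `/usr/sbin/serv_config` is run as root. The function names are hypothetical. Non-zero exit statuses from empty slots are tolerated rather than treated as errors; that is a guess, not documented behavior.

```python
import re
import subprocess


def parse_hmc_slot(text: str) -> dict:
    """Split 'Key=Value;Key=Value;...' into a dict; empty values stay ''."""
    fields = {}
    # Fields end with ';'; also split on newlines in case output is wrapped.
    for item in re.split(r"[;\n]", text):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def active_hmcs() -> list:
    """Check slots hmc0..hmc15 and return (slot, hostname, IP) for active HMCs."""
    found = []
    for slot in range(16):
        # Assumption: no check=True, so an empty slot is simply skipped.
        result = subprocess.run(["/usr/sbin/serv_config", "-e", f"hmc{slot}"],
                                capture_output=True, text=True)
        fields = parse_hmc_slot(result.stdout)
        if fields.get("HmcStat") == "1":
            found.append((slot, fields.get("HscHostName"), fields.get("HscIPAddr")))
    return found
```

Only slots reporting `HmcStat=1` are returned, matching the rule that other values mean the partition is not in contact with that HMC.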

The serv_config utility is part of the powerpc-utils-papr open source package, which is sometimes renamed by distributors in their products:

  • Red Hat combines the powerpc-utils package with powerpc-utils-papr into a package called ppc64-utils.
  • SuSE also combines powerpc-utils with powerpc-utils-papr, and calls it powerpc-utils.
  • Gentoo provides the packages as ibm-powerpc-utils and ibm-powerpc-utils-papr.

Viewing/Configuring Other Serviceability Parameters

There are a number of other interesting variables that can be viewed and/or modified with serv_config:

  • serv_config -e partition_auto_restart: determines if the partition will automatically restart after an abnormal termination (like a kernel panic). Read/write; possible values are 0 or 1.
  • serv_config -e platform_auto_power_restart: determines if the platform restarts automatically after a loss of power. Read/write; possible values are 0 or 1.
  • serv_config -e platform-processor-diagnostics-run-mode: configures the schedule the platform uses for running automatic processor diagnostics. Read/write; possible values are 0 (disable processor diagnostics), 1 (stagger diagnostics so that only one processor is diagnosed at a time), 2 (immediately diagnose all processors), or 3 (diagnose processors on the schedule determined by the hypervisor).
  • serv_config -e sp-current-flash-image: which of the two firmware banks is currently booted? Read only; 0 indicates that the permanent side is booted, and 1 indicates that the temporary side is booted.
  • serv_config -e platform-dump-max-size: specifies the maximum filesystem space that can be used by platform (hypervisor, service processor, etc.) dump data. Read only.

An important note: not all systems and configurations will provide all of the possible parameters. To set one of the read/write parameters, use the following syntax: serv_config -e <parameter>=<value>.
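If a script needs to change one of the read/write parameters, it can check the value before building the `serv_config -e <parameter>=<value>` command. This sketch hard-codes only the three read/write parameters and the values listed above, and the function name is hypothetical. A given system may not expose every one of these parameters, so the command can still fail there.

```python
from typing import List

# Read/write parameters and their allowed values, as listed above.
WRITABLE = {
    "partition_auto_restart": {"0", "1"},
    "platform_auto_power_restart": {"0", "1"},
    "platform-processor-diagnostics-run-mode": {"0", "1", "2", "3"},
}


def set_param_argv(name: str, value) -> List[str]:
    """Build 'serv_config -e <parameter>=<value>' after checking the value."""
    value = str(value)
    if name not in WRITABLE:
        raise ValueError(f"{name} is not a known read/write parameter")
    if value not in WRITABLE[name]:
        allowed = ", ".join(sorted(WRITABLE[name]))
        raise ValueError(f"{name} accepts {allowed}, not {value}")
    return ["/usr/sbin/serv_config", "-e", f"{name}={value}"]
```

Read-only parameters such as sp-current-flash-image are deliberately absent from the table, so attempts to set them are rejected before serv_config is run.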

There are a number of other parameters that control service processor features such as surveillance, call home, wake-on-LAN, remote power-on, etc. These can be a little more complicated to configure, but serv_config provides a simple command-line interface for configuring them. Run man serv_config for more details.