Servicelog Updates

The servicelog package has been updated to version 1.0.  This new version uses an SQLite database as its backend (instead of the Berkeley DB backend that the 0.x stream used).  The primary advantage of the relational SQLite backend is that servicelog can be queried with standard SQL.  The --query flag to the servicelog command now takes an SQL WHERE clause as its argument.  For example, to view all open serviceable events, run:

/usr/bin/servicelog --query "serviceable=1 and closed=0"

To view all migrations that a logical partition has undergone:

/usr/bin/servicelog --query 'refcode="#MIGRATE"'
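To make the WHERE-clause semantics concrete, here is a minimal Python sketch that runs the same two clauses against a throwaway in-memory SQLite database.  It does not touch the real servicelog database: the table name `events`, the `id` column, and the sample reference codes other than #MIGRATE are assumptions made up for illustration, and only the column names serviceable, closed, and refcode come from the commands above.  The string literal is written with single quotes, which is the standard SQL form.

```python
import sqlite3

def make_demo_db():
    # Hypothetical stand-in for the servicelog database; the real schema
    # is not described here, so this table layout is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, refcode TEXT, "
        "serviceable INTEGER, closed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events (refcode, serviceable, closed) VALUES (?, ?, ?)",
        [
            ("#MIGRATE", 0, 0),   # a partition migration event
            ("B1234567", 1, 0),   # made-up refcode: open serviceable event
            ("B7654321", 1, 1),   # made-up refcode: closed serviceable event
        ],
    )
    return conn

def query_events(conn, where_clause):
    # The clause is appended verbatim, the way a WHERE-clause argument
    # would be; only do this with trusted input.
    sql = "SELECT id, refcode FROM events WHERE " + where_clause + " ORDER BY id"
    return conn.execute(sql).fetchall()

conn = make_demo_db()
print(query_events(conn, "serviceable=1 and closed=0"))  # [(2, 'B1234567')]
print(query_events(conn, "refcode='#MIGRATE'"))          # [(1, '#MIGRATE')]
```

Any expression that is valid after WHERE in SQL can be passed the same way, so clauses can be combined with and, or, and parentheses.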

The ability to register notification tools with servicelog, available in the 0.x stream, is still supported, and is now more flexible: you can specify a query string when registering a new notification tool.  When a new event is logged, the tool is invoked only if the event matches the criteria in that query string.  For example, run the following command (as root) to have a tool called /opt/foo/some_command invoked automatically just after a partition is migrated to a different system:

/usr/bin/servicelog_notify --add --command='/opt/foo/some_command' --match='refcode="#MIGRATE"'
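The matching behaviour described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not how servicelog implements it: the helper names `event_matches` and `tools_to_invoke`, the in-memory `events` table, and the second tool /opt/foo/other_command are all hypothetical.  Each registration pairs a command with a match clause, and a new event selects only the commands whose clause it satisfies.

```python
import sqlite3

# Columns assumed for the illustration; they mirror the names used in the
# query examples, not the real servicelog schema.
COLUMNS = ("refcode", "serviceable", "closed")

def event_matches(event, match_clause):
    # Load the single event into a scratch table and let SQLite evaluate
    # the match clause against it.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (refcode TEXT, serviceable INTEGER, closed INTEGER)"
        )
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            tuple(event[c] for c in COLUMNS),
        )
        row = conn.execute(
            "SELECT COUNT(*) FROM events WHERE " + match_clause
        ).fetchone()
        return row[0] == 1
    finally:
        conn.close()

def tools_to_invoke(event, registrations):
    # Return the commands whose match clause the event satisfies.
    return [cmd for cmd, match in registrations if event_matches(event, match)]

registrations = [
    ("/opt/foo/some_command", "refcode='#MIGRATE'"),
    ("/opt/foo/other_command", "serviceable=1 and closed=0"),  # hypothetical tool
]

migration = {"refcode": "#MIGRATE", "serviceable": 0, "closed": 0}
print(tools_to_invoke(migration, registrations))  # ['/opt/foo/some_command']
```

The sketch only decides which tools would run; how the event is actually handed to the tool is left out because it is not covered here.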

Power Platform Diagnostics: Source Available

The package for performing Power platform diagnostics, ppc64-diag, has just been open sourced under the Eclipse Public License.  Much of what I discussed in my previous post about predictive self healing is implemented in this package (and in servicelog, which is already open source).

Here are some of the advantages provided by the ppc64-diag package:

  • retrieval of first-failure error data from platform-level components, such as memory, CPUs, caches, fans, power supplies, VPD cards, voltage regulator modules, I/O subsystems, service processors, risers, etc.
  • the ability to offline CPUs or logical memory blocks (LMBs) that are predicted to fail
  • notifications of EPOW events (environmental and power warnings), and initiation of shutdowns due to abnormal thermal and voltage conditions if no redundant fans or power supplies are available
  • monitoring of platform-level elements (fans, power supplies, riser cards, etc.) in external I/O enclosures
  • retrieval of dumps from platform components to assist in problem determination (for example, dump data from a failed service processor)

The ppc64-diag package is generally install-and-forget: any platform events that occur are logged to servicelog, along with the event severity and whether the event is serviceable (i.e., requires service action).  Additional relevant information is also logged to servicelog, such as a reference code and the location code and part number of a failing device (obtained from lsvpd).  Tools may be registered with servicelog to be notified automatically when new events are logged.