IBM Support

PM64875: ME FAILOVER MAY NOT SUCCEED IF CONNECTION TO DB2 IS DETERMINED BAD.

Fixes are available

7.0.0.27: WebSphere Application Server V7.0 Fix Pack 27
8.5.0.2: WebSphere Application Server V8.5 Fix Pack 2
8.0.0.6: WebSphere Application Server V8.0 Fix Pack 6
7.0.0.29: WebSphere Application Server V7.0 Fix Pack 29
8.0.0.7: WebSphere Application Server V8.0 Fix Pack 7
8.0.0.8: WebSphere Application Server V8.0 Fix Pack 8
7.0.0.31: WebSphere Application Server V7.0 Fix Pack 31
7.0.0.27: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.33: WebSphere Application Server V7.0 Fix Pack 33
8.0.0.9: WebSphere Application Server V8.0 Fix Pack 9
7.0.0.35: WebSphere Application Server V7.0 Fix Pack 35
8.0.0.10: WebSphere Application Server V8.0 Fix Pack 10
7.0.0.37: WebSphere Application Server V7.0 Fix Pack 37
8.0.0.11: WebSphere Application Server V8.0 Fix Pack 11
7.0.0.39: WebSphere Application Server V7.0 Fix Pack 39
8.0.0.12: WebSphere Application Server V8.0 Fix Pack 12
7.0.0.41: WebSphere Application Server V7.0 Fix Pack 41
8.0.0.13: WebSphere Application Server V8.0 Fix Pack 13
7.0.0.43: WebSphere Application Server V7.0 Fix Pack 43
8.0.0.14: WebSphere Application Server V8.0 Fix Pack 14
7.0.0.45: WebSphere Application Server V7.0 Fix Pack 45
8.0.0.15: WebSphere Application Server V8.0 Fix Pack 15
7.0.0.27: Java SDK 1.6 SR12 Cumulative Fix for WebSphere Application Server
7.0.0.29: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.45: Java SDK 1.6 SR16 FP60 Cumulative Fix for WebSphere Application Server
7.0.0.31: Java SDK 1.6 SR15 Cumulative Fix for WebSphere Application Server
7.0.0.35: Java SDK 1.6 SR16 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.37: Java SDK 1.6 SR16 FP3 Cumulative Fix for WebSphere Application Server
7.0.0.39: Java SDK 1.6 SR16 FP7 Cumulative Fix for WebSphere Application Server
7.0.0.41: Java SDK 1.6 SR16 FP20 Cumulative Fix for WebSphere Application Server
7.0.0.43: Java SDK 1.6 SR16 FP41 Cumulative Fix for WebSphere Application Server

APAR status

  • Closed as program error.

Error description

  • The WebSphere Application Server instance that was running the
    Messaging Engine (ME) was being brought down. As expected, this
    caused the ME to fail over to another cluster member on a
    different LPAR.
    
    However, the adjunct on the second LPAR received the errors
    below and was terminated. To recover, the application server
    had to be restarted manually.
    
    J2CA0206W: A connection error occurred.  To help determine the
    problem, enable the Diagnose Connection Usage option on the
    Connection Factory or Data Source.
    
    J2CA0056I: The Connection Manager received a fatal connection
    error from the Resource Adapter for resource
    jdbc/<<<resourceName>>>. The exception is:
    com.ibm.db2.jcc.am.ClientRerouteException:
    [jcc][t4][2027][11212][3.59.83] A connection failed but has been
    re-established. The host name or IP address is
    "abc.ibm.com" and the service name or port number is 1,234.
    Special registers may or may not be re-attempted (Reason code =
    1). ERRORCODE=-30108, SQLSTATE=08506
    
    Followed by FFDC error:
    
    [jcc][t4][2027][11212][3.59.83] A connection failed but has been
    re-established. The host name or IP address is
    "abc.ibm.com" and the service name or port number is 1,234.
    Special registers may or may not be re-attempted (Reason code =
    1). ERRORCODE=-30108, SQLSTATE=08506
    at com.ibm.db2.jcc.am.dd.a(dd.java:304)
    at com.ibm.db2.jcc.am.dd.a(dd.java:356)
    at com.ibm.db2.jcc.t4.a.a(a.java:473)
    at com.ibm.db2.jcc.t4.a.L(a.java:1024)
    at com.ibm.db2.jcc.t4.b.a(b.java:4885)
    at com.ibm.db2.jcc.t4.l.bc(l.java:124)
    at com.ibm.db2.jcc.am.cn.executeQuery(cn.java:652)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteQuery
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeQuery
    at com.ibm.ws.sib.msgstore.persistence.impl.MEInnerOwnerTable.readOwningME
    at com.ibm.ws.sib.msgstore.persistence.lock.DBLockingThread.waitAndRefreshLock
    at com.ibm.ws.sib.msgstore.persistence.lock.DBLockingThread.run
    
    The error above is a result of this query:
    SELECT ME_UUID,INC_UUID,VERSION,MIGRATION_VERSION FROM
    SIBSYS01.SIBOWNER
    1003 1007 0 0 2
    
    Finally, the HA Manager killed the JVM, bringing the adjunct down:
    
    HMGR0130I: The local member of group <<< group name>>>
    has indicated that is it not alive. The JVM will be terminated.
    at java.lang.Thread.dumpStack(Thread.java:417)
    at com.ibm.ws.hamanager.proxy.DispatchHAGroupCallbackImpl.isAlive(DispatchHAGroupCallbackImpl.java:193)
    ...
    Panic:component requested panic from isAlive
    
    
    In this case, the second WebSphere Application Server had
    created a connection to a DB2 data sharing member that was also
    being brought down. DB2 therefore returned the
    ClientRerouteException shown above, indicating that the
    connection was lost but was successfully re-established to a
    different data sharing member. However, with this property
    defined:
    sib.msgstore.jdbcFailoverOnDBConnectionLoss=true
    the ME does not retry after the first failure to connect to
    DB2, and the ME is brought down.
    
    This APAR provides a property that makes it configurable
    whether the connection is retried (and how many times) before
    the ME is brought down.
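
The retry-before-failover behaviour described above can be sketched
as a bounded retry loop. This is only an illustration under stated
assumptions, not the Messaging Engine code: the class BoundedRetry,
the method callWithRetry and the parameter maxRetries are
hypothetical names, and this APAR text does not name the actual
property that carries the retry count.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of WebSphere Application Server. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs the operation once, then retries it up to maxRetries more
     * times if it fails with an SQLException. Any other exception is
     * propagated immediately. When all attempts fail, the last
     * SQLException is rethrown so the caller can start a failover.
     */
    static <T> T callWithRetry(Callable<T> operation, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        SQLException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return operation.call();
            } catch (SQLException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // the loop ran at least once, so last is set here
    }
}
```

With maxRetries set to 0 the sketch behaves like the original
problem case: a single failure is passed straight to the caller.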
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  Users of the default messaging provider for *
    *                  IBM WebSphere Application Server versions   *
    *                  7.0, 8.0, and 8.5                           *
    ****************************************************************
    * PROBLEM DESCRIPTION: On z/OS LPARs, if the Messaging         *
    *                      Engine is configured to be running in   *
    *                      high availability mode and the DB2      *
    *                      which is used as a datastore is also    *
    *                      configured to be a clustered setup,     *
    *                      when one LPAR is brought down the       *
    *                      Messaging Engine on the LPAR would      *
    *                      failover onto the other LPAR. But if    *
    *                      the connection pool returns a           *
    *                      connection that is pointing to a DB2    *
    *                      instance running on the LPAR which      *
    *                      was brought down then the Messaging     *
    *                      Engine would initiate a local error.    *
    *                      If there are only 2 LPARS then the      *
    *                      system would be rendered without        *
    *                      any Messaging Engine.                   *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    Consider a setup where WebSphere Application Server runs on
    z/OS LPARs in an active/passive topology and is configured for
    high availability. A bus has a Messaging Engine configured to
    run in high availability mode on the LPARs, and the database
    (DB2) is configured in a similar high availability mode on the
    LPARs. If one of the active LPARs is brought down, the
    Messaging Engine on that LPAR fails over to the other LPAR.
    When the Messaging Engine first starts there, it attempts to
    obtain a connection from the connection pool. The connection
    pool returns a connection that points to the DB2 instance
    running on the previous LPAR. When the Messaging Engine
    attempts to use that connection, the DB2 driver issues a
    "ClientRerouteException". By default, "ClientRerouteException"
    is mapped to a StaleConnectionException. On a
    StaleConnectionException, the Messaging Engine does not retry
    the previous operation, because the connection is not
    guaranteed, and instead initiates a failover. Because the
    active LPAR has already been brought down, the system is left
    without any Messaging Engine.
    

Problem conclusion

  • After collaborating with the DB2 team, we understand that some
    of the error codes in the "ClientRerouteException" mean that
    an instance of DB2 is already up and running elsewhere, and
    that retrying is guaranteed to connect to a running database.
    The Messaging Engine therefore checks for the error codes
    -30108, -4499, and -4498, and when it finds one of them it
    retries the operation instead of causing a failover.
    
    The fix for this APAR is currently targeted for inclusion in
    fix packs 7.0.0.27, 8.0.0.6, and 8.5.0.2.  Please refer to the
    Recommended Updates page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
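
The error code check described in the conclusion can be sketched as
follows. This is a minimal illustration, assuming the code is read
from SQLException.getErrorCode(); the class RerouteErrorCodes and the
method isRetryable are hypothetical names and do not come from the
Messaging Engine source. Only the three codes are taken from the
APAR text.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical classifier; not part of WebSphere Application Server. */
final class RerouteErrorCodes {
    // Error codes listed in this APAR as safe to retry.
    private static final Set<Integer> RETRYABLE =
            Collections.unmodifiableSet(
                    new HashSet<Integer>(Arrays.asList(-30108, -4499, -4498)));

    private RerouteErrorCodes() {}

    /** Returns true if the vendor error code is one of the listed codes. */
    static boolean isRetryable(SQLException e) {
        return e != null && RETRYABLE.contains(e.getErrorCode());
    }
}
```

A caller would retry the failed operation when isRetryable returns
true and treat any other error as it did before the fix.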
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM64875

  • Reported component name

    WAS SIB & SIBWS

  • Reported component ID

    620800101

  • Reported release

    300

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-05-17

  • Closed date

    2012-10-04

  • Last modified date

    2012-10-04

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PM93758

Fix information

  • Fixed component name

    WAS SIB & SIBWS

  • Fixed component ID

    620800101

Applicable component levels

  • R300 PSY

       UP

  • R800 PSY

       UP

Document Information

Modified date:
28 October 2021