IBM Support

PM39049: CWSID0029E MESSAGE WHEN WSAS SERVICE INTEGRATION BUS MESSAGING ENGINE FAILS OVER AFTER A NETWORK OUTAGE

Fixes are available

7.0.0.21: WebSphere Application Server V7.0 Fix Pack 21
8.0.0.2: WebSphere Application Server V8.0 Fix Pack 2
8.0.0.3: WebSphere Application Server V8.0 Fix Pack 3
7.0.0.23: WebSphere Application Server V7.0 Fix Pack 23
8.0.0.4: WebSphere Application Server V8.0 Fix Pack 4
7.0.0.25: WebSphere Application Server V7.0 Fix Pack 25
8.0.0.5: WebSphere Application Server V8.0 Fix Pack 5
7.0.0.27: WebSphere Application Server V7.0 Fix Pack 27
8.0.0.6: WebSphere Application Server V8.0 Fix Pack 6
7.0.0.29: WebSphere Application Server V7.0 Fix Pack 29
8.0.0.7: WebSphere Application Server V8.0 Fix Pack 7
6.1.0.47: WebSphere Application Server V6.1 Fix Pack 47
8.0.0.8: WebSphere Application Server V8.0 Fix Pack 8
7.0.0.31: WebSphere Application Server V7.0 Fix Pack 31
7.0.0.27: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.33: WebSphere Application Server V7.0 Fix Pack 33
8.0.0.9: WebSphere Application Server V8.0 Fix Pack 9
7.0.0.35: WebSphere Application Server V7.0 Fix Pack 35
8.0.0.10: WebSphere Application Server V8.0 Fix Pack 10
7.0.0.37: WebSphere Application Server V7.0 Fix Pack 37
8.0.0.11: WebSphere Application Server V8.0 Fix Pack 11
7.0.0.39: WebSphere Application Server V7.0 Fix Pack 39
8.0.0.12: WebSphere Application Server V8.0 Fix Pack 12
7.0.0.41: WebSphere Application Server V7.0 Fix Pack 41
8.0.0.13: WebSphere Application Server V8.0 Fix Pack 13
7.0.0.43: WebSphere Application Server V7.0 Fix Pack 43
8.0.0.14: WebSphere Application Server V8.0 Fix Pack 14
7.0.0.45: WebSphere Application Server V7.0 Fix Pack 45
8.0.0.15: WebSphere Application Server V8.0 Fix Pack 15
6.1.0.41: Java SDK 1.5 SR12 FP5 Cumulative Fix for WebSphere Application Server
6.1.0.43: Java SDK 1.5 SR13 Cumulative Fix for WebSphere Application Server
6.1.0.45: Java SDK 1.5 SR14 Cumulative Fix for WebSphere Application Server
6.1.0.47: Java SDK 1.5 SR16 Cumulative Fix for WebSphere Application Server
7.0.0.21: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere
7.0.0.23: Java SDK 1.6 SR10 FP1 Cumulative Fix for WebSphere
7.0.0.25: Java SDK 1.6 SR11 Cumulative Fix for WebSphere Application Server
7.0.0.27: Java SDK 1.6 SR12 Cumulative Fix for WebSphere Application Server
7.0.0.29: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.45: Java SDK 1.6 SR16 FP60 Cumulative Fix for WebSphere Application Server
7.0.0.31: Java SDK 1.6 SR15 Cumulative Fix for WebSphere Application Server
7.0.0.35: Java SDK 1.6 SR16 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.37: Java SDK 1.6 SR16 FP3 Cumulative Fix for WebSphere Application Server
7.0.0.39: Java SDK 1.6 SR16 FP7 Cumulative Fix for WebSphere Application Server
7.0.0.41: Java SDK 1.6 SR16 FP20 Cumulative Fix for WebSphere Application Server
7.0.0.43: Java SDK 1.6 SR16 FP41 Cumulative Fix for WebSphere Application Server

APAR status

  • Closed as program error.

Error description

  • In a WebSphere Application Server Service Integration Bus
    cluster bus member, a network outage causes the messaging
    engine to start on another server even though it is still
    running on the original server: a so-called split brain
    scenario.
    
    When the network is restored and the split brain scenario is
    resolved, the messaging engine that was instructed to start on
    the second server is told to stop.
    
    After this, the SystemOut log of the server where the original
    messaging engine was running throughout contains the following
    messages:
    
    CWSIS1535E: The messaging engine's unique id does not match
    that found in the data store. ME_UUID=xxxxxxxxxxxxxxxx,
    INC_UUID=yyyyyyyyyyyyyyyy, ME_UUID(DB)=xxxxxxxxxxxxxxxx,
    INC_UUID(DB)=zzzzzzzzzzzzzzzz
    
    CWSID0029E: Messaging engine XYZ suffered a common mode error.
    CWSID0016I: Messaging engine XYZ is in state Failed!.
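
The symptom above can be checked for in a captured SystemOut log with a short script. The following is a minimal sketch, not an IBM tool: it assumes the CWSIS1535E text has the field layout quoted in this APAR, and the function name find_incarnation_mismatch and the result keys are hypothetical.

```python
import re

# Pattern for the CWSIS1535E message as quoted in this APAR.
# Assumption: the four fields appear in this order, separated by
# optional commas and whitespace (including wrapped lines).
MISMATCH = re.compile(
    r"CWSIS1535E:.*?ME_UUID=(?P<me>\w+),?\s+"
    r"INC_UUID=(?P<inc>\w+),?\s+"
    r"ME_UUID\(DB\)=(?P<me_db>\w+),?\s+"
    r"INC_UUID\(DB\)=(?P<inc_db>\w+)",
    re.DOTALL,
)


def find_incarnation_mismatch(log_text):
    """Return the IDs from the first CWSIS1535E message, or None."""
    match = MISMATCH.search(log_text)
    if match is None:
        return None
    ids = match.groupdict()
    # Same engine ID but a different incarnation ID in the data store:
    # another incarnation became the owner in the meantime.
    ids["taken_over"] = (
        ids["me"] == ids["me_db"] and ids["inc"] != ids["inc_db"]
    )
    # The symptom described here also has CWSID0029E logged afterwards.
    ids["common_mode_error"] = "CWSID0029E" in log_text[match.end():]
    return ids
```

A result with both taken_over and common_mode_error set to True is consistent with the scenario described in this APAR, but it does not by itself prove that a split brain occurred.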
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  Users of the default messaging provider     *
    *                  for IBM WebSphere Application Server        *
    ****************************************************************
    * PROBLEM DESCRIPTION: CWSIS1535E, CWSID0029E and CWSID0016I   *
    *                      messages in the SystemOut log of a      *
    *                      cluster bus member server following a   *
    *                      network outage.                         *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A split brain scenario refers to a case where the HAManager's
    connectivity between servers is broken, resulting in two
    isolated parts of a core group believing they are the only
    servers running.  In this scenario, the HAManager tries to
    ensure all services are running.
    For the Service Integration Bus, the HAManager will instruct
    any messaging engines that need to be running to start, even
    though they may already be running in another server.  The
    HAManager does not know they are running in another server
    because of the split brain.
    Normally, the new messaging engine incarnation (INC2)
    instructed to start will not be able to because the original
    incarnation (INC1) is still running and holds a lock on the
    database, preventing any other incarnation from starting.
    If, however, the network outage also caused INC1's database
    connection to break, INC1 loses its lock and INC2 is then
    able to start successfully and update the table to indicate
    that INC2 is now the owner.
    After the network is restored, HAManager realizes that there
    are two messaging engine incarnations running and instructs
    INC2 to stop.
    When INC2 releases its lock on the database, INC1, which has
    been attempting to reobtain the lock, is able to connect
    and finds that the incarnation ID has changed.  At this
    point, it incorrectly declares a common mode error.
    The following sequence of events can be observed in the
    SystemOut logs of the two messaging engine incarnations.
    ++++++++++++++
    INC1 SystemOut
    ++++++++++++++
    CWSIS1594I: The messaging engine, ME_UUID=<ME_ID>,
    INC_UUID=<INC1_ID>, has lost the lock on the data store.
    CWSIS1538I: The messaging engine, ME_UUID=<ME_ID>,
    INC_UUID=<INC1_ID>, is attempting to obtain an exclusive lock
    on the data store
    ...
    CWSIS1537I: The messaging engine, ME_UUID=<ME_ID>,
    INC_UUID=<INC1_ID>, has acquired an exclusive lock on the data
    store.
    CWSIS1535E: The messaging engine's unique id does not match
    that found in the data store. ME_UUID=<ME_ID>,
    INC_UUID=<INC1_ID>, ME_UUID(DB)=<ME_ID>,
    INC_UUID(DB)=<INC2_ID>
    CWSIS1546I: The messaging engine, ME_UUID=<ME_ID_1>,
    INC_UUID=<INC1_ID>, has lost an existing lock or failed to
    gain an initial lock on the data store.
    CWSID0029E: Messaging engine <BUS> suffered a common mode
    error.
    CWSID0016I: Messaging engine <BUS> is in state Failed!.
    ++++++++++++++
    INC2 SystemOut
    ++++++++++++++
    CWSID0016I: Messaging engine <BUS> is in state Starting.
    ---> logged around the time of the network outage.
    CWSIS1537I: The messaging engine, ME_UUID=<ME_ID>,
    INC_UUID=<INC2_ID>, has acquired an exclusive lock on the data
    store.
    CWSID0016I: Messaging engine <BUS> is in state StoppingMember.
    ---> logged soon after the network is restored.
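
The lock and ownership sequence described above can be replayed with a toy model. This is only an illustrative sketch under stated assumptions: the DataStore class, its fields, and the helper functions are hypothetical stand-ins and do not reflect the actual WebSphere data store schema or code.

```python
class DataStore:
    """Toy stand-in for the messaging engine's database tables."""

    def __init__(self):
        self.lock_holder = None  # incarnation currently holding the lock
        self.owner = None        # incarnation ID recorded in the table

    def try_lock(self, inc):
        # The lock is granted only if it is free or already held by inc.
        if self.lock_holder in (None, inc):
            self.lock_holder = inc
            return True
        return False

    def release(self, inc):
        if self.lock_holder == inc:
            self.lock_holder = None


def start(store, inc):
    """A new incarnation starts only if it gets the lock, then
    records itself as the owner."""
    if not store.try_lock(inc):
        return False
    store.owner = inc
    return True


def reacquire(store, inc):
    """A running incarnation regains the lock and checks the owner."""
    if not store.try_lock(inc):
        return "waiting"
    return "ok" if store.owner == inc else "incarnation mismatch"


def replay(db_connection_lost):
    store = DataStore()
    events = []
    start(store, "INC1")
    if db_connection_lost:
        # The outage also breaks INC1's database connection.
        store.release("INC1")
    events.append(("INC2 start", start(store, "INC2")))
    # Network restored: HAManager instructs INC2 to stop.
    store.release("INC2")
    events.append(("INC1 check", reacquire(store, "INC1")))
    return events
```

In this model, replay(False) shows INC2 failing to start because INC1 still holds the lock, while replay(True) ends with INC1 detecting an incarnation mismatch, which is the point at which the common mode error was incorrectly declared.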
    

Problem conclusion

  • The code has been modified so that the messaging engine
    declares a local error, rather than a common mode error, when
    it finds that the incarnation ID has changed.  The server on
    which the messaging engine is running is then restarted, which
    allows HAManager to start the messaging engine on any other
    available server.
    
    The fix for this APAR is currently targeted for inclusion in
    fix packs 6.1.0.41, 7.0.0.21 and 8.0.0.3.  Please refer to the
    Recommended Updates page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
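
As a rough aid, an installed level can be compared against the fix pack levels named above. This is a sketch under assumptions: the thresholds are copied from the targeting sentence in this APAR, the function names are hypothetical, and the input is assumed to be a dotted four-part version string. The "Fixes are available" list on this page also names 8.0.0.2, so the 8.0 threshold in particular should be treated as provisional and confirmed on the Recommended Updates page.

```python
# Fix pack levels named in this APAR, keyed by (major, minor) release.
# Assumption: taken from the "targeted for inclusion" sentence only.
FIRST_FIXED = {
    (6, 1): (6, 1, 0, 41),
    (7, 0): (7, 0, 0, 21),
    (8, 0): (8, 0, 0, 3),
}


def parse_level(text):
    """Turn a dotted level such as '7.0.0.19' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def includes_fix(level_text):
    """Return True/False for releases named in the APAR, else None."""
    level = parse_level(level_text)
    first = FIRST_FIXED.get(level[:2])
    if first is None:
        return None  # release not mentioned in this APAR
    # Tuples compare element by element, so 7.0.0.21 >= 7.0.0.21.
    return level >= first
```

For example, includes_fix("7.0.0.19") returns False and includes_fix("7.0.0.21") returns True under these assumed thresholds.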
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM39049

  • Reported component name

    PLAT MSG COM

  • Reported component ID

    620600101

  • Reported release

    200

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2011-05-12

  • Closed date

    2011-07-26

  • Last modified date

    2011-07-26

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    PLAT MSG COM

  • Fixed component ID

    620600101

Applicable component levels

  • R200 PSY

       UP

Document Information

Modified date:
27 October 2021