Fixes are available
7.0.0.21: WebSphere Application Server V7.0 Fix Pack 21
8.0.0.2: WebSphere Application Server V8.0 Fix Pack 2
8.0.0.3: WebSphere Application Server V8.0 Fix Pack 3
7.0.0.23: WebSphere Application Server V7.0 Fix Pack 23
8.0.0.4: WebSphere Application Server V8.0 Fix Pack 4
7.0.0.25: WebSphere Application Server V7.0 Fix Pack 25
8.0.0.5: WebSphere Application Server V8.0 Fix Pack 5
7.0.0.27: WebSphere Application Server V7.0 Fix Pack 27
8.0.0.6: WebSphere Application Server V8.0 Fix Pack 6
7.0.0.29: WebSphere Application Server V7.0 Fix Pack 29
8.0.0.7: WebSphere Application Server V8.0 Fix Pack 7
6.1.0.47: WebSphere Application Server V6.1 Fix Pack 47
8.0.0.8: WebSphere Application Server V8.0 Fix Pack 8
7.0.0.31: WebSphere Application Server V7.0 Fix Pack 31
7.0.0.27: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.33: WebSphere Application Server V7.0 Fix Pack 33
8.0.0.9: WebSphere Application Server V8.0 Fix Pack 9
7.0.0.35: WebSphere Application Server V7.0 Fix Pack 35
8.0.0.10: WebSphere Application Server V8.0 Fix Pack 10
7.0.0.37: WebSphere Application Server V7.0 Fix Pack 37
8.0.0.11: WebSphere Application Server V8.0 Fix Pack 11
7.0.0.39: WebSphere Application Server V7.0 Fix Pack 39
8.0.0.12: WebSphere Application Server V8.0 Fix Pack 12
7.0.0.41: WebSphere Application Server V7.0 Fix Pack 41
8.0.0.13: WebSphere Application Server V8.0 Fix Pack 13
7.0.0.43: WebSphere Application Server V7.0 Fix Pack 43
8.0.0.14: WebSphere Application Server V8.0 Fix Pack 14
7.0.0.45: WebSphere Application Server V7.0 Fix Pack 45
8.0.0.15: WebSphere Application Server V8.0 Fix Pack 15
6.1.0.41: Java SDK 1.5 SR12 FP5 Cumulative Fix for WebSphere Application Server
6.1.0.43: Java SDK 1.5 SR13 Cumulative Fix for WebSphere Application Server
6.1.0.45: Java SDK 1.5 SR14 Cumulative Fix for WebSphere Application Server
6.1.0.47: Java SDK 1.5 SR16 Cumulative Fix for WebSphere Application Server
7.0.0.21: Java SDK 1.6 SR9 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.23: Java SDK 1.6 SR10 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.25: Java SDK 1.6 SR11 Cumulative Fix for WebSphere Application Server
7.0.0.27: Java SDK 1.6 SR12 Cumulative Fix for WebSphere Application Server
7.0.0.29: Java SDK 1.6 SR13 FP2 Cumulative Fix for WebSphere Application Server
7.0.0.45: Java SDK 1.6 SR16 FP60 Cumulative Fix for WebSphere Application Server
7.0.0.31: Java SDK 1.6 SR15 Cumulative Fix for WebSphere Application Server
7.0.0.35: Java SDK 1.6 SR16 FP1 Cumulative Fix for WebSphere Application Server
7.0.0.37: Java SDK 1.6 SR16 FP3 Cumulative Fix for WebSphere Application Server
7.0.0.39: Java SDK 1.6 SR16 FP7 Cumulative Fix for WebSphere Application Server
7.0.0.41: Java SDK 1.6 SR16 FP20 Cumulative Fix for WebSphere Application Server
7.0.0.43: Java SDK 1.6 SR16 FP41 Cumulative Fix for WebSphere Application Server
APAR status
Closed as program error.
Error description
In a WebSphere Application Server Service Integration Bus cluster bus member, a network outage causes the messaging engine to start on another server even though it is still running on the original server - a so-called split brain scenario. When the network is restored and the split brain scenario is resolved, the messaging engine that was instructed to start on the second server is then told to stop. After this, the SystemOut log of the server whose messaging engine had been running all along contains the following messages:

CWSIS1535E: The messaging engine's unique id does not match that found in the data store. ME_UUID=xxxxxxxxxxxxxxxx, INC_UUID=yyyyyyyyyyyyyyyy, ME_UUID(DB)=xxxxxxxxxxxxxxxx, INC_UUID(DB)=zzzzzzzzzzzzzzzz
CWSID0029E: Messaging engine XYZ suffered a common mode error.
CWSID0016I: Messaging engine XYZ is in state Failed!.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: Users of the default messaging provider      *
*                 for IBM WebSphere Application Server         *
****************************************************************
* PROBLEM DESCRIPTION: CWSIS1535E, CWSID0029E and CWSID0016I   *
*                      messages in the SystemOut log of a      *
*                      cluster bus member server following a   *
*                      network outage.                         *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A split brain scenario refers to a case where the HAManager's connectivity between servers is broken, so that two isolated parts of a core group each believe they are the only servers running. In this scenario, the HAManager tries to ensure all services are running. For the Service Integration Bus, the HAManager instructs any messaging engines that need to be running to start, even though they may already be running on another server; because of the split brain, the HAManager does not know they are running there.

Normally, the new messaging engine incarnation (INC2) instructed to start will not be able to, because the original incarnation (INC1) is still running and holds a lock on the database that prevents any other incarnation from starting. If, however, the network outage also broke INC1's database connection, INC1 loses its lock, and INC2 is then able to start successfully and update the table to indicate that INC2 is now the owner.

After the network is restored, the HAManager realizes that two messaging engine incarnations are running and instructs INC2 to stop. When INC2 releases its lock on the database, INC1, which has been attempting to reobtain the lock, is able to connect and finds that the incarnation ID has changed. At this point, it incorrectly declares a common mode error.

The following is the sequence of events that can be observed in the SystemOut logs of both messaging engines.
++++++++++++++ INC1 SystemOut ++++++++++++++
CWSIS1594I: The messaging engine, ME_UUID=<ME_ID>, INC_UUID=<INC1_ID>, has lost the lock on the data store.
CWSIS1538I: The messaging engine, ME_UUID=<ME_ID>, INC_UUID=<INC1_ID>, is attempting to obtain an exclusive lock on the data store
...
CWSIS1537I: The messaging engine, ME_UUID=<ME_ID>, INC_UUID=<INC1_ID>, has acquired an exclusive lock on the data store.
CWSIS1535E: The messaging engine's unique id does not match that found in the data store. ME_UUID=<ME_ID>, INC_UUID=<INC1_ID>, ME_UUID(DB)=<ME_ID>, INC_UUID(DB)=<INC2_ID>
CWSIS1546I: The messaging engine, ME_UUID=<ME_ID>, INC_UUID=<INC1_ID>, has lost an existing lock or failed to gain an initial lock on the data store.
CWSID0029E: Messaging engine <BUS> suffered a common mode error.
CWSID0016I: Messaging engine <BUS> is in state Failed!.

++++++++++++++ INC2 SystemOut ++++++++++++++
CWSID0016I: Messaging engine <BUS> is in state Starting. ---> this happens around the time of the network outage.
CWSIS1537I: The messaging engine, ME_UUID=<ME_ID>, INC_UUID=<INC2_ID>, has acquired an exclusive lock on the data store.
CWSID0016I: Messaging engine <BUS> is in state StoppingMember. ---> this happens soon after the network is restored.
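The lock-loss and incarnation-ID check that produces the sequence above can be sketched as follows. This is an illustrative model only, not IBM's implementation: the class, method, and field names (DataStore, tryLock, start, StartResult) are hypothetical, and the real SIBOWNER handling in WebSphere is considerably more involved.

```java
/**
 * Illustrative sketch (hypothetical names) of the data store ownership
 * check behind CWSIS1535E. A row records which incarnation of the
 * messaging engine owns the store; a re-acquiring incarnation that finds
 * a different INC_UUID in the row declares a mismatch.
 */
public class SplitBrainSketch {

    /** Simplified ownership row plus an exclusive lock. */
    static class DataStore {
        String meUuid;      // messaging engine UUID (stable across incarnations)
        String incUuid;     // INC_UUID of the current owner
        String lockHolder;  // null when the exclusive lock is free

        boolean tryLock(String incUuid) {
            if (lockHolder != null) return false; // another incarnation holds it
            lockHolder = incUuid;
            return true;
        }

        void releaseLock() { lockHolder = null; }
    }

    enum StartResult { STARTED, LOCK_HELD, INCARNATION_MISMATCH }

    static StartResult start(DataStore db, String meUuid, String incUuid,
                             boolean firstStart) {
        if (!db.tryLock(incUuid)) return StartResult.LOCK_HELD;
        if (firstStart) {
            // A fresh start claims ownership by writing its incarnation id.
            db.meUuid = meUuid;
            db.incUuid = incUuid;
            return StartResult.STARTED;
        }
        // Re-acquiring after losing the lock: if another incarnation wrote
        // its id in the meantime, this corresponds to CWSIS1535E.
        if (!incUuid.equals(db.incUuid)) {
            db.releaseLock();
            return StartResult.INCARNATION_MISMATCH;
        }
        return StartResult.STARTED;
    }

    public static void main(String[] args) {
        DataStore db = new DataStore();
        start(db, "ME", "INC1", true);        // INC1 starts and owns the store
        db.releaseLock();                     // outage: INC1 silently loses the lock
        start(db, "ME", "INC2", true);        // split brain: INC2 starts, claims ownership
        db.releaseLock();                     // network restored: INC2 stopped, lock freed
        StartResult r = start(db, "ME", "INC1", false); // INC1 reobtains the lock
        System.out.println(r);                // INCARNATION_MISMATCH
    }
}
```

Running the sketch reproduces the ordering in the logs: INC1 reobtains the lock only after INC2 has stopped, and at that point finds INC_UUID(DB) set to INC2's id.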
Problem conclusion
The code has been modified so that the messaging engine declares a local error, rather than a common mode error, when it finds that the incarnation ID has changed. This causes the server where the messaging engine is running to be restarted, which allows the HAManager to start the messaging engine on any other available server. The fix for this APAR is currently targeted for inclusion in fix packs 6.1.0.41, 7.0.0.21 and 8.0.0.3. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
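The reclassification described above can be sketched as follows; the names (ErrorScope, classifyBeforeFix, classifyAfterFix) are hypothetical and this is a conceptual model of the change, not the actual WebSphere code.

```java
/**
 * Illustrative sketch (hypothetical names) of the APAR's change: an
 * incarnation-ID mismatch only proves that THIS incarnation lost
 * ownership of the data store, so it should be scoped to this server.
 */
public class ErrorClassificationSketch {

    enum ErrorScope {
        LOCAL,       // confined to this server; HAManager may restart the
                     // server and start the messaging engine elsewhere
        COMMON_MODE  // assumed to affect every server; the messaging
                     // engine is left in state Failed (CWSID0016I)
    }

    /** Pre-fix behavior: a mismatch was treated as common mode (CWSID0029E). */
    static ErrorScope classifyBeforeFix(boolean incarnationMismatch) {
        return incarnationMismatch ? ErrorScope.COMMON_MODE : ErrorScope.LOCAL;
    }

    /** Post-fix behavior: the mismatch is local, so failover stays possible. */
    static ErrorScope classifyAfterFix(boolean incarnationMismatch) {
        return ErrorScope.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(classifyBeforeFix(true)); // COMMON_MODE
        System.out.println(classifyAfterFix(true));  // LOCAL
    }
}
```

The practical difference is recoverability: a common mode error strands the engine in state Failed on every server, while a local error lets the HAManager fail the engine over to another cluster member.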
Temporary fix
Comments
APAR Information
APAR number
PM39049
Reported component name
PLAT MSG COM
Reported component ID
620600101
Reported release
200
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2011-05-12
Closed date
2011-07-26
Last modified date
2011-07-26
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
PLAT MSG COM
Fixed component ID
620600101
Applicable component levels
R200 PSY
UP
Document Information
Modified date:
27 October 2021