A fix is available
APAR status
Closed as program error.
Error description
The CICS-DB2 connection status shows DISCONNECTING after DB2 abends, and the DB2CONN is unable to reconnect to DB2. The CDBF task is suspended on resource DB2CDISC, which indicates that a SET DB2CONN NOTCONNECTED was issued. DFHD2TM is going to wait for the count of tasks using DB2 to reach zero. The DB2 summary shows that no tasks are active in DB2. All CSUBs have been freed and the CEX2 task has gone away. The CDBF task is suspended in DFHD2TM, waiting for the D2S_DISCONNECT_ECB to be posted by DFHD2STP.

The CICS-DB2 connection never finishes disconnecting because of a steady stream of tasks continually trying to access DB2 and failing with abend AEY9.

1- DB2 abending causes the CEX2 task to do a START for transaction CDBF.

2- This CDBF task does the EXEC CICS SET DB2CONN NOTCONNECTED. This causes control to go to DFHD2TM where, in the stop_the_db2_connection iproc, the presence of LOTs on the GWA_LOT chain (meaning there are tasks currently using the DB2 connection) causes process_db2conn_busy_option to be set and control to return. Eventually, the process_db2conn_busy_option causes control to go to iproc db2conn_busy, where all the tasks using the DB2 connection are forcepurged. The CDBF task then waits in a DB2CDISC wait, waiting for the D2S_DISCONNECT_ECB to be posted. The CDBF task never wakes up from this wait.

3- The forcepurging of the tasks using DB2 is supposed to cause them to abend and stop using DB2. The last one to stop using DB2 is supposed to then initiate another call to stop the DB2 connection. This happens in the process_task_manager_call iproc in DFHD2EX1. The last task in there finds GWA_LOT = nulls and so calls DFHD2TM for set_db2conn notconnected. At this point, under this last task, control would go to DFHD2TM and DFHD2CC and eventually to DFHD2STP.

4- DFHD2STP works with CEX2 to terminate the thread subtasks. Then, since STANDBYMODE(RECONNECT) is not set on the DB2CONN, DFHD2STP tries to DISABLE the TRUE. If that call to DISABLE_TRUE is successful, DFHD2STP posts the D2S_DISCONNECT_ECB that the CDBF task is waiting on. But that call fails, DFHD2STP does not post the ECB, and so the CDBF task never wakes up.

5- The DISABLE_TRUE iproc first does an EXEC CICS DISABLE STOP. This causes all tasks that attempt to use the DB2 connection to abend with AEY9. But then the EXEC CICS DISABLE EXITALL keeps failing with INVEXITREQ and RESP2 800080000000, meaning there are still active tasks. There is a loop in here that tries this DISABLE EXITALL every second for 10 minutes. When the 10 minutes is up, it gives up, issues DFHAP0002 with code (X'31B4'), and fails the request. Because of this failure, d2s_disconnect_ecb never gets posted, the CDBF task hangs forever, and the DB2 connection remains in DISCONNECTING status.

6- DFHUEM processes the EXEC CICS DISABLE commands. It returns INVEXITREQ with reason 800080000000 when the EPBICNT (invocation count) is non-zero. At the time of the dump, EPBICNT is 00000003, and there are 3 tasks in DFHERM for a DB2 call. These tasks are in the process of abending with abend AEY9. When DFHERM determines that a request is for DB2, it increments the EPBICNT. In the case where the request is going to fail with AEY9, EPBICNT is not decremented until DFHERM's recovery routine ERMREC gets control from Kernel because of the AEY9 that DFHERM initiated. Between the point where EPBICNT is incremented and the point where it is decremented, a transaction dump is taken, which lengthens the time that EPBICNT stays incremented. The trace shows a steady stream of AEY9 abends all the way to dump time. So during the entire 10 minutes in which DFHD2STP was trying the DISABLE EXITALL, there was always a task in DFHERM for a DB2 request.

One thing that lengthens the time in DFHERM is the fact that Fault Analyzer is being used. It hooks in at the XDUREQ exit and adds to the time each abend takes. The three tasks that are in DFHERM abending with AEY9 are all currently suspended by Fault Analyzer.
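The retry behaviour in steps 5 and 6 can be illustrated with a small model. This is a sketch only, not CICS code: the names DisableOutcome, active_tasks and try_disable are hypothetical, and the arrival gap and hold time are made-up values. Only the one-second retry interval and the 10-minute limit come from the description above. Tasks are assumed to arrive at a fixed interval and to hold the invocation count for a fixed time.

```python
from dataclasses import dataclass

RETRY_INTERVAL_MS = 1000   # one DISABLE EXITALL attempt per second (from the description)
MAX_ATTEMPTS = 600         # 10 minutes of attempts before giving up


@dataclass
class DisableOutcome:
    disabled: bool     # did a DISABLE attempt see a zero invocation count?
    attempts: int      # number of attempts made
    ecb_posted: bool   # stands in for posting D2S_DISCONNECT_ECB


def active_tasks(t_ms, gap_ms, hold_ms):
    """Model of the invocation count at time t_ms.

    Hypothetical workload: a task arrives every gap_ms (at 0, gap_ms, ...)
    and keeps the count incremented for hold_ms before decrementing it.
    """
    last = t_ms // gap_ms                            # latest arrival at or before t
    first = max(0, (t_ms - hold_ms) // gap_ms + 1)   # earliest arrival still holding
    return max(0, last - first + 1)


def try_disable(gap_ms, hold_ms):
    """Retry once per second; succeed only if the count is zero at an attempt."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        t_ms = (attempt - 1) * RETRY_INTERVAL_MS
        if active_tasks(t_ms, gap_ms, hold_ms) == 0:
            return DisableOutcome(disabled=True, attempts=attempt, ecb_posted=True)
    # Every attempt saw a non-zero count: the request fails and the ECB is never posted.
    return DisableOutcome(disabled=False, attempts=MAX_ATTEMPTS, ecb_posted=False)


if __name__ == "__main__":
    print(try_disable(gap_ms=400, hold_ms=1500))   # hold longer than gap: never disabled
    print(try_disable(gap_ms=5000, hold_ms=200))   # sparse arrivals: disabled quickly
```

In this model, whenever the hold time is at least as long as the gap between arrivals, some task always has the count incremented, so every attempt fails and the waiter is never woken.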
Local fix
The problem can be circumvented by specifying STANDBYMODE(RECONNECT) on the DB2CONN. That would cause CICS not to try to DISABLE the TRUE.
Problem summary
****************************************************************
* USERS AFFECTED: All.                                         *
****************************************************************
* PROBLEM DESCRIPTION: DFHD2STP suffers severe error (X'31B4') *
*                      and CDBF is in DB2CDISC wait following  *
*                      a DB2 crash.                            *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
Following a DB2 crash it is possible for DFHD2STP to suffer a severe error code (X'31B4'). This will occur if the STANDBYMODE in the DB2CONN is set to CONNECT. The CICS task, CDBF, will also be left in a DB2CDISC wait.

The reason for this problem is that DFHD2STP is attempting to DISABLE the TRUE, but because of a steady stream of new tasks trying to use the TRUE, the DISABLE command always has a response of TASKACTIVE. The DISABLE command is tried for 10 minutes and then ends with disable_failed, which raises the severe error.

The new tasks trying to use the TRUE enter DFHERM and obtain a TIE and EPB. The EPBICNT count is incremented and then, because the TRUE is not available, the tasks are abended with AEY9. The dump domain exit XDUREQ is active and does extra processing for each AEY9 abend. As a result, each task takes longer to return to DFHERM and its recovery routine to decrement the EPBICNT count, so the DISABLE command receives the TASKACTIVE response even when no CICS task is actually using the TRUE.

The CDBF task is in a DB2CDISC wait because it is waiting on an ECB that never gets posted, because the DISABLE fails.

Keywords: msgDFHAP0002 DFHAP0002 AP0002 AbendAEY9 AbendsAEY9 INVEXITREQ 800080000000 d2s_disconnect_ecb
Problem conclusion
DFHERM has been changed to decrement the EPBICNT count before the AEY9 abend is processed.
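The effect of this change can be pictured as a reordering of events for one failing request. The sketch below is illustrative only and is not the DFHERM logic: the function names, the event labels and the timings (1 ms of bookkeeping, 2000 ms of dump processing) are assumptions chosen for the example. The only detail taken from this APAR is that the count is now decremented before the AEY9 abend is processed instead of afterwards in the recovery routine.

```python
def request_timeline(decrement_before_abend, dump_ms=2000, bookkeeping_ms=1):
    """Return (event, time_ms) pairs for one request that fails with AEY9.

    dump_ms and bookkeeping_ms are hypothetical durations for illustration.
    """
    t = 0
    events = [("increment count", t)]
    t += bookkeeping_ms
    if decrement_before_abend:
        # Ordering after the fix: the count is released before abend processing.
        events.append(("decrement count", t))
        events.append(("abend AEY9 issued", t))
        t += dump_ms
        events.append(("dump processing ends", t))
    else:
        # Ordering before the fix: the count is released only after abend
        # processing (including any dump exit work) has finished.
        events.append(("abend AEY9 issued", t))
        t += dump_ms
        events.append(("dump processing ends", t))
        events.append(("decrement count", t))
    return events


def held_ms(events):
    """How long the invocation count stayed incremented for this request."""
    times = dict(events)
    return times["decrement count"] - times["increment count"]


if __name__ == "__main__":
    print(held_ms(request_timeline(decrement_before_abend=False)))
    print(held_ms(request_timeline(decrement_before_abend=True)))
```

With the earlier decrement, the time the count is held no longer depends on how long dump processing takes, so a DISABLE attempt has a much better chance of finding the count at zero.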
Temporary fix
FIX AVAILABLE BY PTF ONLY
Comments
APAR Information
APAR number
PK19028
Reported component name
CICSTS 3.1 Z/OS
Reported component ID
5655M1500
Reported release
400
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2006-02-03
Closed date
2006-02-20
Last modified date
2006-03-02
APAR is sysrouted FROM one or more of the following:
PK15787
APAR is sysrouted TO one or more of the following:
UK11861
Modules/Macros
DFHERM
Fix information
Fixed component name
CICSTS 3.1 Z/OS
Fixed component ID
5655M1500
Applicable component levels
R400 PSY UK11861 UP06/02/22 P F602
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 March 2006