IBM Support

PM57226: ABEND0C4 IN DFHDSTCB AT +X'2668' DUE TO A TASK REMAINING ON AN L8 TCB AFTER ISSUING A CHANGE_MODE TO THE QR

A fix is available

APAR status

  • Closed as program error.

Error description

  • A task has been the subject of a deferred FORCEPURGE while it
    was in a running state. The running task then actioned this
    purge as it suspended on a lock manager lock. DFHDSTCB routine
    SLEEP_CS does this by calling DFHDSDS4 after updating the state
    of the task to PURGE_PENDING.
    
    In parallel, another task (running on its own L8/L9) was issuing
    a resume as it released the lock which the first task had
    suspended on. This creates a race condition: the PURGE logic and
    the RESUME logic interleave in such a way that both place the
    same task on the L8/L9's dispatchable chain.
    
    The suspending task relinquishes control to the dispatcher.
    DFHDSTCB SLEEP_CS (running under the default DSTCB housekeeping
    task for the running TCB) discovers there is a purge pending
    which is actionable (this has to be FORCEPURGE as the suspend
    was from lock manager which prohibits normal purges). The
    PURGE_STATUS of the task is updated to PURGE_PENDING using CS
    and DFHDSDS4 ?PURGE is invoked. This calls SUSPEND_TOKENS_PURGE
    (as the task has issued DSSR SUSPEND). This detects that the
    suspend token used on the suspend is in a SUSPENDED state. We
    now execute code to set the state of the suspend token to PURGED
    using CS.
    
    However, just before this CS operation, a DSSR RESUME must have
    been issued by the task releasing the lock. This executes code in
    DFHDSSR RESUME_TASK_PROC, which also discovers the suspend token
    in a SUSPENDED state and updates it from SUSPENDED to RESET.
    This code, however, does not use CS. The CS operation in DFHDSDS4
    must have succeeded and the update in DFHDSSR must have followed,
    so DFHDSSR failed to detect that the suspend token state had
    changed underneath its feet. DFHDSSR will now call WAKE_TASK,
    which in turn drives DFHDSWKT. At this moment, the task's
    PURGE_STATUS is PURGE_PENDING.
    
    DFHDSWKT has code to detect this and stop RESUME processing from
    adding the task to the dispatchable chain (by setting RETC=1).
    However, in parallel with the call to DFHDSWKT by RESUME
    processing, we are continuing to execute code in DFHDSDS4 PURGE
    logic on the other L8/L9. As DFHDSDS4 believes that the state of
    the suspend token is now PURGED, it proceeds to update the
    task's PURGE_STATUS to PURGED. The TASK_STATE is set to
    DISPATCHABLE and the task is placed on the dispatchable chain.
    
    DFHDSWKT now runs (under RESUME logic). The target task is now
    in a DISPATCHABLE/PURGED state. DFHDSWKT has no code fragment to
    handle this state so it drops into the OTHERWISE clause at the
    end of the code. This returns to WAKE_TASK, leaving RETC=0 which
    signals to WAKE_TASK that it should place the task on the
    dispatchable chain. So the task is now on the L8/L9's
    dispatchable chain twice.
    .
    Additional symptoms: CICS unresponsive, at maxtask with
    many CWXN transactions.  A dump at the time showed no
    task running on the QR TCB, but the KTCB for the QR TCB
    was running. Its last stack entry showed it was processing
    in DFHDSTCB - DOUBLE_CHAIN_SORT_MERGE.  The SYSTRACE showed
    the QR TCB in a tight loop due to a DTA+30 pointing to itself.
    This was at offsets x'263E' to x'2654' into DFHDSTCB at PTF
    UK57632. The problem involves timing where a task is suspended
    and resumed at the same time, putting its DTA on the
    dispatchable chain when it is already executing.
    The internal trace also shows trace entries for a task that
    appears to be running on the QR TCB, but the TCB address is not
    that of the QR. In this case it was an L8 TCB.
    mxt hang hung stall forcepurge lmqueue
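
    The interleaving described above can be replayed step by step.
    The following is a minimal Python sketch, not CICS code: the
    Task class, the string state values and the function names are
    hypothetical stand-ins, and the DFHDSWKT check is reduced to the
    two outcomes the description mentions (RETC=1 for PURGE_PENDING,
    RETC=0 for the unhandled OTHERWISE case).

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the task and its suspend token."""
    name: str
    purge_status: str = "PURGE_PENDING"  # already set by SLEEP_CS
    task_state: str = "SUSPENDED"
    token_state: str = "SUSPENDED"       # state of the suspend token


def compare_and_swap(task, expected, new):
    """Model of CS: store `new` only if the token state is unchanged."""
    if task.token_state == expected:
        task.token_state = new
        return True
    return False


def wake_task_retc(task):
    """Model of the DFHDSWKT check: RETC=1 blocks the enqueue."""
    if task.purge_status == "PURGE_PENDING":
        return 1
    return 0  # unhandled state (OTHERWISE clause) leaves RETC=0


def replay_race():
    task = Task("TASK-A")
    chain = []  # stand-in for the L8/L9 dispatchable chain

    # 1. RESUME reads the token state and sees SUSPENDED.
    seen_by_resume = task.token_state
    # 2. PURGE updates SUSPENDED -> PURGED with CS, which succeeds.
    purge_won = compare_and_swap(task, "SUSPENDED", "PURGED")
    # 3. RESUME acts on its stale read with a plain store (no CS).
    resume_proceeds = seen_by_resume == "SUSPENDED"
    if resume_proceeds:
        task.token_state = "RESET"
    # 4. PURGE completes: the task is marked PURGED and enqueued.
    if purge_won:
        task.purge_status = "PURGED"
        task.task_state = "DISPATCHABLE"
        chain.append(task.name)
    # 5. RESUME reaches the DFHDSWKT check too late to see PURGE_PENDING.
    if resume_proceeds and wake_task_retc(task) == 0:
        chain.append(task.name)
    return task, chain


task, chain = replay_race()
print(chain)  # ['TASK-A', 'TASK-A']
```

    The replay fixes one particular ordering of the two paths. On a
    real system the window is narrow, which is consistent with the
    problem being timing dependent.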
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All CICS users.                              *
    ****************************************************************
    * PROBLEM DESCRIPTION: Abend 0C4 in DFHDSTCB after a deferred  *
    *                      FORCEPURGE request is processed.        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A task has been the subject of a deferred FORCEPURGE. After the
    task suspends, CICS attempts to process the purge. In routine
    SLEEP_CS, the state of the task is updated to PURGE_PENDING, and
    DFHDSDS4 is called.
    In parallel, another task on an open TCB releases the lock on
    which the first task was waiting, and issues a resume.
    A race condition now exists between the PURGE and the RESUME
    logic. This results in a task being placed on its TCB's
    dispatchable chain twice, and leads to the eventual CICS abend.
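
    Why a double entry can corrupt a chain can be sketched as
    follows. This is an illustration under assumptions, not the CICS
    data layout: it assumes a singly linked chain with a tail
    pointer, and the Node, Chain and walk names are hypothetical.
    Under that assumption, appending an element that is already the
    tail makes its forward pointer refer to itself, which matches
    the reported symptom of a chain pointer pointing to itself and a
    tight loop in the code that walks the chain.

```python
class Node:
    """Hypothetical chain element with a single forward pointer."""

    def __init__(self, name):
        self.name = name
        self.next = None


class Chain:
    """Assumed layout: singly linked chain with head and tail pointers."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, node):
        node.next = None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node  # if node is already the tail: self-link
        self.tail = node


def walk(chain, limit=10):
    """Follow at most `limit` links; report whether the walk was cut off."""
    names = []
    node = chain.head
    while node is not None and len(names) < limit:
        names.append(node.name)
        node = node.next
    return names, node is not None


chain = Chain()
task = Node("TASK-A")
chain.append(task)  # first enqueue (PURGE path)
chain.append(task)  # second enqueue (RESUME path)
names, cut_off = walk(chain)
print(task.next is task, cut_off)  # True True
```

    The walk only terminates because of the artificial limit. Code
    that follows the chain until it finds an end marker would loop
    indefinitely.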
    

Problem conclusion

  • DFHDSSR has been modified to use compare and swap when modifying
    the task's resume state. This prevents concurrent tasks
    affecting each other in this adverse way.
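
    The effect of using compare and swap on both paths can be
    sketched in Python. This is a model under assumptions, not the
    DFHDSSR change itself: the SuspendToken class and the purge,
    resume and race_once functions are hypothetical, a lock stands
    in for the atomicity of the CS instruction, and what the losing
    path does after its CS fails (here it simply reports failure)
    is an assumption.

```python
import threading


class SuspendToken:
    """Hypothetical suspend token whose state changes only through CS."""

    def __init__(self, state="SUSPENDED"):
        self._state = state
        self._lock = threading.Lock()  # models the atomicity of CS

    @property
    def state(self):
        return self._state

    def compare_and_swap(self, expected, new):
        """Store `new` only if the state still equals `expected`."""
        with self._lock:
            if self._state == expected:
                self._state = new
                return True
            return False


def purge(token):
    # PURGE path: SUSPENDED -> PURGED
    return token.compare_and_swap("SUSPENDED", "PURGED")


def resume(token):
    # RESUME path: SUSPENDED -> RESET, now also with CS
    return token.compare_and_swap("SUSPENDED", "RESET")


def race_once():
    """Run both paths concurrently against one token."""
    token = SuspendToken()
    results = {}
    barrier = threading.Barrier(2)

    def run(name, action):
        barrier.wait()  # release both threads together
        results[name] = action(token)

    threads = [
        threading.Thread(target=run, args=("purge", purge)),
        threading.Thread(target=run, args=("resume", resume)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, token.state


results, state = race_once()
print(results, state)
```

    Whichever path runs first wins the transition out of SUSPENDED.
    The other path sees its CS fail, so only one path goes on to
    place the task on the dispatchable chain.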
    

Temporary fix

  • FIX AVAILABLE BY PTF ONLY
    

Comments

APAR Information

  • APAR number

    PM57226

  • Reported component name

    CICS TS Z/OS V4

  • Reported component ID

    5655S9700

  • Reported release

    600

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-02-01

  • Closed date

    2012-05-08

  • Last modified date

    2013-09-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK78710 UK78711 PM97098

Modules/Macros

  • DFHDSSR
    

Fix information

  • Fixed component name

    CICS TS Z/OS V4

  • Fixed component ID

    5655S9700

Applicable component levels

  • R600 PSY UK78710

       UP12/05/19 P F205

  • R700 PSY UK78711

       UP12/05/19 P F205

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
16 September 2013