IBM Support

PH13273: TERMINATION HUNG DUE TO DEADLOCKED THREADS IN CR

APAR status

  • Closed as program error.

Error description

  • Customer could not stop the server with the 'STOP' command. They
    had to cancel it.
    A dump taken during the termination hang showed deadlocked
    threads in the CR:
    -com/ibm/ws390/xmem/proxy/channel/XMemProxyCRInboundConnLink.sendFirstChunkToSr()
     source: XMemProxyCRInboundConnLink.java:1163
    -com/ibm/ws/tcp/channel/impl/ZAioTCPConnLink.destroyCommon(Exception)
     source: ZAioTCPConnLink.java:1072
    .
    The cause of the deadlock is a small timing window.  On some
    paths through the inbound request path we get a "readlock" which
    is held across the queuing of the request to WLM for dispatch
    within a Servant.
    .
    Just after queuing to WLM we enter a synchronized block on the
    XMemProxyCRInboundConnLink object to determine if there's more
    data to read for the new request.  In the deadlock scenario,
    before this thread could enter this block, the queued request
    had already been dispatched in the Servant and its final
    response was on another Controller thread, to be driven back to
    the client.
    .
    This second CR thread is in
    XMemProxyCRInboundConnLink.sendFinalResponse, where it obtains
    the lock on the XMemProxyCRInboundConnLink and sends the
    response. It then determines the disposition of the current
    connection. If it is a healthy persistent connection, it just
    cleans up the current request and issues another read for the
    next request. If it is not a healthy connection, it attempts to
    clean up the connection.
    In the deadlock scenario it attempted to clean up the connection.
    It made it into the ZAioTCPConnLink.destroyCommon method under
    the close() processing, where it attempts to get the "readlock"
    and then the "writelock" to ensure that no outstanding I/O
    exists for this connection.
    This is where it became blocked.  The inbound thread was holding
    the "readlock", and when it received control after queuing the
    inbound request to WLM it attempted to get the lock on the
    XMemProxyCRInboundConnLink object, but the sendFinalResponse
    thread had obtained it first.  Now both threads are deadlocked.
    .
    This only applies to V9.   This problem is a result of changes
    unique to V9.
    

Local fix

  • Cancel the server.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server                                      *
    *                  V9.0                                        *
    ****************************************************************
    * PROBLEM DESCRIPTION: WebSphere Application Server for z/OS   *
    *                      may hang during stop processing.        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The server may not stop after receiving a stop command.  The
    reason for the hang is that an HTTP Request can still be in the
    controller native request registry.  It has been observed, from
    the ORB_Request stateflag information of this HTTP Request,
    that the HTTP Request has completed dispatch in a Servant and
    has started, but not completed, controller response processing.
    There are 2 controller ACRW threads deadlocked for this HTTP
    Request.  The cause of the deadlock is a small timing window.
    On some paths through the inbound request path we get a
    "readlock" which is held across the queuing of the
    request to WLM for dispatch within a Servant.  In
    XMemProxyCRInboundConnLink.sendFirstChunkToSr, just after
    queuing to WLM, we enter a synchronized block on the
    XMemProxyCRInboundConnLink object to determine if there's
    more data to read for the new request.  In the deadlock
    scenario, before this thread could enter this block, the queued
    request had already been dispatched in the Servant and its
    final response was on another Controller thread, to be driven
    back to the client.
    This second CR thread is in
    XMemProxyCRInboundConnLink.sendFinalResponse, where it
    obtains the lock on the XMemProxyCRInboundConnLink and sends
    the response.  It then determines the disposition of the
    current connection.  If it is a healthy persistent connection,
    it just cleans up the current request and issues another read
    for the next request.  If it is not a healthy connection, it
    attempts to clean up the connection.
    In the deadlock scenario it attempted to clean up the
    connection.  It made it into the ZAioTCPConnLink.destroyCommon
    method under the close() processing, where it attempts to get
    the "readlock" and then the "writelock" to ensure that no
    outstanding I/O exists for this connection.  This is where it
    became blocked.
    The inbound thread was holding the "readlock", and when it
    received control after queuing the inbound request to WLM it
    attempted to get the lock on the XMemProxyCRInboundConnLink
    object, but the sendFinalResponse thread had obtained it first.
    Now both threads are deadlocked.
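
    The timing window described above can be illustrated with a
    minimal Java sketch.  It is not the WebSphere source: connLink
    stands in for the XMemProxyCRInboundConnLink monitor, readLock
    for the "readlock" (modeled here as a ReentrantLock, which is an
    assumption), and the class, method, and latch names are
    hypothetical.  The latches force the ordering from the
    description, and the response thread uses a timed tryLock so the
    sketch terminates instead of hanging the way the real threads
    did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only; all names below are hypothetical stand-ins.
class DeadlockWindowSketch {
    // Stand-in for the monitor of the XMemProxyCRInboundConnLink object.
    static final Object connLink = new Object();
    // Stand-in for the "readlock" (assumed to behave like a mutex).
    static final ReentrantLock readLock = new ReentrantLock();

    /** Returns true if the response thread could not get readLock. */
    static boolean reproduceWindow() {
        CountDownLatch queuedToWlm = new CountDownLatch(1);
        CountDownLatch responseHoldsLink = new CountDownLatch(1);
        boolean[] blocked = new boolean[1];

        // Models sendFirstChunkToSr before the fix.
        Thread inbound = new Thread(() -> {
            readLock.lock();                    // held across the queuing
            try {
                queuedToWlm.countDown();        // request "queued to WLM"
                responseHoldsLink.await();      // force the timing window
                synchronized (connLink) {       // waits for the response thread
                    // would check for more data to read here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                readLock.unlock();
            }
        }, "inbound");

        // Models sendFinalResponse followed by destroyCommon.
        Thread response = new Thread(() -> {
            try {
                queuedToWlm.await();
                synchronized (connLink) {       // link lock obtained first
                    responseHoldsLink.countDown();
                    // The real code blocks here indefinitely; the sketch
                    // gives up after a timeout so that it can terminate.
                    boolean got = readLock.tryLock(500, TimeUnit.MILLISECONDS);
                    if (got) {
                        readLock.unlock();
                    }
                    blocked[0] = !got;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "response");

        inbound.start();
        response.start();
        try {
            inbound.join();
            response.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked[0];
    }

    public static void main(String[] args) {
        System.out.println("cycle observed: " + reproduceWindow());
    }
}
```

    reproduceWindow() returns true when the response thread, while
    holding the link monitor, cannot obtain the "readlock" that the
    inbound thread still holds.  That is the point at which the real
    threads blocked each other.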
    

Problem conclusion

  • Code has been modified in the
    XMemProxyCRInboundConnLink.sendFirstChunkToSr method to obtain
    the XMemProxyCRInboundConnLink lock prior to queuing the
    request to WLM.
    
    The fix for this APAR is currently targeted for inclusion in
    fix pack 9.0.5.1.  Please refer to the Recommended Updates
    page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
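
    The lock ordering described in this conclusion can be sketched
    as follows, using hypothetical stand-ins: connLink for the
    XMemProxyCRInboundConnLink monitor and readLock for the
    "readlock" (modeled as a ReentrantLock, which is an assumption).
    The sketch only illustrates why taking the link lock before
    queuing closes the window; the actual code change is not shown
    in this APAR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only; all names below are hypothetical stand-ins.
class LockOrderFixSketch {
    static final Object connLink = new Object();               // link monitor
    static final ReentrantLock readLock = new ReentrantLock(); // "readlock"

    /** Returns true if the response thread completed its cleanup. */
    static boolean runFixedOrder() {
        CountDownLatch queuedToWlm = new CountDownLatch(1);
        AtomicBoolean cleanedUp = new AtomicBoolean(false);

        // Models sendFirstChunkToSr after the fix.
        Thread inbound = new Thread(() -> {
            readLock.lock();
            try {
                synchronized (connLink) {       // taken BEFORE queuing
                    queuedToWlm.countDown();    // request "queued to WLM"
                    // would check for more data to read here
                }
            } finally {
                readLock.unlock();
            }
        }, "inbound");

        // Models sendFinalResponse followed by destroyCommon.
        Thread response = new Thread(() -> {
            try {
                queuedToWlm.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (connLink) {           // waits until inbound leaves its block
                readLock.lock();                // inbound no longer waits on connLink
                try {
                    cleanedUp.set(true);        // connection cleanup would run here
                } finally {
                    readLock.unlock();
                }
            }
        }, "response");

        inbound.start();
        response.start();
        try {
            inbound.join();
            response.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cleanedUp.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup completed: " + runFixedOrder());
    }
}
```

    Because the inbound thread already owns the link monitor when
    the request is queued, the response thread cannot obtain that
    monitor until the inbound thread has left its synchronized
    block, so neither thread waits on a lock the other can never
    release.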
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH13273

  • Reported component name

    WEBSPHERE FOR Z

  • Reported component ID

    5655I3500

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-06-12

  • Closed date

    2019-06-19

  • Last modified date

    2019-06-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBSPHERE FOR Z

  • Fixed component ID

    5655I3500

Applicable component levels

  • R900 PSY

       UP

Document Information

Modified date:
17 October 2021