IBM Support

PM75190: HUNG ENDPOINT SERVER CAUSES WSGRID FUNCTION IN COMPUTE GRID JOB SCHEDULER SERVER TO STOP SENDING OUTPUT.


APAR status

  • Closed as program error.

Error description

  • A customer was running Compute Grid 8.0 in a production
    environment. Jobs in this environment are triggered via
    WSGrid, and the customer observed that several jobs had active
    WSGrid sessions that were not reflected as jobs in the JMC
    (Job Management Console). Only one job was in the executing
    state in the JMC, but its job log indicated that it should
    have been in a different state. The customer cycled the
    endpoint application server on which this job had run, at
    which point the normal flow of jobs through the environment
    via WSGrid resumed.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  Users of the batch function of IBM          *
    *                  WebSphere Application Server V8.5           *
    ****************************************************************
    * PROBLEM DESCRIPTION: Job log streaming for jobs submitted    *
    *                      via WSGrid (for example, via an         *
    *                      external scheduler) appears to stop     *
    *                      due to a hang or a slowdown on an       *
    *                      endpoint server running                 *
    *                      already-submitted (via WSGrid) jobs.    *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    The problem can occur when an endpoint server running jobs
    dispatched via the WSGrid interface experiences a slowdown,
    for example, because it is thrashing on its way to running out
    of memory.
    If an endpoint slows down enough, it might not respond to the
    scheduler's requests for job log updates and status updates,
    which the scheduler relays back to the WSGrid client (the
    external scheduler).
    The scheduler threads might hang for a long time, waiting for
    the endpoint's response.  Because the scheduler uses a single
    thread pool to stream the output from all endpoints, this can
    lead to a situation in which no output at all is received over
    the WSGrid interface, since all the relevant scheduler threads
    are hung waiting for output from a single bad server.
    However, the jobs submitted to the other (good) endpoints
    should still run normally in this scenario, although their
    output is not handled properly and is not sent back to the
    WSGrid client.
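The starvation described above can be reproduced in miniature. The following is only an illustrative sketch, not the scheduler's actual code: the function name, the job names, and the use of a `threading.Event` to stand in for a hung endpoint are all assumptions made for the example. It shows how a fixed-size pool whose workers block without a timeout stops serving jobs from healthy endpoints.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate_shared_pool(pool_size, hung_jobs, healthy_jobs, observe_seconds=0.5):
    """Return (healthy jobs streamed while the endpoint is hung,
    healthy jobs streamed after the endpoint is released)."""
    release = threading.Event()  # hypothetical stand-in for "endpoint was cycled"
    streamed = []
    lock = threading.Lock()

    def stream_from_hung_endpoint(job_id):
        # Blocks with no timeout, like a scheduler thread waiting on a bad server.
        release.wait()

    def stream_from_healthy_endpoint(job_id):
        with lock:
            streamed.append(job_id)

    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        # Jobs on the hung endpoint are picked up first and occupy the workers.
        for i in range(hung_jobs):
            pool.submit(stream_from_hung_endpoint, f"hung-{i}")
        healthy = [
            pool.submit(stream_from_healthy_endpoint, f"ok-{i}")
            for i in range(healthy_jobs)
        ]
        # Observe for a short while: healthy jobs can only run on a free worker.
        wait(healthy, timeout=observe_seconds)
        with lock:
            during_hang = len(streamed)
        # Simulate cycling the endpoint server, which frees the blocked workers.
        release.set()
        wait(healthy)
        with lock:
            after_release = len(streamed)
        return during_hang, after_release
    finally:
        release.set()
        pool.shutdown(wait=True)
```

Calling `simulate_shared_pool(pool_size=2, hung_jobs=2, healthy_jobs=3)` leaves both workers blocked, so no healthy job streams output until the simulated endpoint is released, which mirrors the customer's observation that cycling the endpoint server restored the flow.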
    

Problem conclusion

  • The scheduler threads that stream output from the endpoint
    server for WSGrid-submitted jobs back to the WSGrid client
    now time out rather than hanging indefinitely. A single bad
    endpoint can therefore still slow down output streaming, but
    only in proportion to the number of jobs on the affected
    endpoint relative to the total number of jobs managed by the
    scheduler; it no longer prevents streaming of all WSGrid
    output.
    
    The fix for this APAR is currently targeted for inclusion in
    fix pack 8.5.0.2.  Please refer to the Recommended
    Updates page for delivery information:
    http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
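The timeout behavior described in the problem conclusion can be sketched in the same simplified model. Again, this is an assumption-laden illustration rather than the product's implementation: `simulate_with_timeout`, `poll_timeout`, and the job names are hypothetical, and the real fix's timeout value and mechanism are not stated in this APAR. The point is only that a bounded wait returns the worker to the pool, so jobs on healthy endpoints are delayed instead of blocked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def simulate_with_timeout(pool_size, hung_jobs, healthy_jobs, poll_timeout=0.1):
    """Return a dict mapping each job id to 'streamed' or 'timed_out'."""
    never_responds = threading.Event()  # never set: the endpoint stays hung
    results = {}
    lock = threading.Lock()

    def poll_hung_endpoint(job_id):
        # Bounded wait: the worker gives up after poll_timeout seconds.
        responded = never_responds.wait(timeout=poll_timeout)
        with lock:
            results[job_id] = "streamed" if responded else "timed_out"

    def poll_healthy_endpoint(job_id):
        with lock:
            results[job_id] = "streamed"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [
            pool.submit(poll_hung_endpoint, f"hung-{i}") for i in range(hung_jobs)
        ]
        futures += [
            pool.submit(poll_healthy_endpoint, f"ok-{i}")
            for i in range(healthy_jobs)
        ]
        # Every task finishes, because no worker can block forever.
        wait(futures)
    return results
```

With `simulate_with_timeout(pool_size=2, hung_jobs=2, healthy_jobs=3)`, the two hung polls time out and all three healthy jobs still stream. In this sketch the healthy jobs are held up for roughly one `poll_timeout`, and that delay grows with the share of pool work tied to the bad endpoint.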
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM75190

  • Reported component name

    WEBS APP SERV N

  • Reported component ID

    5724H8800

  • Reported release

    850

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-10-17

  • Closed date

    2013-01-02

  • Last modified date

    2013-03-01

  • APAR is sysrouted FROM one or more of the following:

    PM74855

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBS APP SERV N

  • Fixed component ID

    5724H8800

Applicable component levels

  • R850 PSY

       UP


Document Information

Modified date:
01 November 2021