IBM Support

PM70434: EXCESSIVE LOADING OF JOB STATUS OBJECTS BY SCHEDULER LEADS TO OUTOFMEMORYERROR AND OTHER ISSUES UPON ENDPOINT SERVER RESTART.

APAR status

  • Closed as program error.

Error description

  • Excessive loading of job status objects by scheduler leads to
    OutOfMemoryError and other issues upon endpoint server restart.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of WebSphere Extended Deployment  *
    *                  Compute Grid.                               *
    ****************************************************************
    * PROBLEM DESCRIPTION: Excessive heap memory                   *
    *                      consumed in the scheduler especially    *
    *                      when a CG endpoint server is started    *
    *                      or restarted, possibly leading to an    *
    *                      OutOfMemoryError.                       *
    *                      Also there is a timing issue flowing    *
    *                      job logs back to wsgrid clients that    *
    *                      can lead to a problem dispatching a     *
    *                      job as well as reporting                *
    *                      its correct status.                     *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When an endpoint server is started or restarted, the scheduler
    server queries its internal database tables for information
    about job executions associated with the endpoint being
    started.  This query loaded more data into memory than it
    needed, and when the tables held a large number of completed
    jobs, this could lead to excessive resource usage, processing
    slowdown, and even an OutOfMemoryError.
    For the second problem, a job submitted through the WSGrid
    interface might not be dispatched properly if a timing window
    was hit.  The problem involved the mechanism by which the
    scheduler requests job log parts from the dispatch endpoint
    to stream back to the WSGrid client.  As a result, the job
    might not be dispatched and its status might not be maintained
    correctly, so the job could appear to be "stuck" in submitted
    state rather than moving to the restartable state upon failure.
    This tended to happen if there was a slowdown (for example, a
    GC cycle) on the endpoint server to which the job was
    dispatched, right after the scheduler had initiated the
    dispatch.
    

Problem conclusion

  • The database queries issued upon endpoint start were refined to
    ignore completed jobs.  For the second problem, the
    WSGrid processing was modified to short-circuit the job
    log streaming from the endpoint as long as the job is still
    in "submitted" state.
    The fix for this APAR is currently targeted for inclusion in
    fix pack 8.0.0.3.  Please refer to the Recommended Updates page
    for delivery information:
    http://www.ibm.com/support/docview.wss?uid=swg27022998
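
The two changes described in this conclusion can be illustrated with a
minimal Java sketch. It is not the product code: the class
SchedulerSketch, the JobState values, the JobStatus fields, and both
method names are hypothetical and exist only for illustration. The
in-memory list stands in for the scheduler's database tables; in a real
scheduler the "ignore completed jobs" predicate would presumably belong
in the query itself, so that completed rows are never loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Illustrative sketch only; none of these names come from the product. */
final class SchedulerSketch {

    /** Hypothetical states. Only "submitted" and "restartable" are named in the APAR. */
    enum JobState { SUBMITTED, EXECUTING, RESTARTABLE, ENDED }

    /** Hypothetical, minimal view of one job status row. */
    static final class JobStatus {
        final String jobId;
        final String endpoint;
        final JobState state;

        JobStatus(String jobId, String endpoint, JobState state) {
            this.jobId = jobId;
            this.endpoint = endpoint;
            this.state = state;
        }
    }

    /**
     * Returns the rows for one endpoint, skipping completed jobs.
     * The list stands in for the database table; the point is the predicate.
     */
    static List<JobStatus> activeJobsFor(String endpoint, List<JobStatus> rows) {
        List<JobStatus> active = new ArrayList<>();
        for (JobStatus row : rows) {
            if (row.endpoint.equals(endpoint) && row.state != JobState.ENDED) {
                active.add(row);
            }
        }
        return active;
    }

    /**
     * Short-circuits log streaming: while the job is still SUBMITTED the
     * endpoint is not contacted at all and no log part is returned.
     */
    static Optional<String> nextLogPart(JobStatus job, Supplier<String> fetchFromEndpoint) {
        if (job.state == JobState.SUBMITTED) {
            return Optional.empty();
        }
        return Optional.ofNullable(fetchFromEndpoint.get());
    }

    public static void main(String[] args) {
        List<JobStatus> rows = List.of(
                new JobStatus("j1", "ep1", JobState.ENDED),
                new JobStatus("j2", "ep1", JobState.SUBMITTED),
                new JobStatus("j3", "ep2", JobState.EXECUTING));

        System.out.println("active jobs on ep1: " + activeJobsFor("ep1", rows).size());
        System.out.println("log part available for j2: "
                + nextLogPart(rows.get(1), () -> "log line").isPresent());
    }
}
```

Returning an empty Optional for a job in submitted state means the
caller simply retries later, which keeps log retrieval out of the
window in which the dispatch is still being established.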
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM70434

  • Reported component name

    WXD COMPUTE GRI

  • Reported component ID

    5725C9301

  • Reported release

    800

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2012-08-07

  • Closed date

    2012-12-17

  • Last modified date

    2012-12-17

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    PM79172

Fix information

  • Fixed component name

    WXD COMPUTE GRI

  • Fixed component ID

    5725C9301

Applicable component levels

  • R800 PSY

       UP

Document Information

Modified date:
28 April 2022