PM75190: HUNG ENDPOINT SERVER CAUSES WSGRID FUNCTION IN COMPUTE GRID JOB SCHEDULER SERVER TO STOP SENDING OUTPUT.

A fix is available

8.5.0.2: WebSphere Application Server V8.5 Fix Pack 2

APAR status

Closed as program error.

Error description

The customer was running Compute Grid 8.0 in their production
environment. The jobs in this environment are triggered via
WSGrid, and the customer observed that several jobs had active
WSGrid sessions that were not reflected as jobs in the JMC.
Only one job was in the executing state in the JMC, but its
joblog indicated that it should have been in a different state.
After the customer cycled the endpoint application server that
this job had run on, the normal flow of jobs through the
environment via WSGrid resumed.

Local fix

Problem summary

****************************************************************
* USERS AFFECTED:  Users of the batch function of IBM          *
*                  WebSphere Application Server V8.5           *
****************************************************************
* PROBLEM DESCRIPTION: Job log streaming for jobs submitted    *
*                      via WSGrid (for example, via an         *
*                      external scheduler) appears to stop     *
*                      due to a hang or a slowdown on an       *
*                      endpoint server running                 *
*                      already-submitted (via WSGrid) jobs.    *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The problem can happen when an endpoint server running jobs
dispatched via the WSGrid interface experiences a slowdown,
for example, because it is thrashing en route to running out
of memory.
If an endpoint slows down enough, it might not respond to the
scheduler's requests for job log and status updates, which the
scheduler sends back to the WSGrid client (the external
scheduler).
The scheduler threads may hang for a long time, waiting for
the endpoint's response.  Because the scheduler uses a single
thread pool to stream the output from all endpoints, this can
lead to a situation where no output at all is received over
the WSGrid interface, since all the relevant scheduler threads
are hung waiting for output from a single bad server.
However, jobs submitted to the other (good) endpoints should
still run normally in this scenario; only their output is not
handled properly and sent back to the WSGrid client.
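The starvation described above can be illustrated with a small Java sketch. It is not the WebSphere scheduler's code: the class and method names (SharedPoolStarvation, streamFromHungEndpoint, streamFromGoodEndpoint, goodOutputDelivered) are hypothetical, the hung endpoint is modeled as a CountDownLatch that is never released, and the scheduler's shared streaming pool is assumed to behave like a plain fixed-size ExecutorService. Once every pool thread is blocked on the hung endpoint with no time limit, a request for a healthy endpoint's output sits in the queue and never runs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of one shared output-streaming pool with unbounded waits. */
public class SharedPoolStarvation {

    // Stand-in for a hung endpoint: nothing ever counts this latch down.
    static final CountDownLatch HUNG_ENDPOINT = new CountDownLatch(1);

    // Streaming task for a job on the hung endpoint: waits with no time limit.
    static String streamFromHungEndpoint() throws InterruptedException {
        HUNG_ENDPOINT.await();
        return "job log from hung endpoint";
    }

    // Streaming task for a job on a healthy endpoint: answers immediately.
    static String streamFromGoodEndpoint() {
        return "job log from good endpoint";
    }

    /** Returns true if the good endpoint's output is streamed within waitMillis. */
    public static boolean goodOutputDelivered(int poolSize, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Requests for jobs on the hung endpoint occupy every pool thread.
            Callable<String> hung = SharedPoolStarvation::streamFromHungEndpoint;
            for (int i = 0; i < poolSize; i++) {
                pool.submit(hung);
            }
            // The healthy endpoint's request can only wait in the queue.
            Callable<String> healthy = SharedPoolStarvation::streamFromGoodEndpoint;
            Future<String> good = pool.submit(healthy);
            good.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Demo only: interrupt the blocked threads so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("good output delivered: " + goodOutputDelivered(4, 500));
    }
}
```

In this sketch, goodOutputDelivered(4, 500) returns false: all four threads are pinned by the hung endpoint, so the good endpoint's job log is never streamed. The shutdownNow() call exists only so the demo can exit; a genuinely hung server would keep those threads blocked.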

Problem conclusion

The scheduler threads that stream output for WSGrid-submitted
jobs from the endpoint server back to the WSGrid client now
time out rather than hanging indefinitely. As a result, a
single bad endpoint can still slow down output streaming, but
only in proportion to the number of jobs on that endpoint
relative to the total number of jobs managed by the scheduler;
it no longer prevents streaming of all WSGrid output.
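The same hypothetical model can be changed along the lines of the fix, assuming each wait on an endpoint is bounded by a timeout. Here the bound is CountDownLatch.await(long, TimeUnit); the actual timeout mechanism and value used by the scheduler are not documented in this APAR, and all class and method names below are hypothetical. A thread blocked on the bad endpoint is released when its wait expires, so queued work for good endpoints can proceed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of the same pool after bounding each wait on an endpoint. */
public class BoundedWaitStreaming {

    // Stand-in for a hung endpoint: nothing ever counts this latch down.
    static final CountDownLatch HUNG_ENDPOINT = new CountDownLatch(1);

    // Waits at most timeoutMillis for the endpoint, then returns and frees the thread.
    static String streamFromHungEndpoint(long timeoutMillis) throws InterruptedException {
        boolean answered = HUNG_ENDPOINT.await(timeoutMillis, TimeUnit.MILLISECONDS);
        return answered ? "job log from hung endpoint" : "timed out";
    }

    // Streaming task for a job on a healthy endpoint: answers immediately.
    static String streamFromGoodEndpoint() {
        return "job log from good endpoint";
    }

    /** Returns true if the good endpoint's output is streamed within waitMillis. */
    public static boolean goodOutputDelivered(int poolSize, long endpointTimeoutMillis, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Requests for jobs on the hung endpoint still occupy every thread at first.
            Callable<String> hung = () -> streamFromHungEndpoint(endpointTimeoutMillis);
            for (int i = 0; i < poolSize; i++) {
                pool.submit(hung);
            }
            // Runs as soon as any hung-endpoint wait expires and frees a thread.
            Callable<String> healthy = BoundedWaitStreaming::streamFromGoodEndpoint;
            Future<String> good = pool.submit(healthy);
            good.get(waitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("good output delivered: " + goodOutputDelivered(4, 100, 2000));
    }
}
```

With goodOutputDelivered(4, 100, 2000), the hung-endpoint tasks give up after about 100 ms each and the good endpoint's output is then delivered. The bad endpoint still costs up to one timeout of thread time for every streaming request it receives, which is why streaming slows down roughly in proportion to that endpoint's share of the scheduler's jobs instead of stopping entirely.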

The fix for this APAR is currently targeted for inclusion in
fix pack 8.5.0.2.  Please refer to the Recommended
Updates page for delivery information:
http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980

Temporary fix

Comments

APAR Information

APAR number
PM75190
Reported component name
WEBS APP SERV N
Reported component ID
5724H8800
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-10-17
Closed date
2013-01-02
Last modified date
2013-03-01

APAR is sysrouted FROM one or more of the following:

PM74855
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WEBS APP SERV N
Fixed component ID
5724H8800

Applicable component levels

R850 PSY
UP


Document Information

Modified date:
01 November 2021
