A fix is available
APAR status
Closed as program error.
Error description
The customer was running Compute Grid 8.0 in their production environment. Jobs in this environment are submitted via WSGrid, and the customer observed that several jobs had active WSGrid sessions that were not reflected as jobs in the JMC. Only one job was in the executing state in the JMC, but its job log indicated that it should have been in a different state. After they restarted the endpoint application server on which this job had run, jobs resumed flowing normally through the environment via WSGrid.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: Users of the batch function of IBM
*                 WebSphere Application Server V8.5
****************************************************************
* PROBLEM DESCRIPTION: Job log streaming for jobs submitted
*                      via WSGrid (for example, via an
*                      external scheduler) appears to stop
*                      due to a hang or a slowdown on an
*                      endpoint server running
*                      already-submitted (via WSGrid) jobs.
****************************************************************
* RECOMMENDATION:
****************************************************************

The problem can occur when an endpoint server running jobs dispatched through the WSGrid interface slows down, for example because it is thrashing on its way to running out of memory. If the endpoint slows down enough, it might not respond to the scheduler's requests for job log and status updates, which the scheduler would otherwise relay to the WSGrid client (the external scheduler). The scheduler threads making those requests can then hang for a long time waiting for the endpoint's response. Because the scheduler uses a single thread pool to stream output from all endpoints, every relevant scheduler thread can end up hung waiting on that one bad server, so no output at all is received over the WSGrid interface. Jobs submitted to the other (good) endpoints should still run normally in this scenario; only their output is not sent back to the WSGrid client.
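The starvation described above can be illustrated with a small model: a minimal sketch, assuming the scheduler's streaming pool behaves like a plain fixed-size `java.util.concurrent` pool whose tasks wait on endpoints with no timeout. This is not IBM's scheduler code; the class and method names (`SharedPoolStarvation`, `goodEndpointServedWithin`) are hypothetical, and a `CountDownLatch` that is never released stands in for the unresponsive endpoint.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedPoolStarvation {

    /**
     * Hypothetical model: one fixed-size pool streams output for all endpoints.
     * badRequests tasks block on an endpoint that never answers, then one task
     * for a healthy endpoint is queued. Returns true if that task ran within waitMillis.
     */
    static boolean goodEndpointServedWithin(int poolSize, int badRequests, long waitMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch badEndpointAnswers = new CountDownLatch(1); // stays closed: the "hung" endpoint
        try {
            for (int i = 0; i < badRequests; i++) {
                // Each request waits with no timeout, like the streaming threads before the fix.
                pool.submit(() -> {
                    badEndpointAnswers.await();
                    return null;
                });
            }
            CountDownLatch goodServed = new CountDownLatch(1);
            pool.submit(goodServed::countDown); // request for a healthy endpoint
            return goodServed.await(waitMillis, TimeUnit.MILLISECONDS);
        } finally {
            badEndpointAnswers.countDown(); // release the blocked threads so the pool can shut down
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Every pool thread is stuck on the bad endpoint: the healthy endpoint is starved.
        System.out.println(goodEndpointServedWithin(4, 4, 200));  // false
        // Spare threads remain: the healthy endpoint is served.
        System.out.println(goodEndpointServedWithin(4, 2, 2000)); // true
    }
}
```

In this model the healthy endpoint's request is not slow in itself; it simply sits in the queue because every worker is parked on the bad endpoint, which matches the "no output at all" symptom.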
Problem conclusion
The scheduler threads that stream output for WSGrid-submitted jobs from the endpoint server back to the WSGrid client will now time out rather than hang indefinitely. A single bad endpoint can therefore still slow down output streaming, but only in proportion to the number of jobs on that endpoint relative to the total number of jobs managed by this scheduler, rather than preventing the streaming of all WSGrid output. The fix for this APAR is currently targeted for inclusion in fix pack 8.5.0.2. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
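The idea of bounding each wait can be sketched with `Future.get(timeout, unit)`. This is a minimal sketch, assuming each request to an endpoint can be wrapped as a `Callable`; the APAR does not describe how the fix is implemented, and the names (`TimedStreamingRequest`, `fetchWithTimeout`) and the timeout values are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedStreamingRequest {

    /**
     * Hypothetical helper: run one request to an endpoint and wait at most
     * timeoutMillis for the answer. Returns null if the endpoint is too slow
     * or the call fails, so the caller can move on to other jobs.
     */
    static String fetchWithTimeout(ExecutorService io, Callable<String> endpointCall, long timeoutMillis)
            throws InterruptedException {
        Future<String> reply = io.submit(endpointCall);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true); // interrupt the stuck call so its thread can be reclaimed
            return null;
        } catch (ExecutionException e) {
            return null; // the endpoint call threw; treat as "no update this round"
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService io = Executors.newCachedThreadPool();
        try {
            Callable<String> slowEndpoint = () -> {
                Thread.sleep(10_000); // simulates an endpoint that is thrashing
                return "late log chunk";
            };
            Callable<String> healthyEndpoint = () -> "job log chunk";
            System.out.println(fetchWithTimeout(io, slowEndpoint, 100));     // null
            System.out.println(fetchWithTimeout(io, healthyEndpoint, 1_000)); // job log chunk
        } finally {
            io.shutdownNow();
        }
    }
}
```

With a bounded wait, a request to a bad endpoint holds a streaming thread for at most the timeout, so the cost of that endpoint scales with how many of its jobs are being streamed rather than blocking the whole pool.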
Temporary fix
Comments
APAR Information
APAR number
PM75190
Reported component name
WEBS APP SERV N
Reported component ID
5724H8800
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-10-17
Closed date
2013-01-02
Last modified date
2013-03-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WEBS APP SERV N
Fixed component ID
5724H8800
Applicable component levels
R850 PSY UP
Document Information
Modified date:
01 November 2021