Fixes are available
APAR status
Closed as program error.
Error description
Jobs in the customer's environment sometimes remain in the submitted state, even though job logs or other log files indicate that they should have reached another state (ended, restartable, etc.). To clear the issue, the customer has to interrupt service by cycling the endpoints, the job scheduler, or sometimes even the cell.
Local fix
For the time being, the customer can recycle the endpoints to clear the issue.
Problem summary
****************************************************************
* USERS AFFECTED:  All users of WebSphere Extended Deployment  *
*                  Compute Grid Version 8                      *
****************************************************************
* PROBLEM DESCRIPTION: Problem initializing job log            *
*                      processing has side effect leading to   *
*                      jobs stuck in submitted state.          *
*                      A second problem is that jobs that      *
*                      failed before their first step ran      *
*                      were, on restart, going                 *
*                      right to "ended" state without          *
*                      executing any of the jobs' steps.       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A problem initializing job log handling for a given job was handled incorrectly by the runtime on the endpoint server. This led to a problem downstream where the job status wasn't getting communicated and managed correctly. So rather than the job appearing to have failed and been put into "restartable" state, the job appears to be stuck in "submitted" state (still in "submitted" state since job log initialization happens early on in the job lifecycle).
A second problem found at the same time occurs when a job fails and is put into "restartable" state before the first step ever gets executed. The problem appears when the job is restarted. The restarted job appears to execute successfully and finish in the "ended" state, according to the external job status as seen for example in the job management console. However, none of the steps comprising the job will actually have executed, due to a flaw in the logic used by the batch container in restart execution.
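The first problem can be illustrated with a minimal sketch. It is not the Compute Grid runtime code: the names JobStatus, JobLogInitializer, and JobLogInitSketch are hypothetical, and the sketch only assumes that a job log initialization failure which is never turned into a status change leaves the job showing "submitted", whereas reporting the failure would mark the job "restartable".

```java
// Hypothetical sketch; none of these names come from the Compute Grid runtime.
enum JobStatus { SUBMITTED, EXECUTING, RESTARTABLE }

// Assumed stand-in for whatever sets up the job log for a job.
interface JobLogInitializer {
    void init(String jobId) throws Exception;
}

final class JobLogInitSketch {

    // Faulty pattern: the failure is swallowed, so the externally visible
    // status is never updated and the job appears stuck in SUBMITTED.
    static JobStatus startIgnoringFailure(String jobId, JobLogInitializer jobLog) {
        JobStatus status = JobStatus.SUBMITTED;
        try {
            jobLog.init(jobId);
            status = JobStatus.EXECUTING;
        } catch (Exception e) {
            // Nothing is reported here; status stays SUBMITTED.
        }
        return status;
    }

    // Expected outcome per the description: the failure becomes a status
    // change, and the job is reported as RESTARTABLE.
    static JobStatus startReportingFailure(String jobId, JobLogInitializer jobLog) {
        try {
            jobLog.init(jobId);
            return JobStatus.EXECUTING;
        } catch (Exception e) {
            return JobStatus.RESTARTABLE;
        }
    }
}
```

With an initializer that always throws, startIgnoringFailure returns SUBMITTED while startReportingFailure returns RESTARTABLE. With a working initializer, both return EXECUTING.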
Problem conclusion
Job log initialization error handling has been tightened up so that a failure puts the job into "restartable" state. The bug in restart execution has been fixed so that, upon restart, a job that failed before its first step was attempted will resume execution at the first step.
The fix for this APAR is currently targeted for inclusion in fixpack 8.0.0.3. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?uid=swg27022998
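The corrected restart behavior can be sketched as follows. This is an illustration only, not the batch container's implementation: RestartSketch, NO_STEP_ATTEMPTED, and stepsToRunOnRestart are hypothetical names, and the sketch assumes that the failure point is recorded as a step index, with a sentinel value when the job failed before any step was attempted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the batch container's actual restart logic.
final class RestartSketch {

    // Assumed sentinel: the job failed before any step was attempted.
    static final int NO_STEP_ATTEMPTED = -1;

    // Returns the steps that a restarted job should execute, in order.
    // failedStepIndex is the index of the step that was running when the
    // job failed, or NO_STEP_ATTEMPTED if no step had started.
    static List<String> stepsToRunOnRestart(List<String> steps, int failedStepIndex) {
        // A job that never reached its first step resumes at step 0,
        // rather than being treated as having nothing left to run.
        int resumeAt = (failedStepIndex == NO_STEP_ATTEMPTED) ? 0 : failedStepIndex;
        return new ArrayList<>(steps.subList(resumeAt, steps.size()));
    }
}
```

For a three-step job that failed before its first step, the sketch returns all three steps. If the job failed in its second step (index 1), it returns the second and third steps.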
Temporary fix
An interim fix is available upon request.
Comments
APAR Information
APAR number
PM71339
Reported component name
WXD COMPUTE GRI
Reported component ID
5725C9301
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2012-08-22
Closed date
2013-01-02
Last modified date
2013-01-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WXD COMPUTE GRI
Fixed component ID
5725C9301
Applicable component levels
R800 PSY
UP
Document Information
Modified date:
29 October 2021