Fixes are available
8.0.0.3: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 3
8.5.5.2: WebSphere Application Server V8.5.5 Fix Pack 2
8.0.0.4: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 4
8.5.5.3: WebSphere Application Server V8.5.5 Fix Pack 3
8.5.5.4: WebSphere Application Server V8.5.5 Fix Pack 4
8.5.5.5: WebSphere Application Server V8.5.5 Fix Pack 5
8.5.5.6: WebSphere Application Server V8.5.5 Fix Pack 6
8.5.5.7: WebSphere Application Server V8.5.5 Fix Pack 7
8.5.5.8: WebSphere Application Server V8.5.5 Fix Pack 8
8.5.5.9: WebSphere Application Server V8.5.5 Fix Pack 9
8.5.5.10: WebSphere Application Server V8.5.5 Fix Pack 10
8.0.0.5: WebSphere Extended Deployment Compute Grid V8.0 Fix Pack 5
8.5.5.11: WebSphere Application Server V8.5.5 Fix Pack 11
8.5.5.12: WebSphere Application Server V8.5.5 Fix Pack 12
8.5.5.13: WebSphere Application Server V8.5.5 Fix Pack 13
8.5.5.14: WebSphere Application Server V8.5.5 Fix Pack 14
8.5.5.15: WebSphere Application Server V8.5.5 Fix Pack 15
8.5.5.17: WebSphere Application Server V8.5.5 Fix Pack 17
8.5.5.20: WebSphere Application Server V8.5.5.20
8.5.5.18: WebSphere Application Server V8.5.5 Fix Pack 18
8.5.5.19: WebSphere Application Server V8.5.5 Fix Pack 19
8.5.5.16: WebSphere Application Server V8.5.5 Fix Pack 16
8.5.5.21: WebSphere Application Server V8.5.5.21
APAR status
Closed as program error.
Error description
The scheduler performs a purge on the endpoint as part of the purgeJob processing when reclaiming job numbers. Under certain conditions, the job is purged from the job status table but not from the global job id assignment table. When the scheduler then attempts to reclaim the job number, it cannot find any removable jobs in the job status store, which results in a NullPointerException.

The following exception is logged in the scheduler job logs:

WebServicesFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.generalException
 faultString: WSWS3713E: Connection to the remote host ibmtest failed. Received the following error: java.net.ConnectException: Connection refused
 faultActor: null
 faultDetail: WSWS3713E: Connection to the remote host ibmtest failed. Received the following error: java.net.ConnectException: Connection refused
    at com.ibm.ws.webservices.engine.transport.http.HttpOutboundChannelConnection.connect(HttpOutboundChannelConnection.java:965)
    ...
    at com.ibm.ws.batch.BatchGridDiscriminatorSoapBindingStub.purgeJob(BatchGridDiscriminatorSoapBindingStub.java:428)
    at com.ibm.ws.batch.BatchGridDiscriminatorProxy.purgeJob(BatchGridDiscriminatorProxy.java:137)
    at com.ibm.ws.batch.SchedulerSingleton.invoke61PlusEndpoint(SchedulerSingleton.java:5705)
    at com.ibm.ws.batch.SchedulerSingleton.invokeEndpoint(SchedulerSingleton.java:5592)
    at com.ibm.ws.batch.SchedulerSingleton.endpointPurge(SchedulerSingleton.java:3538)
    at com.ibm.ws.batch.SchedulerSingleton.purgeJob(SchedulerSingleton.java:3483)
    at com.ibm.ws.batch.SchedulerSingleton.getJobNumber(SchedulerSingleton.java:1787)
    at com.ibm.ws.batch.SchedulerSingleton.privateReserveJobNumberBlock(SchedulerSingleton.java:13939)
    at com.ibm.ws.batch.SchedulerSingleton.privateReserveJobNumberStringBlock(SchedulerSingleton.java:13969)
    at com.ibm.ws.batch.JobSchedulerBean.reserveJobNumberBlock(JobSchedulerBean.java:346)
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere Compute Grid          *
*                 Version 8.                                   *
****************************************************************
* PROBLEM DESCRIPTION: Jobs remain in executing state          *
*                      indefinitely when job numbers have      *
*                      wrapped and need to be reclaimed.       *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
The logs show numerous WSWS3713E: Connection refused messages when trying to do a purge on the endpoint as part of the purgeJob processing when reclaiming job numbers. When this occurs, the code was not handling it correctly and the job was getting purged from the job status table, but not from the global job id assignment table. Therefore, when an attempt was made to reclaim the job number, no removable jobs can be found in the job status store resulting in a NullPointerException.
Problem conclusion
For jobs in the ended, cancelled, or execution failed state, the information has already been cleaned up on the endpoint, so there is no need to issue the purgeEndpoint call, which was failing because of the connection error.

The fix for this APAR is currently targeted for inclusion in fixpack 8.0.0.3. Please refer to the Recommended Updates page for delivery information:
http://www.ibm.com/support/docview.wss?uid=swg27022998
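
The behavior described above can be illustrated with a small, self-contained sketch. It is not the Compute Grid source: the class PurgeSketch, the JobState enum, the Endpoint interface, and the two in-memory maps are hypothetical stand-ins for the scheduler, the endpoint web service, the job status table, and the global job id assignment table. The sketch assumes that jobs in a terminal state skip the endpoint call, and (as a further assumption not stated in this APAR) that a failed endpoint call leaves both tables untouched so they cannot drift apart.

```java
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Compute Grid.
class PurgeSketch {
    enum JobState { SUBMITTED, EXECUTING, ENDED, CANCELLED, EXECUTION_FAILED }

    // Stand-in for the remote purge call that failed with "Connection refused".
    interface Endpoint {
        void purge(String jobId) throws ConnectException;
    }

    // Stand-ins for the job status table and the global job id assignment table.
    private final Map<String, JobState> jobStatusTable = new HashMap<>();
    private final Map<Integer, String> jobIdAssignmentTable = new HashMap<>();
    private final Endpoint endpoint;

    PurgeSketch(Endpoint endpoint) {
        this.endpoint = endpoint;
    }

    void register(int jobNumber, String jobId, JobState state) {
        jobIdAssignmentTable.put(jobNumber, jobId);
        jobStatusTable.put(jobId, state);
    }

    // Terminal states were already cleaned up on the endpoint, so no remote call is needed.
    static boolean needsEndpointPurge(JobState state) {
        return state != JobState.ENDED
                && state != JobState.CANCELLED
                && state != JobState.EXECUTION_FAILED;
    }

    // Returns true if the job number was reclaimed; false if nothing was changed.
    boolean purgeJob(int jobNumber) {
        String jobId = jobIdAssignmentTable.get(jobNumber);
        if (jobId == null) {
            return false; // unknown job number: nothing to reclaim
        }
        JobState state = jobStatusTable.get(jobId);
        if (state != null && needsEndpointPurge(state)) {
            try {
                endpoint.purge(jobId);
            } catch (ConnectException e) {
                // Assumption: on failure, keep both tables as they were.
                return false;
            }
        }
        // Remove from both tables together so they stay consistent.
        jobStatusTable.remove(jobId);
        jobIdAssignmentTable.remove(jobNumber);
        return true;
    }

    boolean isTracked(int jobNumber) {
        String jobId = jobIdAssignmentTable.get(jobNumber);
        return jobId != null && jobStatusTable.containsKey(jobId);
    }

    int assignmentCount() {
        return jobIdAssignmentTable.size();
    }

    int statusCount() {
        return jobStatusTable.size();
    }
}
```

In this sketch, a job registered as ENDED is reclaimed even when the endpoint refuses every connection, while an EXECUTING job is left in both tables until the endpoint can be reached.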
Temporary fix
Comments
APAR Information
APAR number
PM90474
Reported component name
WXD COMPUTE GRI
Reported component ID
5725C9301
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-06-05
Closed date
2013-08-13
Last modified date
2013-08-13
APAR is sysrouted FROM one or more of the following:
PM78641
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WXD COMPUTE GRI
Fixed component ID
5725C9301
Applicable component levels
R800 PSY
UP
Document Information
Modified date:
28 April 2022