Scheduler Admin API
Checking the supervisor state
Request
GET /api/supervisor/state
Returns service unavailable if no schedule is configured; otherwise returns one of RUNNING, HALTED or FINISHED.
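A client can treat the two outcomes separately. The following is a minimal sketch, not the service's own client: it assumes that "service unavailable" is delivered as HTTP status 503 and that the state arrives as plain text in the response body. The base URL and the function names are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical host and port
KNOWN_STATES = {"RUNNING", "HALTED", "FINISHED"}


def interpret_state(status_code, body):
    """Map an HTTP status and body to a supervisor state.

    Returns None when no schedule is configured.
    """
    if status_code == 503:  # assumption: "service unavailable" means HTTP 503
        return None
    if status_code != 200:
        raise RuntimeError(f"unexpected HTTP status: {status_code}")
    # assumption: the body is the bare state name, possibly quoted
    state = body.strip().strip('"').upper()
    if state not in KNOWN_STATES:
        raise ValueError(f"unexpected supervisor state: {body!r}")
    return state


def fetch_supervisor_state(base_url=BASE_URL):
    """Call GET /api/supervisor/state and interpret the response."""
    url = f"{base_url}/api/supervisor/state"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return interpret_state(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, so the 503 case arrives here
        body = err.read().decode("utf-8", errors="replace")
        return interpret_state(err.code, body)
```

Keeping the interpretation in a pure function makes it easy to adjust if the real response format differs from these assumptions.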
Stopping the supervisor thread
Request
POST /api/supervisor/halt
Stops a running scheduler. The request returns immediately, but the scheduler thread itself blocks on any job that is still running, so it does not stop until that job has finished.
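Because the halt request returns before the thread has actually stopped, a script that needs a stopped scheduler can poll the state endpoint afterwards. This sketch assumes the state endpoint eventually reports HALTED once the thread has stopped; the function names, the timeout and the polling interval are hypothetical choices, and `get_state` is any callable that returns the current state.

```python
import time
import urllib.request


def request_halt(base_url):
    """Send POST /api/supervisor/halt and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/api/supervisor/halt", data=b"", method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def wait_for_state(get_state, wanted="HALTED", timeout_s=300.0,
                   interval_s=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `wanted` or the timeout expires.

    Returns True if the wanted state was observed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A long-running job may keep the thread alive for a while, so the timeout should be chosen with the longest expected job duration in mind.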
Restarting a supervisor thread
Request
POST /api/supervisor/restart
Attempts to restart a scheduler thread that was previously halted or that terminated with a runtime error. If it terminated with a runtime error, determine the cause of the error before attempting a restart.
DB cleanup for hanging jobs or in case of DuplicateKeyException
If, upon completion of a scheduled job, the connection to the database is lost, a job may hang. This should generate an alert.
If the cleanup process is not completed correctly, a DuplicateKeyException may be triggered. To avoid this, ensure the cleanup process is completed correctly and restart the supervisor thread.
You have two options: either delete from JOB_RESULTS and PARTIAL_JOB_RESULTS so that you bring the database to the state before the job started, or update SCHEDULED_JOBS and PARTIAL_JOB_RESULTS and bring the database to a state where the hanging job is 'completed' and rescheduled.
Deletion is your option if you need the job to run with a specific value of scheduled_start. Furthermore, this is the most logical option for a remote job, given that it comprises multiple partial jobs. Depending on when the issue occurred, you may have lost the cluster identification.
So:
- Halt the supervisor thread.
- If the job does NOT need to run again with the original scheduledStart value: first increase the scheduled_start value of the job causing the problem in table SCHEDULED_JOBS, then update the 'running' partial job.
- If you NEED to rerun the job with the original value of scheduled_start: delete the record with the problematic unique key in table JOB_RESULTS together with its child records in PARTIAL_JOB_RESULTS.
- Restart the supervisor thread.
update SCHEDULED_JOBS set scheduled_start = '2015-03-11 00:30:00' where id = 2
update PARTIAL_JOB_RESULTS set state = 'completed', duration = 1, result_code = 'ok' where id = {thePartialJobId}
or
select id from JOB_RESULTS where scheduled_start = '2015-03-11 00:28:00' and job_id = 2
delete from PARTIAL_JOB_RESULTS where job_result_id = {theId}
delete from JOB_RESULTS where id = {theId}
When that's done, you can restart the supervisor thread.
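Both cleanup options touch more than one table, so it can help to run each one inside a single transaction: a failure halfway through then leaves the database unchanged. The sketch below illustrates this with Python's sqlite3 module and an in-memory database. The table layout in `make_demo_db` is a hypothetical minimal schema (including the `id` column on PARTIAL_JOB_RESULTS and the sample ids 7 and 10), not the real one, and the production database is probably not SQLite.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table SCHEDULED_JOBS (id integer primary key, scheduled_start text);
        create table JOB_RESULTS (
            id integer primary key, job_id integer, scheduled_start text,
            unique (job_id, scheduled_start));
        create table PARTIAL_JOB_RESULTS (
            id integer primary key, job_result_id integer,
            state text, duration integer, result_code text);
        insert into SCHEDULED_JOBS values (2, '2015-03-11 00:28:00');
        insert into JOB_RESULTS values (7, 2, '2015-03-11 00:28:00');
        insert into PARTIAL_JOB_RESULTS values (10, 7, 'running', null, null);
    """)
    return conn


def delete_job_result(conn, job_id, scheduled_start):
    """Restore the state before the job started. Returns 1 if a result was deleted."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "select id from JOB_RESULTS where scheduled_start = ? and job_id = ?",
            (scheduled_start, job_id),
        ).fetchone()
        if row is None:
            return 0
        # child records first, then the parent record
        conn.execute("delete from PARTIAL_JOB_RESULTS where job_result_id = ?", (row[0],))
        conn.execute("delete from JOB_RESULTS where id = ?", (row[0],))
        return 1


def complete_and_reschedule(conn, job_id, new_start, partial_job_id):
    """Mark the hanging partial job as completed and move the job's next start."""
    with conn:
        conn.execute(
            "update SCHEDULED_JOBS set scheduled_start = ? where id = ?",
            (new_start, job_id),
        )
        conn.execute(
            "update PARTIAL_JOB_RESULTS set state = 'completed', duration = 1, "
            "result_code = 'ok' where id = ?",
            (partial_job_id,),
        )
```

For example, `delete_job_result(make_demo_db(), 2, '2015-03-11 00:28:00')` removes the sample job result together with its partial result. Against the real database, the same statements would be run with that database's own driver while the supervisor thread is halted.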