Scheduler Admin API

Checking the supervisor state

Request

GET /api/supervisor/state

Returns Service Unavailable if no schedule is configured; otherwise returns one of RUNNING, HALTED or FINISHED.
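A minimal sketch of a client for this request, using only the Python standard library. The base URL, the mapping of "service unavailable" to HTTP status 503, and a plain-text response body containing just the state name are all assumptions, not confirmed by this page. The function name and its `urlopen` parameter are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: host and port of the scheduler; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def get_supervisor_state(base_url=BASE_URL, urlopen=urllib.request.urlopen):
    """Return the supervisor state, or None if no schedule is configured.

    Assumes "service unavailable" arrives as HTTP 503 and that the
    response body is the state name as plain text.
    """
    url = base_url.rstrip("/") + "/api/supervisor/state"
    try:
        with urlopen(url, timeout=5.0) as response:
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 503:
            return None  # no schedule configured
        raise  # any other HTTP error is unexpected


if __name__ == "__main__":
    print(get_supervisor_state())
```

The `urlopen` parameter exists only so the call can be exercised without a live server; in normal use the default is enough.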

Stopping the supervisor thread

Request

POST /api/supervisor/halt

Stops a running scheduler. The request returns immediately, but the scheduler thread itself blocks until any running job has finished.

Restarting a supervisor thread

Request

POST /api/supervisor/restart

Attempts to restart a scheduler thread that was previously halted or that stopped on a runtime error. If it stopped on a runtime error, determine the cause before attempting a restart.
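The two POST requests above could be issued as in the following sketch. The base URL, the empty request body, and the helper names are assumptions. Because the halt request returns immediately, the sketch also includes a small polling helper; that the state endpoint reports HALTED only after the thread has actually stopped is likewise an assumption.

```python
import time
import urllib.request

# Assumption: host and port of the scheduler; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def post_supervisor_action(action, base_url=BASE_URL, urlopen=urllib.request.urlopen):
    """POST to /api/supervisor/halt or /api/supervisor/restart.

    Returns the HTTP status code of the response.
    """
    if action not in ("halt", "restart"):
        raise ValueError("action must be 'halt' or 'restart'")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/supervisor/" + action,
        data=b"",  # assumption: the endpoint needs no request body
        method="POST",
    )
    with urlopen(request, timeout=5.0) as response:
        return response.status


def wait_for_state(expected, read_state, attempts=30, delay=2.0):
    """Call read_state() repeatedly until it returns `expected`.

    read_state is any zero-argument callable that returns the current
    state string. Returns True on success, False once all attempts
    are used up.
    """
    for _ in range(attempts):
        if read_state() == expected:
            return True
        time.sleep(delay)
    return False
```

A typical sequence would be `post_supervisor_action("halt")`, then `wait_for_state("HALTED", ...)` with a callable that reads GET /api/supervisor/state, and only then any database maintenance.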

DB cleanup for hanging jobs or in case of DuplicateKeyException

If the connection to the database is lost just as a scheduled job completes, the job may hang. This should generate an alert.

If the cleanup after such a failure is not completed correctly, a DuplicateKeyException may be triggered. To resolve this, complete the cleanup as described here and then restart the supervisor thread.

You have two options: either delete from JOB_RESULTS and PARTIAL_JOB_RESULTS so that you bring the database to the state before the job started, or update SCHEDULED_JOBS and PARTIAL_JOB_RESULTS and bring the database to a state where the hanging job is 'completed' and rescheduled.

Choose deletion if you need the job to run with a specific value of scheduled_start. Deletion is also the most logical option for a remote job, given that it comprises multiple partial jobs. Depending on when the issue occurred, the cluster identification may have been lost.

The procedure:

  • Halt the supervisor thread.
  • If the job does NOT need to run again with the original scheduled_start value: first increase the scheduled_start value of the job causing the problem in table SCHEDULED_JOBS, then mark the 'running' partial job as completed in table PARTIAL_JOB_RESULTS.
  • If you NEED to rerun the job with the original value of scheduled_start: delete the record with the problematic unique key in table JOB_RESULTS together with its child records in PARTIAL_JOB_RESULTS.
  • Restart the supervisor thread.

update SCHEDULED_JOBS set scheduled_start = '2015-03-11 00:30:00' where id = 2
update PARTIAL_JOB_RESULTS set state = 'completed', duration = 1, result_code = 'ok' where id = {thePartialJobId}

or

select id from JOB_RESULTS where scheduled_start = '2015-03-11 00:28:00' and job_id = 2
delete from PARTIAL_JOB_RESULTS where job_result_id = {theId}
delete from JOB_RESULTS where id = {theId}

When that's done, you can restart the supervisor thread.
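Either set of statements above is safest when run as a single transaction, so that a failure halfway through does not leave the tables in yet another inconsistent state. The following is a rough sketch of both options in Python, using the standard-library `sqlite3` module purely for illustration. The real database, its driver and parameter style, the function names, the minimal schema in the demo, and the `id` column on PARTIAL_JOB_RESULTS are all assumptions.

```python
import sqlite3


def complete_and_reschedule(conn, job_id, new_scheduled_start, partial_job_id):
    """Option 1: move the job forward and mark the hanging partial job completed."""
    with conn:  # sqlite3: commits on success, rolls back on an exception
        conn.execute(
            "update SCHEDULED_JOBS set scheduled_start = ? where id = ?",
            (new_scheduled_start, job_id),
        )
        conn.execute(
            "update PARTIAL_JOB_RESULTS set state = 'completed', duration = 1, "
            "result_code = 'ok' where id = ?",
            (partial_job_id,),
        )


def delete_job_result(conn, job_id, scheduled_start):
    """Option 2: delete the JOB_RESULTS row of one run and its partial results.

    Returns the number of JOB_RESULTS rows removed.
    """
    with conn:
        rows = conn.execute(
            "select id from JOB_RESULTS where scheduled_start = ? and job_id = ?",
            (scheduled_start, job_id),
        ).fetchall()
        for (result_id,) in rows:
            # Child records first, then the parent row.
            conn.execute(
                "delete from PARTIAL_JOB_RESULTS where job_result_id = ?",
                (result_id,),
            )
            conn.execute("delete from JOB_RESULTS where id = ?", (result_id,))
    return len(rows)


if __name__ == "__main__":
    # Throwaway in-memory database with a hypothetical, minimal schema.
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        """
        create table JOB_RESULTS (id integer primary key, job_id integer,
                                  scheduled_start text);
        create table PARTIAL_JOB_RESULTS (id integer primary key,
                                          job_result_id integer, state text);
        insert into JOB_RESULTS values (7, 2, '2015-03-11 00:28:00');
        insert into PARTIAL_JOB_RESULTS values (70, 7, 'running');
        """
    )
    print(delete_job_result(demo, 2, "2015-03-11 00:28:00"))
```

Against the production database, the same pattern applies with that database's own driver and placeholder syntax. The supervisor thread should stay halted for the duration of the transaction.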