Job management commands

  • sacct: accounting data for current and completed jobs (sacct -j <jobid>)
  • sstat: statistics for running jobs (sstat -j <jobid> --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize)
  • scontrol show: show details, e.g. scontrol show job or scontrol show partition
  • scontrol update: modify the attributes of a job
  • scontrol hold: hold a pending job so that it is not scheduled
  • scontrol release: release a held job
  • sprio: displays job priority
  • scancel: cancel the job
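
As a rough illustration of scontrol update, the sketch below wraps one common change: lowering the time limit of a job. The function name shorten_time_limit is hypothetical. JobId and TimeLimit are standard scontrol update fields; regular users can usually only decrease a limit, not raise it.

```shell
# Hypothetical helper: set a new (shorter) time limit on one of your jobs.
# Usage: shorten_time_limit <jobid> <time>
shorten_time_limit() {
    if [ "$#" -ne 2 ]; then
        echo "usage: shorten_time_limit <jobid> <time>" >&2
        return 1
    fi
    # Time is given in a Slurm time format such as HH:MM:SS.
    scontrol update JobId="$1" TimeLimit="$2"
}
```

Other job attributes can be changed the same way by swapping TimeLimit for another field listed by scontrol show job.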

Monitoring jobs

List all current jobs for a user:

squeue -u <username>

The output of the squeue command consists of several columns including job ID, partition, job name, username, job state, elapsed time, number of nodes, node list, etc.

   JOBID  PARTITION      NAME  USER  ST        TIME  NODES  NODELIST(REASON)
  499980    longcpu  vega208t  user  PD        0:00      1  (Resources)
  499981    longcpu  vega192t  user  PD        0:00      1  (Priority)
  449911    longcpu  bxe_t280  user   R  1-01:23:39      1  cn0402
  499889    longcpu  vega256t  user   R     3:29:24      1  cn0011
  449133    longcpu  bxe_t240  user   R  1-03:38:21      1  cn0401

Job state is listed in the ST column of the output of the squeue command. The most common job state codes are:

  • R: Running
  • PD: Pending
  • CG: Completing
  • CA: Cancelled
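
For a quick tally of how many of your jobs are in each state, the state codes can be counted directly. This is a minimal sketch; the function name count_job_states is hypothetical, while -h (no header) and the %t format field (compact state code) are standard squeue options.

```shell
# Hypothetical helper: count a user's jobs per state code (R, PD, CG, ...).
# -h drops the header line; -o "%t" prints only the compact state code.
# Usage: count_job_states <username>
count_job_states() {
    squeue -h -u "$1" -o "%t" | sort | uniq -c
}
```

Each output line holds a count followed by the state code it applies to.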

List all running jobs for a user:

squeue -u <username> -t RUNNING

List all pending jobs for a user:

squeue -u <username> -t PENDING

List all current jobs in the shared partition for a user:

squeue -u <username> -p shared

List detailed information for a job (useful for troubleshooting):

scontrol show jobid -dd <jobid>

List status info for a currently running job:

sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
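
When the sstat output is meant to be read by a script rather than a person, the parsable form is easier to work with. The sketch below assumes you only want the peak memory per job step; the function name job_step_memory is hypothetical, and -n and -P are the sstat short options for no header and "|"-separated fields.

```shell
# Hypothetical helper: print the step ID and MaxRSS for every step of a
# running job, one step per line.
# -n drops the header; -P separates fields with "|" instead of padding them.
# Usage: job_step_memory <jobid>
job_step_memory() {
    sstat -j "$1" --allsteps -n -P --format=JobID,MaxRSS |
        awk -F'|' '{ printf "%-20s %s\n", $1, $2 }'
}
```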

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID:

sacct -j <jobid> --format=jobID,JobName%20,NNodes,NTasks,NCPUS,MaxRSS,AveRSS,Elapsed,ExitCode

To view the same information for all jobs of a user:

sacct -u <username> --format=jobID,JobName%20,NNodes,NTasks,NCPUS,MaxRSS,AveRSS,Elapsed,ExitCode
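
The same accounting data can be filtered to find jobs that did not finish cleanly. This is a sketch under the assumption that anything not in the COMPLETED state deserves a look; the function name jobs_not_completed is hypothetical. -X limits the output to one line per job, -n drops the header, and -P separates fields with "|".

```shell
# Hypothetical helper: list a user's jobs whose state is not COMPLETED
# (failed, cancelled, timed out, or still pending or running).
# Usage: jobs_not_completed <username>
jobs_not_completed() {
    sacct -u "$1" -X -n -P --format=JobID,JobName,State,ExitCode |
        awk -F'|' '$3 != "COMPLETED"'
}
```

Without a start time, sacct typically reports only jobs since midnight of the current day; add -S to the sacct call to look further back.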

Cancelling jobs

You may need to terminate running jobs or remove pending jobs from the queue. Both are done with scancel; see "man scancel" for the full list of options. To cancel specific jobs, pass their job IDs, for example two at once:

$ scancel <Job ID> <Job ID>

To cancel all your jobs, with a confirmation prompt before each one:

$ scancel -i -u your_account_name

To cancel all your pending jobs:

$ scancel -u your_account_name --state=pending
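
To cancel only some pending jobs, it can help to list the matching job IDs first and check them before passing them to scancel. The following is a minimal sketch; the function name pending_ids_matching is hypothetical, and it selects jobs by matching the job name against a regular expression.

```shell
# Hypothetical helper: print the IDs of a user's pending jobs whose name
# matches a regular expression, one per line, without cancelling anything.
# %i is the job ID and %j the job name in squeue's -o format.
# Usage: pending_ids_matching <username> <regex>
pending_ids_matching() {
    squeue -h -u "$1" -t PENDING -o "%i %j" |
        awk -v pat="$2" '$2 ~ pat { print $1 }'
}

# Once the list looks right, it could be passed on, for example
# (xargs -r is a GNU extension that skips the call on empty input):
#   pending_ids_matching <username> '^vega' | xargs -r scancel
```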

To hold a particular job from being scheduled:

scontrol hold <jobid>

To release a particular job to be scheduled:

scontrol release <jobid>

To requeue (cancel and rerun) a particular job:

scontrol requeue <jobid>
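
The hold command can be combined with squeue to hold all pending jobs of a user in one step. This is a sketch only; the function name hold_all_pending is hypothetical, and it simply calls scontrol hold once per job ID.

```shell
# Hypothetical helper: hold every pending job of a user, one job at a time.
# Usage: hold_all_pending <username>
hold_all_pending() {
    squeue -h -u "$1" -t PENDING -o "%i" |
        while read -r id; do
            # Redirect stdin so scontrol cannot consume the remaining IDs.
            scontrol hold "$id" </dev/null
        done
}
```

Replacing hold with release in the loop gives the reverse operation for jobs that are currently held.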