Job management commands
- sacct: accounting data for completed and pending jobs (sacct -j <Job ID>)
- sstat: statistics for running jobs (sstat -j <Job ID> --format=AveCPU,AveRSS,AveVMSize,MaxRSS,MaxVMSize)
- scontrol show: display details of a job or partition, e.g. scontrol show job or scontrol show partition
- scontrol update: change the settings of a job
- scontrol hold: hold a pending job so that it is not started
- scontrol release: release a held job
- sprio: displays job priority
- scancel: cancel the job
You may want to terminate your running jobs or remove your waiting jobs from the queue. The command for both is scancel; see "man scancel" for details. To cancel two of your jobs, pass their job IDs:
$ scancel <Job ID> <Job ID>
The following command
$ scancel -i -u your_account_name
cancels all your jobs, but asks for confirmation before terminating each one.
Similarly, the command
$ scancel -u your_account_name --state=pending
cancels all your waiting (pending) jobs.
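To cancel only some of your waiting jobs, the IDs can be selected first and then passed to scancel. The following is a minimal sketch, not a site-provided tool: the helper name select_ids_by_name and the prefix "vega" are hypothetical, and the sample IDs and names are for illustration only. The squeue options shown in the comment (-h to drop the header, -t PD to list pending jobs, -o "%i %j" to print job ID and name) are standard, but the pipeline is left commented out so the sketch runs without Slurm.

```shell
#!/bin/sh
# Hypothetical helper: read "JOBID NAME" pairs on stdin and print the IDs
# of the jobs whose name starts with the given prefix.
select_ids_by_name() {
    awk -v prefix="$1" 'index($2, prefix) == 1 { print $1 }'
}

# On a cluster, the input would come from squeue, and the selected IDs
# would be handed to scancel (xargs -r is a GNU extension that skips the
# call when no IDs were selected):
#   squeue -u your_account_name -h -t PD -o "%i %j" \
#       | select_ids_by_name vega | xargs -r scancel

# Demonstration on sample job IDs and names (no Slurm required):
printf '%s\n' '499980 vega208t' '499981 vega192t' '449911 bxe_t280' \
    | select_ids_by_name vega
```

Running the selection step on its own before adding the scancel stage is a simple way to check which jobs would be cancelled.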
To check the status of your jobs, you can run commands such as:
- jobinfo -u your_account_name
For example, to monitor your jobs with squeue while they are queued or running:
$ squeue -u your_account_name
The output of the squeue command consists of several columns: job ID, partition, job name, username, job state, elapsed time, number of nodes, and node list (or, for pending jobs, the reason for waiting).
JOBID   PARTITION  NAME      USER  ST  TIME        NODES  NODELIST(REASON)
499980  longcpu    vega208t  user  PD  0:00        1      (Resources)
499981  longcpu    vega192t  user  PD  0:00        1      (Priority)
449911  longcpu    bxe_t280  user  R   1-01:23:39  1      cn0402
499889  longcpu    vega256t  user  R   3:29:24     1      cn0011
449133  longcpu    bxe_t240  user  R   1-03:38:21  1      cn0401
Job state is listed in the ST column of the output of the squeue command. The most common job state codes are:
- R: Running
- PD: Pending
- CG: Completing
- CA: Cancelled
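When many jobs are queued, a per-state summary can be easier to read than the full listing. The sketch below counts jobs by the code in the ST column. It assumes the default squeue layout shown above, with a header line and the state in the fifth column; the helper name count_job_states is hypothetical. The listing is fed in from a here-document so the example runs without Slurm.

```shell
#!/bin/sh
# Hypothetical helper: read squeue-style output (header line included) on
# stdin and print one "STATE COUNT" pair per line, sorted by state code.
# Assumes the job state is in the fifth column, as in the default layout.
count_job_states() {
    awk 'NR > 1 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demonstration on the listing shown above (no Slurm required).
count_job_states <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
499980 longcpu vega208t user PD 0:00 1 (Resources)
499981 longcpu vega192t user PD 0:00 1 (Priority)
449911 longcpu bxe_t280 user R 1-01:23:39 1 cn0402
499889 longcpu vega256t user R 3:29:24 1 cn0011
449133 longcpu bxe_t240 user R 1-03:38:21 1 cn0401
EOF
# prints:
# PD 2
# R 3
```

On a cluster, the same function could be used as: squeue -u your_account_name | count_job_states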