Tutorial
You can use enjoy-slurm to submit and manage Slurm jobs from Python.
NOTE: This tutorial was run on Levante at DKRZ. If you want to run it somewhere else, you will have to adapt the partition names and, of course, the account.
Let’s assume you have a shell script test.sh:
#!/bin/sh
echo "Hello World from $(hostname)"
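You can submit this script using sbatch. Immediately afterwards, we retrieve some information about the job using scontrol. Note that scontrol.show usually only works as long as the job has not completed yet, because Slurm purges the records of finished jobs from the controller shortly after they end.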
[1]:
import enjoy_slurm as slurm
jobid = slurm.sbatch("test.sh", account="ch0636", partition="shared")
jobinfo = slurm.scontrol.show(jobid=jobid)
jobid
[1]:
4248273
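To give a feeling for where the dictionary returned by scontrol.show comes from: the `scontrol show job` command prints each job as a series of `Key=Value` tokens. The following is a minimal sketch of how such text could be turned into a dictionary. It is not enjoy-slurm's actual implementation; the function name `parse_scontrol_record` and the sample text are made up for illustration.

```python
def parse_scontrol_record(text):
    """Parse whitespace-separated Key=Value tokens into a dict.

    Simplification: values that contain spaces (for example a Command
    with arguments) are not handled by this sketch.
    """
    record = {}
    for token in text.split():
        # Split on the first '=' only, so values like 'cpu=1,node=1' survive.
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no '='
            record[key] = value
    return record


# Hand-written sample in the style of `scontrol show job`, not real output.
sample = """JobId=4248273 JobName=test.sh
   JobState=PENDING Reason=None
   Partition=shared TRES=cpu=1,node=1"""

info = parse_scontrol_record(sample)
print(info["JobState"])  # PENDING
print(info["TRES"])      # cpu=1,node=1
```

All values are kept as strings here; converting them to numbers or timestamps would be a separate step.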
Now you can check the state of your job using sacct:
[2]:
slurm.sacct(jobid)
[2]:
| | JobID | Elapsed | NCPUS | NTasks | State | End | JobName |
|---|---|---|---|---|---|---|---|
| 0 | 4248273 | 00:00:00 | 1 | NaN | PENDING | Unknown | test.sh |
Let’s have a look at the job info that we retrieved while the job was still pending:
[3]:
jobinfo[str(jobid)].keys()
[3]:
dict_keys(['JobId', 'JobName', 'UserId', 'GroupId', 'MCS_label', 'Priority', 'Nice', 'Account', 'QOS', 'JobState', 'Reason', 'Dependency', 'Requeue', 'Restarts', 'BatchFlag', 'Reboot', 'ExitCode', 'RunTime', 'TimeLimit', 'TimeMin', 'SubmitTime', 'EligibleTime', 'AccrueTime', 'StartTime', 'EndTime', 'Deadline', 'SuspendTime', 'SecsPreSuspend', 'LastSchedEval', 'Partition', 'AllocNode:Sid', 'ReqNodeList', 'ExcNodeList', 'NodeList', 'NumNodes', 'NumCPUs', 'NumTasks', 'CPUs/Task', 'ReqB:S:C:T', 'TRES', 'Socks/Node', 'NtasksPerN:B:S:C', 'CoreSpec', 'MinCPUsNode', 'MinMemoryCPU', 'MinTmpDiskNode', 'Features', 'DelayBoot', 'OverSubscribe', 'Contiguous', 'Licenses', 'Network', 'Command', 'WorkDir', 'StdErr', 'StdIn', 'StdOut', 'Power'])
By now, the job should have completed:
[4]:
slurm.sacct(jobid)
[4]:
| | JobID | Elapsed | NCPUS | NTasks | State | End | JobName |
|---|---|---|---|---|---|---|---|
| 0 | 4248273 | 00:00:15 | 2 | NaN | COMPLETED | 2023-03-15T10:19:10 | test.sh |
| 1 | 4248273.batch | 00:00:15 | 2 | 1.0 | COMPLETED | 2023-03-15T10:19:10 | batch |
| 2 | 4248273.extern | 00:00:15 | 2 | 1.0 | COMPLETED | 2023-03-15T10:19:10 | extern |
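Instead of hoping that the job has finished in the meantime, you could poll its state until it reaches a final one. The sketch below is one possible way to do this. The helper `wait_for_job` and the `FINAL_STATES` set are hypothetical and not part of enjoy-slurm; the state query is passed in as a function so that the example runs without a cluster.

```python
import time

# Slurm job states after which a job will not run any further (a subset).
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def wait_for_job(get_state, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_state() until it returns a final state or the timeout is hit."""
    waited = 0.0
    while True:
        state = get_state()
        if state in FINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still {state} after {waited:.0f} s")
        sleep(interval)
        waited += interval


# Stand-in for a real query so that the sketch runs anywhere.
fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
final = wait_for_job(lambda: next(fake_states), interval=0.0)
print(final)  # COMPLETED
```

On a real system, `get_state` might be something like `lambda: slurm.sacct(jobid)["State"].iloc[0]`, assuming that sacct returns a pandas DataFrame laid out like the tables shown in this tutorial.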
Let’s check the content of the log file:
[5]:
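def get_log(logfile):
    # return the first line of the job's log file
    with open(logfile) as f:
        log = f.read().splitlines()[0]
    return log


# scontrol reports the path of the job's standard output file as StdOut
logfile = jobinfo[str(jobid)].get("StdOut")
get_log(logfile)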
[5]:
'Hello World from l40000.lvt.dkrz.de'
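Note that get_log raises an exception if the log file does not exist yet (the job has not started) or is still empty. A more forgiving variant could look like the following sketch; the name `get_first_line` and its `default` parameter are assumptions made for this example, and a temporary file stands in for a real Slurm log.

```python
import os
import tempfile


def get_first_line(logfile, default=None):
    """Return the first line of logfile, or default if it is missing or empty."""
    try:
        with open(logfile) as f:
            line = f.readline()
    except FileNotFoundError:
        return default
    # readline() returns an empty string for an empty file.
    return line.rstrip("\n") if line else default


# Demonstration with a temporary file instead of a real Slurm log.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slurm-demo.out")
    print(get_first_line(path))  # None, the file does not exist yet
    with open(path, "w") as f:
        f.write("Hello World\n")
    print(get_first_line(path))  # Hello World
```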
enjoy-slurm becomes more useful when you want to manage many jobs, which is easy to do in Python, e.g.:
[13]:
jobinfo = {}
for i in range(0, 10):
    jobid = slurm.sbatch("test.sh", account="ch0636", partition="shared")
    jobinfo[jobid] = slurm.scontrol.show(jobid=jobid)[str(jobid)]
Check the accounting:
[14]:
slurm.sacct(name="test.sh", state="PENDING")
[14]:
| | JobID | JobName | Partition | Account | AllocCPUS | State | ExitCode |
|---|---|---|---|---|---|---|---|
| 0 | 4248312 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 1 | 4248313 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 2 | 4248314 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 3 | 4248315 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 4 | 4248316 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 5 | 4248317 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 6 | 4248318 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 7 | 4248319 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 8 | 4248320 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
| 9 | 4248321 | test.sh | shared | ch0636 | 1 | PENDING | 0:0 |
[15]:
jobinfo.keys()
[15]:
dict_keys([4248312, 4248313, 4248314, 4248315, 4248316, 4248317, 4248318, 4248319, 4248320, 4248321])
And finally, let’s collect and print the log contents:
[16]:
logs = {}
for jobid, info in jobinfo.items():
    logs[jobid] = get_log(info.get("StdOut"))
[17]:
logs
[17]:
{4248312: 'Hello World from l40000.lvt.dkrz.de',
4248313: 'Hello World from l40000.lvt.dkrz.de',
4248314: 'Hello World from l40000.lvt.dkrz.de',
4248315: 'Hello World from l40000.lvt.dkrz.de',
4248316: 'Hello World from l40000.lvt.dkrz.de',
4248317: 'Hello World from l40000.lvt.dkrz.de',
4248318: 'Hello World from l40000.lvt.dkrz.de',
4248319: 'Hello World from l40000.lvt.dkrz.de',
4248320: 'Hello World from l40000.lvt.dkrz.de',
4248321: 'Hello World from l40000.lvt.dkrz.de'}
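Once the logs are collected in a dictionary, ordinary Python tools can be used to summarize them. As a small illustration, the sketch below counts how many jobs ran on each host, using a shortened copy of the `logs` dictionary shown above. It relies on the fact that test.sh prints the hostname as the last word of its only output line.

```python
from collections import Counter

# A shortened copy of the `logs` dictionary shown above.
logs = {
    4248312: "Hello World from l40000.lvt.dkrz.de",
    4248313: "Hello World from l40000.lvt.dkrz.de",
    4248314: "Hello World from l40000.lvt.dkrz.de",
}

# The hostname is the last whitespace-separated word of each log line.
hosts = Counter(line.split()[-1] for line in logs.values())
print(hosts)  # Counter({'l40000.lvt.dkrz.de': 3})
```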