Tutorial

You can use enjoy-slurm to submit and manage Slurm jobs from Python.

NOTE: This tutorial was run on the DKRZ Levante system. If you want to run it somewhere else, you will have to adapt the partition name and, of course, the account.

Let’s assume you have a shell script test.sh:

#!/bin/sh
echo "Hello World from $(hostname)"
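If you prefer to create the script from within the notebook, the following minimal sketch writes the same test.sh using only Python's standard library. The helper name write_script is hypothetical and not part of enjoy-slurm:

```python
from pathlib import Path

# Content of the batch script used throughout this tutorial
SCRIPT = """#!/bin/sh
echo "Hello World from $(hostname)"
"""


def write_script(path="test.sh", content=SCRIPT):
    """Write a batch script to `path` and return it as a Path (hypothetical helper)."""
    script = Path(path)
    script.write_text(content)
    # Optional: make the script executable so it can also be run directly as ./test.sh
    script.chmod(0o755)
    return script
```

Calling write_script() creates test.sh in the current working directory.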

You can submit this script using sbatch. Afterwards, we immediately retrieve some information about the job using scontrol. Note that scontrol.show usually only works as long as the job has not completed yet, since Slurm keeps finished jobs in its controller only for a short time.
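For orientation, the sbatch call in the next cell corresponds roughly to running `sbatch --account=ch0636 --partition=shared test.sh` on the command line. The sketch below illustrates one way keyword arguments could be turned into such long options; it is an assumption for illustration, not enjoy-slurm's actual implementation, and the helper sbatch_command is hypothetical:

```python
def sbatch_command(script, **options):
    """Build an sbatch argument list from keyword arguments (hypothetical helper).

    Underscores become hyphens, e.g. cpus_per_task=4 -> --cpus-per-task=4.
    True produces a bare flag (parsable=True -> --parsable); None and False are skipped.
    """
    args = ["sbatch"]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)
        elif value is not None and value is not False:
            args.append(f"{flag}={value}")
    # The batch script comes last, after all options
    args.append(script)
    return args


print(sbatch_command("test.sh", account="ch0636", partition="shared"))
# ['sbatch', '--account=ch0636', '--partition=shared', 'test.sh']
```

Such a list could be passed to subprocess.run on a login node; adding parsable=True makes sbatch print the job ID in a machine-readable form.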

[1]:

import enjoy_slurm as slurm

jobid = slurm.sbatch("test.sh", account="ch0636", partition="shared")
jobinfo = slurm.scontrol.show(jobid=jobid)
jobid

[1]:

Now you can check the state of your job using sacct:

[2]:

slurm.sacct(jobid)

[2]:

	JobID	Elapsed	NCPUS	NTasks	State	End	JobName
0	4248273	00:00:00	1	NaN	PENDING	Unknown	test.sh

Let’s have a look at the job information that scontrol returned while the job was still pending:

[3]:

jobinfo[str(jobid)].keys()

[3]:

dict_keys(['JobId', 'JobName', 'UserId', 'GroupId', 'MCS_label', 'Priority', 'Nice', 'Account', 'QOS', 'JobState', 'Reason', 'Dependency', 'Requeue', 'Restarts', 'BatchFlag', 'Reboot', 'ExitCode', 'RunTime', 'TimeLimit', 'TimeMin', 'SubmitTime', 'EligibleTime', 'AccrueTime', 'StartTime', 'EndTime', 'Deadline', 'SuspendTime', 'SecsPreSuspend', 'LastSchedEval', 'Partition', 'AllocNode:Sid', 'ReqNodeList', 'ExcNodeList', 'NodeList', 'NumNodes', 'NumCPUs', 'NumTasks', 'CPUs/Task', 'ReqB:S:C:T', 'TRES', 'Socks/Node', 'NtasksPerN:B:S:C', 'CoreSpec', 'MinCPUsNode', 'MinMemoryCPU', 'MinTmpDiskNode', 'Features', 'DelayBoot', 'OverSubscribe', 'Contiguous', 'Licenses', 'Network', 'Command', 'WorkDir', 'StdErr', 'StdIn', 'StdOut', 'Power'])
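These keys are the Key=Value fields printed by `scontrol show job`. If you are curious how such output can be turned into a dictionary, here is a simplified sketch. It is not necessarily how enjoy-slurm parses the output, the function parse_scontrol is hypothetical, the sample text is made up for illustration, and it assumes that values contain no spaces (which does not hold for every field):

```python
def parse_scontrol(text):
    """Parse `scontrol show job` style output into {job_id: {key: value}} (simplified sketch).

    Records are assumed to be separated by blank lines and fields by whitespace.
    Each field is split at the first '=' only, so values such as '/a=b' survive.
    """
    jobs = {}
    for record in text.strip().split("\n\n"):
        fields = {}
        for token in record.split():
            key, sep, value = token.partition("=")
            if sep:  # skip tokens without '=' (e.g. parts of a value containing spaces)
                fields[key] = value
        if "JobId" in fields:
            # Key by the job ID string, like jobinfo[str(jobid)] above
            jobs[fields["JobId"]] = fields
    return jobs


# Hypothetical sample output, shortened to a few fields
sample = """JobId=123 JobName=test.sh
   UserId=alice(1000) GroupId=users(100) MCS_label=N/A
   JobState=PENDING Reason=Priority Dependency=(null)
   StdOut=/home/alice/slurm-123.out"""

info = parse_scontrol(sample)
print(info["123"]["JobState"])  # PENDING
```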

By now, the job should have completed:

[4]:

slurm.sacct(jobid)

[4]:

	JobID	Elapsed	NCPUS	NTasks	State	End	JobName
0	4248273	00:00:15	2	NaN	COMPLETED	2023-03-15T10:19:10	test.sh
1	4248273.batch	00:00:15	2	1.0	COMPLETED	2023-03-15T10:19:10	batch
2	4248273.extern	00:00:15	2	1.0	COMPLETED	2023-03-15T10:19:10	extern
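Instead of re-running the cell until the job has finished, you could poll sacct in a loop. The sketch below is generic: wait_for_job, is_finished and TERMINAL_STATES are hypothetical names, the get_states callback is supplied by you, and the set of terminal states is a selection of common Slurm job states rather than an exhaustive list. Since sacct also reports job steps such as .batch and .extern, the sketch waits until all reported rows are in a terminal state:

```python
import time

# A selection of Slurm job states after which a job no longer changes (not exhaustive)
TERMINAL_STATES = {
    "COMPLETED", "FAILED", "CANCELLED", "TIMEOUT",
    "OUT_OF_MEMORY", "NODE_FAIL", "BOOT_FAIL", "DEADLINE",
}


def is_finished(states):
    """Return True if there is at least one state and all of them are terminal.

    Only the first word is compared, because sacct may report e.g. 'CANCELLED by 1234'.
    """
    states = list(states)
    return bool(states) and all(s.partition(" ")[0] in TERMINAL_STATES for s in states)


def wait_for_job(get_states, interval=10.0, timeout=3600.0):
    """Call get_states() every `interval` seconds until all states are terminal.

    Returns the final list of states, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        states = list(get_states())
        if is_finished(states):
            return states
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} s, last states: {states}")
        time.sleep(interval)
```

With enjoy-slurm, get_states could be something like `lambda: slurm.sacct(jobid)["State"]`, assuming that sacct returns a pandas DataFrame, as the table output above suggests.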

Let’s check the content of the log file:

[5]:

def get_log(logfile):
    """Return the first line of the given log file."""
    with open(logfile) as f:
        log = f.read().splitlines()[0]
    return log


logfile = jobinfo[str(jobid)].get("StdOut")
get_log(logfile)

[5]:

'Hello World from l40000.lvt.dkrz.de'
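get_log above returns only the first line and raises an error if the log file does not exist yet (for example while the job is still pending) or is empty. A slightly more defensive variant, sketched here under the hypothetical name read_log, returns None for a missing file and lets you choose how many lines to return:

```python
from pathlib import Path


def read_log(logfile, max_lines=None):
    """Return up to `max_lines` lines of `logfile` (all lines if None).

    Returns None if no path is given or the file does not exist (yet).
    """
    if not logfile:
        return None
    path = Path(logfile)
    if not path.is_file():
        return None
    lines = path.read_text().splitlines()
    return lines if max_lines is None else lines[:max_lines]
```

For the job above, `read_log(jobinfo[str(jobid)].get("StdOut"), max_lines=1)` would return a one-element list instead of a string.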

enjoy-slurm becomes more useful when you manage many jobs, which is easy to do in Python. For example, you can submit the same script ten times and store the job information by job ID:

[13]:

jobinfo = {}

for _ in range(10):
    jobid = slurm.sbatch("test.sh", account="ch0636", partition="shared")
    jobinfo[jobid] = slurm.scontrol.show(jobid=jobid)[str(jobid)]

Check the accounting information of all pending jobs named test.sh:

[14]:

slurm.sacct(name="test.sh", state="PENDING")

[14]:

	JobID	JobName	Partition	Account	AllocCPUS	State	ExitCode
0	4248312	test.sh	shared	ch0636	1	PENDING	0:0
1	4248313	test.sh	shared	ch0636	1	PENDING	0:0
2	4248314	test.sh	shared	ch0636	1	PENDING	0:0
3	4248315	test.sh	shared	ch0636	1	PENDING	0:0
4	4248316	test.sh	shared	ch0636	1	PENDING	0:0
5	4248317	test.sh	shared	ch0636	1	PENDING	0:0
6	4248318	test.sh	shared	ch0636	1	PENDING	0:0
7	4248319	test.sh	shared	ch0636	1	PENDING	0:0
8	4248320	test.sh	shared	ch0636	1	PENDING	0:0
9	4248321	test.sh	shared	ch0636	1	PENDING	0:0
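With many jobs, a count of jobs per state is often more useful than the full table. A minimal sketch with collections.Counter follows; summarize_states is a hypothetical helper. If sacct returns a pandas DataFrame, as the output above suggests, you could pass its State column, or call value_counts() on that column directly:

```python
from collections import Counter


def summarize_states(states):
    """Count jobs per state, ignoring suffixes such as 'by 1234' in 'CANCELLED by 1234'."""
    return Counter(state.partition(" ")[0] for state in states)


print(summarize_states(["PENDING", "PENDING", "RUNNING", "CANCELLED by 1234"]))
```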

[15]:

jobinfo.keys()

[15]:

dict_keys([4248312, 4248313, 4248314, 4248315, 4248316, 4248317, 4248318, 4248319, 4248320, 4248321])

Finally, once all jobs have completed, let’s collect the log contents:

[16]:

logs = {}

for jobid, info in jobinfo.items():
    logs[jobid] = get_log(info.get("StdOut"))

[17]:

logs

[17]:

{4248312: 'Hello World from l40000.lvt.dkrz.de',
 4248313: 'Hello World from l40000.lvt.dkrz.de',
 4248314: 'Hello World from l40000.lvt.dkrz.de',
 4248315: 'Hello World from l40000.lvt.dkrz.de',
 4248316: 'Hello World from l40000.lvt.dkrz.de',
 4248317: 'Hello World from l40000.lvt.dkrz.de',
 4248318: 'Hello World from l40000.lvt.dkrz.de',
 4248319: 'Hello World from l40000.lvt.dkrz.de',
 4248320: 'Hello World from l40000.lvt.dkrz.de',
 4248321: 'Hello World from l40000.lvt.dkrz.de'}
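Since each log line contains the name of the node the job ran on, you can post-process the collected logs in plain Python, e.g. to see which jobs ran on which node. A short sketch, assuming every log line has the form "Hello World from <hostname>" written by test.sh; jobs_per_host is a hypothetical helper:

```python
from collections import defaultdict

PREFIX = "Hello World from "


def jobs_per_host(logs):
    """Group job IDs by the hostname in their 'Hello World from <hostname>' log line.

    Entries that are empty, None, or do not start with the expected prefix are skipped.
    """
    hosts = defaultdict(list)
    for jobid, line in logs.items():
        if line and line.startswith(PREFIX):
            hosts[line[len(PREFIX):]].append(jobid)
    return dict(hosts)
```

Applied to the logs dictionary above, all ten job IDs would be grouped under l40000.lvt.dkrz.de.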