1. dalma-run_job_array
  2. Dalma Manual
  3. dalma-run_job_array

Dalma Man Pages (dalman)

SYNOPSIS

Ths page explains how to Running a Slurm job array.

DESCRIPTION

Slurm job array is a series of indentical jobs from one job script, with only one difference: the environmental variable $SLURM_ARRAY_TASK_ID. This variable could be used as a indexer to different task.

In the job script, one extra Slurm directive $SBATCH -a 1-<NUMBER_OF_JOBS> is required. For example, $SBATCH -a 1-18 will generate 18 indentical jobs, with the environmental variable $SLURM_ARRAY_TASK_ID varies from 1 to 18.

EXAMPLE

1.- Login Dalma.

2.- Submit the following job script with sbatch

----------------------------------------------------------

#!/bin/bash
#SBATCH -o arrayJob_%A_%a.out
#SBATCH -e arrayJob_%A_%a.err
#SBATCH -a 1-18
# 2 cores / single task job
#SBATCH -n 1
#SBATCH --cpus-per-task=2

echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

----------------------------------------------------------

This will generate 18 will generate 18 indentical jobs, with the environmental variable $SLURM_ARRAY_TASK_ID varies from 1 to 18.

3.- If the above script is saved as run.array.slurm, the example screen output after submission is as follow.

----------------------------------------------------------

  $> sbatch run.array.slurm
  Submitted batch job 6318

----------------------------------------------------------

6318 is the job array id.

4.- Use squeue to query the job array. The example screen output is as follow.

----------------------------------------------------------

$> squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6318_1 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_2 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_3 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_4 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_5 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_6 admin arrayJob jonarbo R 0:49 1 gpu-18-12
6318_7 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_8 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_9 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_10 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_11 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_12 admin arrayJob jonarbo R 0:49 1 gpu-18-13
6318_13 admin arrayJob jonarbo R 0:49 1 gpu-18-14
6318_14 admin arrayJob jonarbo R 0:49 1 gpu-18-14
6318_15 admin arrayJob jonarbo R 0:49 1 gpu-18-14
6318_16 admin arrayJob jonarbo R 0:49 1 gpu-18-14
6318_17 admin arrayJob jonarbo R 0:49 1 gpu-18-14
6318_18 admin arrayJob jonarbo R 0:49 1 gpu-18-14

----------------------------------------------------------

Notice that 6318 is the job array id, while the number after _, 1-18 is the $SLURM_ARRAY_TASK_ID. One can specify job 6318_6 to handle 6th line of input.

CANCELLING JOBS FROM JOB ARRAY

In the previous example, if you want to cancel the 3rd job in job array 6318, you can run this.

----------------------------------------------------------

# Cancel job 6318_3 in job array 6318. You can get information on all your jobs by `squeue -u <your-net-id>`

scancel 6318_3

----------------------------------------------------------

But if you use the JOBID from qstat, scancel will not cancel the job.

If you want to cancel the whole job array, e.g., every jobs in the job array, here is an example.

----------------------------------------------------------

# Cancel all jobs in job array 6318. Change 6318 to your actual ${SLURM_ARRAY_JOB_ID}

scancel 6318

----------------------------------------------------------

AUTHORS

NYUAD HPC Apps Team:
--------------------
    - Benoit Marchand
    - Guowei He
    - Jorge Naranjo

SEE ALSO

Please refer to the online documentation available here

  1. NYUAD
  2. February 2017
  3. dalma-run_job_array