slurm_parallel_job_array_submit.sh [-t HH:MM:SS] [-p partition] [-C avx2|sse] [-N #nodes] commands-file
The "slurm_parallel_job_array_submit.sh" tool allows the rapid submission and execution of a large number of short running jobs in an efficient and simple way.
Several types of problems can be broken down into small, short-running jobs that can run independently: for instance, processing a large number of input files, or performing a parametric analysis, e.g. varying some input parameters on a single problem to find the best fit. While you can use a regular SLURM job array to process such problems, this is by no means the most efficient method, as you need to wait for each job to be scheduled independently.
The "slurm_parallel_job_array_submit.sh" tool allows for the rapid submission and execution of a large number of short running jobs in an efficient and simple way. Simply prepare a list of all jobs you wish to execute in a single file which you pass as argument to "slurm_parallel_job_array_submit.sh".
Note that the tool also supports OpenMP parallel applications (see below).
For example, to run a series of R scripts on the same input while varying the parameters:
Jobs File (commands.txt)
============================
`Rscript MyCode.R 0.05 input1
Rscript MyCode.R 0.10 input1
Rscript MyCode.R 0.15 input1
Rscript MyCode.R 0.20 input1
...
Rscript MyCode.R 0.80 input1
Rscript MyCode.R 0.85 input1
Rscript MyCode.R 0.90 input1
Rscript MyCode.R 0.95 input1`
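For a sweep like this you need not type every line by hand; a short shell loop can generate the commands file (a sketch using GNU seq; adjust the range and command to your own case):
`> seq 0.05 0.05 0.95 | while read p; do echo "Rscript MyCode.R $p input1"; done > commands.txt`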
To execute
==========
`> slurm_parallel_job_array_submit.sh commands.txt`
This will automatically submit the jobs as a job array where each job array instance groups many jobs and runs them in parallel on a compute node. Moreover, the output of each job will be concatenated in order into a single output file per node:
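Once submitted, the job array appears in the queue like any other SLURM job, so you can track its progress with the standard SLURM commands (the job ID 437798 below stands in for whatever ID your submission reports):
`> squeue -u $USER
> sacct -j 437798`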
Output File
===========
`> cat jobs-437798_1.out
============= job 1 ============
(output job 1)
============= job 2 ============
(output job 2)
============= job 3 ============
(output job 3)
============= job 4 ============
(output job 4)
...`
`> cat jobs-437798_1.err
============= job 1 ============
(error output job 1)
============= job 2 ============
(error output job 2)
============= job 3 ============
(error output job 3)
============= job 4 ============
(error output job 4)
...`
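Because the per-node output files use the "============= job N ============" separators shown above, you can extract a single job's output with a short awk filter (a sketch; substitute the job number and file name you need):
`> awk '/^============= job 3 /{f=1; next} /^=============/{f=0} f' jobs-437798_1.out`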
When jobs run, they inherit the environment of your login shell, so all modules that you load and environment variables that you set prior to calling "slurm_parallel_job_array_submit.sh" are passed on to your jobs. Note that with OMP_NUM_THREADS set to 2 as below, each job in the commands.txt file will be allocated 2 cores to run in OpenMP mode.
`> module purge
> module load R/3.4.1
> export SOMEVARIABLENAME="1 2 3 4"
> export OMP_NUM_THREADS=2
> slurm_parallel_job_array_submit.sh commands.txt`
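If you want to confirm that your environment actually reaches the jobs, one quick (entirely optional) check is to append a diagnostic line to the commands file and inspect the corresponding output file afterwards:
`> echo 'env | grep SOMEVARIABLENAME' >> commands.txt`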
This tool allows you to set a few parameters:
1) Set the maximum run time: -t HH:MM:SS
example: allow the job array instances to run for up to 24 hours
`> slurm_parallel_job_array_submit.sh -t 24:00:00 commands.txt`
2) Select the target partition: -p partition
example: use the serial partition (to list the partitions available on your cluster, see the sinfo note after this list)
`> slurm_parallel_job_array_submit.sh -p serial commands.txt`
3) Select the type of node: -C type
example: use "avx2" nodes (other choice is "sse")
`> slurm_parallel_job_array_submit.sh -C avx2 commands.txt`
4) Set the maximum number of nodes to use: -N nodes
example: allow up to 10 nodes, i.e. up to 10 job array instances can run concurrently
`> slurm_parallel_job_array_submit.sh -N 10 commands.txt`
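Regarding option 2 above: if you are unsure which partitions your cluster offers, the standard SLURM sinfo command lists them (partition names vary from site to site):
`> sinfo -s`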
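The options can also be combined. For example, to allow a 24-hour run time on the serial partition, using avx2 nodes and at most 10 concurrent job array instances:
`> slurm_parallel_job_array_submit.sh -t 24:00:00 -p serial -C avx2 -N 10 commands.txt`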
NYUAD HPC Apps Team:
--------------------
- Benoit Marchand
- Guowei He
- Jorge Naranjo
Please refer to the online documentation for further details.