Job Execution

Once resources have been allocated through PBS, users can run commands serially on the head node of the allocation or in parallel across all of the resources in the allocated resource pool.

Serial

Batch Script

The executable portion of batch scripts is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.
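
As an illustration of the pieces described above, a minimal batch script might look like the following. This is only a sketch: the job name, resource request, and walltime are placeholder values, and the fallback to $PWD is an addition so that the script can also be tried outside a job. PBS_O_WORKDIR is the variable PBS sets to the directory from which the job was submitted.

```shell
#!/bin/bash
# Placeholder job name and limits (hypothetical values; adjust as needed).
#PBS -N example_job
#PBS -lnodes=1:ppn=2
#PBS -l walltime=00:10:00

# Inside a job, PBS_O_WORKDIR is the submission directory. Fall back to
# the current directory so the script also runs outside PBS.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir"

# Shell commands, scripts, and executables can be mixed freely here.
echo "Running on $(uname -n) in $workdir"
date
```

The first line selects the shell that interprets the rest of the script; the #PBS lines are comments to the shell and are read only by the batch system.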

Batch Interactive

While running in interactive mode, the submitting user’s default shell will be used.

Parallel

By default, commands will be executed on the head node. The mpirun command is used to execute a job on one or more compute nodes.

Mpirun accepts the following common options:

-t            Test mode. The system will print what would be executed given the mpirun command, but will not execute anything.
-np           Number of processors (MPI tasks).
-machinefile  Sequential list of nodes on which to place tasks. Task 0 will be placed on the first node in this list, task 1 on the second, and so on. This option is only required if you want to specify a non-default layout. See the Task Layout section below for more information.

OpenMP

OpenMP is supported within a node; threads cannot span nodes. To run a hybrid MPI/OpenMP code, you will also need to modify the task layout (this is done with the second command below and is explained further in the Task Layout section).
To run a code with 64 MPI tasks and two threads per task, you would submit a job requesting #PBS -lnodes=64:ppn=2 and use the following commands in your batch job:

export OMP_NUM_THREADS=2
cat $PBS_NODEFILE|sort|uniq >machinefile.tmp
mpirun -machinefile ./machinefile.tmp -np 64 ./a.out

NOTE: csh/tcsh users should replace the first line with

setenv OMP_NUM_THREADS 2

This mpirun command specifies 64 tasks (-np 64). The second line creates a machinefile that causes mpirun to place one task per node (see Task Layout below for more information). The OMP_NUM_THREADS environment variable tells the system to spawn two threads per MPI task.

NOTE: To use threads under PGI, the -mp=nonuma option must be added to the compile line.
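
The hard-coded task and thread counts in the commands above can also be derived from the node file, so that the same script keeps working when the node request changes. The following is a minimal sketch under stated assumptions, not a site-provided script: the stand-in node file (with hypothetical contents) is used only when $PBS_NODEFILE is unset, the variable names are illustrative, the machinefile is written to a temporary file, and the leading echo turns the launch into a dry run.

```shell
# Inside a job, use the real node file. Otherwise build a small stand-in
# (two nodes, two slots each) so the sketch can be tried outside PBS.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf '%s\n' ewok020 ewok020 ewok021 ewok021 > "$PBS_NODEFILE"
fi

# One line per node, so that mpirun places one MPI task on each node.
machinefile=$(mktemp)
sort "$PBS_NODEFILE" | uniq > "$machinefile"

# Number of distinct nodes = number of MPI tasks in this layout.
ntasks=$(wc -l < "$machinefile" | tr -d ' ')
# Slots per node = total entries / nodes; use all of them as threads.
nslots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
export OMP_NUM_THREADS=$((nslots / ntasks))

# Dry run: remove "echo" to actually launch the job.
echo mpirun -machinefile "$machinefile" -np "$ntasks" ./a.out
```

This assumes every node was requested with the same ppn value, so that the division is exact.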

Task Layout

When you start a job, the batch system creates a file that is referenced by the environment variable $PBS_NODEFILE. This file is a list of hostnames (one per line). MPI will assign tasks sequentially to this list. Task 0 will be placed on the first node in the list, task 1 on the second, and so on. If you examine this file, you will note that each hostname is listed twice (assuming you requested ppn=2). This is because each hostname represents a node, and there are two processors per node. So, listing each hostname twice tells MPI to place two tasks on each host/node.

So, if you requested #PBS -lnodes=3:ppn=2, your $PBS_NODEFILE might contain:

ewok020
ewok020
ewok021
ewok021
ewok022
ewok022

Given that, the MPI task layout will be similar to:

            ewok020        ewok021        ewok022
            CPU 0  CPU 1   CPU 0  CPU 1   CPU 0  CPU 1
MPI Task    0      1       2      3       4      5
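
The sequential assignment described above can be checked by numbering the lines of the node file. The sketch below uses a temporary stand-in file with the example hostnames, since $PBS_NODEFILE exists only inside a job; the variable names are illustrative.

```shell
# Stand-in for $PBS_NODEFILE after requesting nodes=3:ppn=2.
nodefile=$(mktemp)
printf '%s\n' ewok020 ewok020 ewok021 ewok021 ewok022 ewok022 > "$nodefile"

# Task N is placed on the host found on line N+1 of the file.
layout=$(awk '{ printf "task %d -> %s\n", NR - 1, $1 }' "$nodefile")
echo "$layout"

rm -f "$nodefile"
```

Inside a job, running the same awk command on "$PBS_NODEFILE" shows the layout mpirun will use by default.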

One reason to change the layout would be to run a hybrid MPI/OpenMP application in which you want one MPI task and two OpenMP threads per node. In that case, you’d still need to request #PBS -lnodes=3:ppn=2. If you were to use the default layout, however, you would not end up with the desired MPI/OpenMP layout. So, you need to change the machine file. Since the job requests two processors per node, you know that each node is duplicated in the $PBS_NODEFILE. You can create a file with each host listed only once with the UNIX sort and uniq commands by running cat $PBS_NODEFILE|sort|uniq > machinefile.tmp. If you run that command, and then start mpirun -np 3 -machinefile ./machinefile.tmp ./a.out, your task layout would be:

            ewok020        ewok021        ewok022
            CPU 0  CPU 1   CPU 0  CPU 1   CPU 0  CPU 1
MPI Task    0              1              2
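
Note that sort reorders the hosts alphabetically. That is harmless when the node file is already sorted, but otherwise it changes which host receives task 0. If the ordering chosen by the batch system matters, one alternative is to drop duplicates while keeping the original order. This is offered as a sketch rather than a site recommendation, and the node file here is a stand-in that is deliberately unsorted.

```shell
# Stand-in node file, deliberately not in alphabetical order.
nodefile=$(mktemp)
printf '%s\n' ewok022 ewok022 ewok020 ewok020 ewok021 ewok021 > "$nodefile"

# Keep only the first occurrence of each hostname, in file order.
machinefile=$(mktemp)
awk '!seen[$0]++' "$nodefile" > "$machinefile"

first_host=$(head -n 1 "$machinefile")
ntasks=$(wc -l < "$machinefile" | tr -d ' ')
echo "task 0 -> $first_host, $ntasks tasks total"

rm -f "$nodefile"
```

With sort and uniq, task 0 would land on ewok020 for this input; the awk filter leaves it on ewok022, the first host the batch system listed.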