Job Execution

Once resources have been allocated through PBS, users can run commands either serially on the head node of the allocation or in parallel across all the resources in the allocated resource pool.




Serial

Batch Script

The executable portion of batch scripts is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.
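As a rough illustration of the pieces described above, the following is a minimal sketch of a serial batch script. The job name, resource request, and wall-clock limit are hypothetical values chosen for the example, not site defaults. PBS_O_WORKDIR is the variable PBS sets to the submission directory; the fallback to the current directory is an addition made here so the sketch also runs outside a batch job.

```shell
#!/bin/bash
# The first line above selects the shell that interprets the executable portion.
# Hypothetical job name, resource request, and wall-clock limit:
#PBS -N serial_example
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Move to the directory the job was submitted from. Outside of PBS the
# variable is unset, so fall back to the current directory.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# Ordinary shell commands run serially on the head node of the allocation.
echo "Job started in $workdir"
node_name=$(uname -n)
echo "Running on $node_name"
```

A script like this would typically be submitted with qsub, for example "qsub serial_example.pbs", where the file name is hypothetical.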

Batch Interactive

While running in interactive mode, the submitting user’s default shell will be used.





Parallel

By default, commands are executed on the head node. The mpirun command is used to execute a job on one or more compute nodes. Keep Lens’s layout in mind when running a job with mpirun: each node currently consists of four quad-core sockets, for a total of 16 cores per node. The PBS nodes option requests compute nodes. The PBS ppn option requests cores per node.

mpirun accepts the following common options:

-display-map    Displays the task layout
-npernode       Number of cores per node
-n              Total number of cores

Note: If you do not specify the number of tasks to mpirun, the system will default to all available cores.
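The relationship between the node layout, the PBS request, and the mpirun options can be sketched with a little shell arithmetic. This is only an illustration: the choice of two full nodes is hypothetical, and the launch command is printed rather than executed so the sketch can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical request: two full Lens nodes.
nodes=2
sockets_per_node=4      # from the layout described above
cores_per_socket=4      # quad-core sockets

# Cores available on one node, and across the whole request.
ppn=$((sockets_per_node * cores_per_socket))
total_tasks=$((nodes * ppn))

# Build (but do not run) the matching PBS directive and launch line.
pbs_request="#PBS -l nodes=${nodes}:ppn=${ppn}"
launch_cmd="mpirun -n ${total_tasks} -npernode ${ppn} -display-map a.out"

echo "$pbs_request"
echo "$launch_cmd"
```

Passing -n explicitly, as in the printed command, avoids relying on the default of using all available cores.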

Task Layout

The default MPI task layout on the system is round-robin by core.

For example, assume that 2 nodes with 2 cores each are requested with:
#PBS -l nodes=2:ppn=2

Round-robin by core

The following command

mpirun -n 4 a.out

will run the MPI executable a.out on a total of four cores, two on each of the two compute nodes. The MPI tasks will be allocated in the following round-robin by core fashion:

Compute Node 0        Compute Node 1
core 0    core 1      core 0    core 1
0         1           2         3
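The by-core placement shown in the table can be reproduced with integer arithmetic. The sketch below is not how mpirun is implemented; it is a small model of the mapping, assuming the task count does not exceed nodes times ppn. The variable names are chosen for this example.

```shell
#!/bin/sh
# Model of round-robin by core: fill every core on a node before
# moving on to the next node.
nodes=2
ppn=2
ntasks=4

layout=""
task=0
while [ "$task" -lt "$ntasks" ]; do
  node=$((task / ppn))    # consecutive tasks stay on the same node
  core=$((task % ppn))    # position within that node
  echo "task ${task} -> node ${node}, core ${core}"
  layout="${layout}${task}:${node}:${core} "
  task=$((task + 1))
done
```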

Round-robin by node

The following command

mpirun -n 4 -bynode a.out

will run the MPI executable a.out on a total of four cores, two on each of the two compute nodes. The MPI tasks will be allocated in the following round-robin by node fashion:

Compute Node 0        Compute Node 1
core 0    core 1      core 0    core 1
0         2           1         3
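The by-node placement in the table follows the same kind of arithmetic with the roles of division and remainder swapped. Again, this is a small model of the mapping rather than mpirun's implementation, assuming the task count does not exceed nodes times ppn. The variable names are chosen for this example.

```shell
#!/bin/sh
# Model of round-robin by node: deal tasks out one node at a time,
# returning to the first node after the last.
nodes=2
ppn=2
ntasks=4

layout=""
task=0
while [ "$task" -lt "$ntasks" ]; do
  node=$((task % nodes))   # consecutive tasks land on different nodes
  core=$((task / nodes))   # advances once every node has received a task
  echo "task ${task} -> node ${node}, core ${core}"
  layout="${layout}${task}:${node}:${core} "
  task=$((task + 1))
done
```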

Core Affinity

The system may move tasks between cores within a node. If your job has not been allocated an entire node, it may share resources with other jobs running on the node. The only way to prevent this is to request the entire node.


mpi_yield_when_idle

To help prevent a job’s tasks from being moved between cores each idle cycle, the OpenMPI option mpi_yield_when_idle may be used. For example:
mpirun -n 8 -mca mpi_yield_when_idle 0 a.out

Setting mpi_yield_when_idle to 0 helps prevent the core from being given to other waiting tasks. The setting only affects MPI processes while they are blocking in MPI library calls.

By default, OpenMPI sets this variable based on whether it believes the node is over allocated or under allocated. If over allocated, mpi_yield_when_idle will be set to a nonzero value, allowing the core to be given to other waiting tasks when idle. If under allocated, mpi_yield_when_idle will be set to zero.

If more tasks are running on a node than there are cores, the OS will swap all tasks between cores on the node. The mpi_yield_when_idle option only helps to slow this down; it will not fully prevent the swaps.
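To check in advance whether a planned launch would leave a node over allocated, the task count per node can be compared with the core count. The sketch below only models the rule described in this section; it does not query OpenMPI, and the task count is a hypothetical value.

```shell
#!/bin/sh
cores_per_node=16     # Lens: four quad-core sockets per node
tasks_per_node=20     # hypothetical launch with more tasks than cores

# A node is over allocated only when tasks outnumber cores.
if [ "$tasks_per_node" -gt "$cores_per_node" ]; then
  state="over allocated"
  idle_behavior="yields the core when idle"
else
  state="under allocated"
  idle_behavior="keeps the core when idle (mpi_yield_when_idle 0)"
fi

echo "Node is ${state}: by default each task ${idle_behavior}"
```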