Interactive Batch Jobs

Batch scripts are useful for submitting a group of commands, letting them run through the queue, and then viewing the results. It is also often useful to run a job interactively. However, users are not allowed to access compute nodes or run mpirun directly from a login session. Instead, users must use a batch-interactive PBS job, which is requested with the -I option to qsub.

Interactive Batch Example

For interactive batch jobs, PBS options are passed through qsub on the command line.

qsub -I -A XXXYYY -q debug -V -lnodes=8:ppn=2,walltime=30:00

This request will

-I
Start an interactive session
-A XXXYYY
Charge to the XXXYYY project
-q debug
Run in the debug queue
-V
Import the submitting user’s environment
-lnodes=8:ppn=2,walltime=30:00
Request 8 nodes with 2 processors per node (16 processors) for 30 minutes
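
As a rough illustration of how the parts of this request fit together, the sketch below assembles the same qsub command from its pieces and prints the processor total. The helper name interactive_qsub_cmd and its argument order are invented for this example; only the qsub options themselves come from the command above. Nothing is submitted; the command is only printed.

```shell
# Hypothetical helper (not part of PBS): print a qsub command for an
# interactive debug-queue job without submitting it.
# Arguments: project, nodes, processors per node, walltime.
interactive_qsub_cmd() {
    echo "qsub -I -A $1 -q debug -V -lnodes=$2:ppn=$3,walltime=$4"
}

nodes=8
ppn=2
# Total processors is nodes times processors per node: 8 * 2 = 16.
echo "processors: $((nodes * ppn))"
interactive_qsub_cmd XXXYYY "$nodes" "$ppn" 30:00
```

Printing the command first makes it easy to check the node and walltime values before pasting it at the login prompt.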

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, you will have an interactive prompt on the head node of your allocated resource. From here commands may be executed directly instead of through a batch script.
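
Once at the prompt, a typical first step is to move to the submission directory and launch the code. The sketch below assumes Torque-style PBS behavior, where $PBS_NODEFILE lists one line per allocated processor and $PBS_O_WORKDIR holds the directory qsub was run from. The helper name count_slots is hypothetical, and other PBS variants may lay out the node file differently.

```shell
# Hypothetical helper: count the processor slots PBS allocated to this job.
# Assumes one line per slot in the node file (Torque-style behavior).
# An explicit file may be passed as $1; otherwise $PBS_NODEFILE is used.
count_slots() {
    # Reading from stdin makes wc print only the count;
    # tr strips the padding some wc implementations add.
    wc -l < "${1:-$PBS_NODEFILE}" | tr -d ' '
}

# At the interactive prompt this might be used as:
#   cd "$PBS_O_WORKDIR"
#   mpirun -np "$(count_slots)" ./a.out
```

Deriving the process count from the node file, rather than typing it by hand, keeps the mpirun line consistent with whatever was requested with -lnodes.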

Using Interactive Batch to Debug

A common use of interactive batch is debugging. Below are points that may be useful while interactively debugging a code through PBS.

Quick Turnaround

The tips below may be used to help a job run quickly rather than sit in the queue.

Choosing a Job Size

You can use the showbf command (for “show back fill”) to see resource limits that would allow your job to be immediately backfilled (and thus started) by the scheduler. For example, the snapshot below shows that a job requesting six compute nodes would run immediately.

$ showbf
Partition     Tasks  Nodes   StartOffset      Duration       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL              12      6      00:00:00      INFINITY  10:23:36_03/27

The following command would then take advantage of this window for an interactive session:

qsub -q debug -I -lnodes=6:ppn=2
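
The same step can be sketched as a small script that reads the node count out of the showbf output and prints a matching qsub command. This assumes the column layout shown in the snapshot above (Nodes as the third column, one row for the ALL partition); the helper name backfill_nodes is hypothetical, and the sample text stands in for a live showbf call.

```shell
# Hypothetical helper: read showbf output on stdin and print the node
# count from the first row whose partition is ALL.
# Assumes Nodes is the third whitespace-separated column.
backfill_nodes() {
    awk '$1 == "ALL" { print $3; exit }'
}

# Sample input copied from the snapshot; on the system itself the
# helper would be fed by: showbf | backfill_nodes
sample='Partition     Tasks  Nodes   StartOffset      Duration       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL              12      6      00:00:00      INFINITY  10:23:36_03/27'

nodes=$(printf '%s\n' "$sample" | backfill_nodes)
# Print (do not submit) the command sized to the backfill window.
echo "qsub -q debug -I -lnodes=${nodes}:ppn=2"
```

Because the backfill window can change between the showbf call and the qsub call, the printed command is only a best guess at what will start immediately.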

See showbf --help for additional options. For more information, see the online user guide for the Moab Workload Manager.

TotalView

While debugging, it may be useful to run the TotalView debugger.

The syntax for using TotalView on ewok is somewhat different from that on other machines. First, you must load the TotalView module. This is important: in addition to setting your PATH and other variables, the TotalView module sets a function (for bash/ksh) or an alias (for csh/tcsh) that allows you to launch TotalView easily. After loading the module, launch TotalView with a command similar to:

mpirun_tv -np 16 a.out

This can only be done inside a qsub -I -V ... interactive job. The -V imports your environment, and thus X11 forwarding (if you are using it) will work.
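
A quick check before launching can catch the two conditions described above. The sketch below assumes that PBS sets PBS_ENVIRONMENT to PBS_INTERACTIVE inside a qsub -I job (as Torque does) and that a working X11 setup shows up as a non-empty DISPLAY; the function name tv_preflight is hypothetical and is not something the TotalView module provides.

```shell
# Hypothetical pre-flight check to run before mpirun_tv.
tv_preflight() {
    # Assumption: PBS marks interactive jobs with PBS_ENVIRONMENT=PBS_INTERACTIVE.
    if [ "${PBS_ENVIRONMENT:-}" != "PBS_INTERACTIVE" ]; then
        echo "not in an interactive PBS job; start one with qsub -I -V first"
        return 1
    fi
    # Without DISPLAY, the TotalView window has nowhere to open.
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; the TotalView window cannot open"
        return 1
    fi
    echo "ok to launch mpirun_tv"
}

# Report the result without aborting the shell if a check fails.
tv_preflight || true
```

If the second check fails inside an interactive job, the usual cause is that the job was started without -V or that X11 forwarding was not enabled on the login session.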

Once the window comes up, press g (for “go”) or select “go” from the menu. TotalView will then start running mpirun. When mpirun reaches the point of spawning the processes for your code, a window should appear asking whether you want to continue or stop. If you want to set breakpoints, have TotalView stop; otherwise, continue.