TotalView

Description

The TotalView debugger is a tool that lets you debug, analyze, and tune the performance of complex serial, multiprocessor, and multithreaded programs.

TotalView can be run in two ways:

  1. as a GUI, with totalview, and

  2. from the command-line interface, with totalviewcli.

Because the TotalView GUI is an X Window System application, your session must be able to display X11 windows on your local machine. The best way to do this is to tunnel the X11 traffic through your SSH connection, which requires X11 forwarding to be enabled. You can enable it either by adding the line ForwardX11 yes to your SSH config file or by passing the -X option to ssh.
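As a minimal sketch, the shell function below appends a Host block containing ForwardX11 yes to an SSH client config file, scoped to a single host pattern rather than every host. The function name enable_x11_forwarding and the host pattern in the example are hypothetical; substitute your system's login host. Keep in mind that ssh uses the first value it finds for each option, so an earlier block (such as Host *) that sets ForwardX11 no would still take precedence.

```shell
# Hypothetical helper: add "ForwardX11 yes" for one host pattern to an
# SSH client config file (default: ~/.ssh/config).
enable_x11_forwarding() {
    local host_pattern="${1:-}"
    local config="${2:-$HOME/.ssh/config}"
    if [ -z "$host_pattern" ]; then
        echo "usage: enable_x11_forwarding HOST_PATTERN [CONFIG_FILE]" >&2
        return 1
    fi
    mkdir -p "$(dirname "$config")"
    # Do nothing if a Host line for this exact pattern already exists.
    if [ -f "$config" ] && grep -qxF "Host $host_pattern" "$config"; then
        return 0
    fi
    printf '\nHost %s\n    ForwardX11 yes\n' "$host_pattern" >> "$config"
    # ssh rejects a config file that is writable by other users.
    chmod 600 "$config"
}

# Example (the host pattern is a placeholder):
# enable_x11_forwarding 'login.example.org'
```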

For more information, see the following:

  • TotalView documentation page

  • User guides in /apps/toolworks/totalview/doc/pdf on each system.

Use

Jaguar
On Jaguar, TotalView is available through the TotalView modules. A TotalView module should be loaded by default for each user at login.
If you are debugging an existing core file and do not need to run a parallel process, you can launch TotalView with the following command:

$ totalview a.out core

If you want to use TotalView to start your job and monitor it as it runs, you must take some additional steps. Because parallel jobs cannot be launched directly from the login nodes, you will need to launch TotalView from within a batch job. The easiest way to do this is to start an interactive batch job as follows:

$ qsub -l size=16,walltime=1:00:00 -I -V -A project_identifier

Be sure to pass -V to qsub: it exports your current environment to the batch job, including the DISPLAY variable needed for the X11 connection.
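Once the interactive job starts, a quick check such as the following sketch can confirm that -V carried DISPLAY into the job before you start TotalView. The function name require_display is hypothetical.

```shell
# Hypothetical check: succeed only if DISPLAY is set in this shell.
require_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; log in with ssh -X and resubmit with qsub -V" >&2
        return 1
    fi
    echo "X11 display: $DISPLAY"
}

# Example, inside the interactive batch job:
# require_display && totalview aprun -a -n 16 a.out
```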

Once your job starts, you can launch TotalView from the command line. The following example uses TotalView to launch a.out on 16 compute cores through aprun:

$ totalview aprun -a -n 16 a.out

After you run the totalview command, two windows should appear. In the larger window, you will see the assembly code for aprun. Type “G” (capital “G”) in this window to cause all processes to “Go.” TotalView will run for a few seconds and then ask if you’d like to stop your processes before entering “MAIN.” Answer “yes” to stop your program at the beginning so you can add breakpoints, etc., before running.
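In the totalview command above, -a tells TotalView that every argument after it belongs to the program being debugged, which here is aprun; -n 16 and a.out are therefore aprun's arguments. The wrapper below is a hypothetical sketch that builds the same command for a chosen core count. The TOTALVIEW variable is an assumption added only so the command can be previewed (for example with TOTALVIEW=echo) without starting the debugger.

```shell
# Hypothetical wrapper: debug aprun under TotalView on NCORES cores.
# Usage: tv_aprun NCORES EXECUTABLE [PROGRAM_ARGS...]
tv_aprun() {
    local ncores="${1:-}"
    local exe="${2:-}"
    case "$ncores" in
        ''|*[!0-9]*)
            echo "NCORES must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$ncores" -eq 0 ] || [ -z "$exe" ]; then
        echo "usage: tv_aprun NCORES EXECUTABLE [PROGRAM_ARGS...]" >&2
        return 1
    fi
    shift 2
    # Everything after -a is handed to aprun, not interpreted by TotalView.
    "${TOTALVIEW:-totalview}" aprun -a -n "$ncores" "$exe" "$@"
}

# Example, inside the interactive batch job:
# tv_aprun 16 ./a.out
```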

To connect to a running job, work from the job’s aprun node:

  • Determine the job’s aprun node using qstat -f <jobid> | grep exec_host.

  • Log in to the aprun node from a Jaguar login node using ssh -X aprun##.

  • From the aprun node, start TotalView and connect to the aprun process.
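The first two steps can be scripted. The sketch below assumes that the exec_host attribute follows the usual PBS/Torque host/cpu[+host/cpu...] form; exec_host_node is a hypothetical name, and "aprun3" in the comments is only an example host. It prints the host part of the first entry, which you can then pass to ssh -X.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the host
# portion of the first exec_host entry (e.g. "aprun3/0+..." -> "aprun3").
exec_host_node() {
    awk -F' = ' '$1 ~ /exec_host$/ { split($2, parts, "/"); print parts[1]; exit }'
}

# Example, from a Jaguar login node (replace <jobid> with your job ID):
# node="$(qstat -f <jobid> | exec_host_node)"; [ -n "$node" ] && ssh -X "$node"
```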

Phoenix
On Phoenix, TotalView is available through the TotalView modules. A TotalView module should be loaded by default for each user at login.

NOTE: Section 1.5, “Limitations,” of the TotalView Release Overview, Installation Guide, and User’s Guide Addendum for Cray X1 Systems states the following: “TotalView currently does not use aprun to launch the executable to be debugged, and it explicitly disables the auto-aprun feature. As a consequence, an application launched from the TotalView command line will be placed on a system node, not an application node, and it will share the node’s processors with operating system processes. As a workaround you may use aprun to launch the application first, then use TotalView to attach to the process after it is running.”

The following instructions can be used to debug an MPI task on Phoenix:

  1. Submit a job through PBS as you normally would.

  2. Once the job is running, use psview to identify its APID and executable name.

  3. Run totalview; a TotalView window will open.

  4. From the “File” menu, select “New Program.” Another window will open.

  5. In the first box, enter the executable name shown by psview.

  6. Enter the APID in the process ID box.

  7. Click “local” in the third box.

  8. Click “OK,” and the debugging session will start.

The following can be used to run a parallel interactive job with TotalView on Phoenix (replace <nprocs> with the number of processors you would like to use):

env CRAY_AUTO_APRUN_OPTIONS="-n<nprocs>" totalview <program> [totalviewoptions] [-a <program options>]
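For example, with <nprocs> set to 8, the command becomes env CRAY_AUTO_APRUN_OPTIONS="-n8" totalview ./a.out. The wrapper below is a minimal sketch of that same invocation; tv_phoenix and the TOTALVIEW override are hypothetical additions. Arguments after the process count are passed to totalview unchanged, so the program, any TotalView options, and -a <program options> appear in the same order as in the command template above.

```shell
# Hypothetical wrapper for Phoenix: run TotalView with auto-aprun options.
# Usage: tv_phoenix NPROCS PROGRAM [TOTALVIEW_OPTIONS...] [-a PROGRAM_OPTIONS...]
tv_phoenix() {
    local nprocs="${1:-}"
    case "$nprocs" in
        ''|*[!0-9]*)
            echo "NPROCS must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$nprocs" -eq 0 ] || [ -z "${2:-}" ]; then
        echo "usage: tv_phoenix NPROCS PROGRAM [OPTIONS...]" >&2
        return 1
    fi
    shift
    # env sets CRAY_AUTO_APRUN_OPTIONS only for this one totalview process.
    env CRAY_AUTO_APRUN_OPTIONS="-n${nprocs}" "${TOTALVIEW:-totalview}" "$@"
}

# Example: tv_phoenix 8 ./a.out -a input.dat
```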