Open Issues

Can’t Read in Denormalized Numbers

Fortran programs crash when they read a denormalized number (e.g., 1.0E-315), unless the READ statement uses the IOSTAT= specifier. The Cray X1E hardware does not support denormals. However, a program can still write denormalized numbers, for example when it writes out uninitialized data. As a result, the X1 may be unable to read data that it wrote itself. At this time, the solution is to make sure you initialize every array that is written out.

This is Cray SPR #733088. The likely fix is for the I/O system to read denormals in as zeros.

Can’t Find Fortran Modules when Loading HDF5 or NetCDF Modules

Although Phoenix provides HDF5 and NetCDF environment modules, the ftn compiler cannot find the Fortran module files (.mod files) included with those libraries. This is not a problem if your code does not use one of these Fortran modules.

The only workaround at this time is to add the -em -p<path> options to your ftn command line, where <path> depends on the module. Run module display <module> to see information about the module you are using, in particular that path.

Despite this issue, we encourage you to load these modules if you use these libraries, because the compiler then links automatically against the appropriate SSP or MSP libraries, depending on your compilation options (such as -Ossp or -h ssp). The modules also set the appropriate environment variables.

Added March 24, 2005

Large Procedure-Call Overhead

Procedure calls on Phoenix can carry a relatively large overhead. If your code calls a procedure inside a loop, you may see a significant performance penalty.

The best remedy is to have the compiler inline the procedure, if possible, or to inline it by hand. Alternatively, you can push the loop down into the procedure, so that the procedure is called once and performs the loop itself.

Added October 28, 2003

MPI I/O Bug with external32 Format

The external32 data representation in MPI I/O does not work properly on Phoenix. The workaround is to use the native representation instead (the "native" datarep argument to MPI_File_set_view).

Added October 28, 2003

Updated October 10, 2006: This problem will not be corrected on the X1, so please use the workaround.

Bus Errors

Occasionally, a code produces bus errors at run time after it has been recompiled. To correct the error, delete the current executable and relink or recompile the code. To avoid the error, always delete the existing executable before recompiling or relinking.

Resolved Issues

Recursive rm Fails to Remove Directory in Scratch

When you run a recursive rm command in the scratch directory on Robin, it returns the following error message:

$ mkdir test; touch test/1; rm -rf test
rm: reading directory `test/.': Unknown error 525

The file in the test directory is removed, but the directory is not. The directory can be removed with a second rm -rf command or with the rmdir command. This issue is currently being investigated. Note that this problem does not occur in home directories.

Updated October 10, 2006: This problem was fixed in the UNICOS/mp 3.1.11 release.

mppe Usage xxx Exceeded Limit yyy (case 7)

From roughly November 2005 through January 2006, various X1E users saw error messages such as the following:

=>> PBS: job killed: mppe usage 384 exceeded limit 192 (case 7)
Terminated

The reported usage is always twice the limit. We believe the problem is caused by an as-yet-unknown interaction between the kernel and PBS. It appears to occur randomly, and resubmitting the job usually works.

On February 1, 2006, a potential fix was put in place on the X1E (Phoenix). If you see this message now, please contact help@nccs.gov and include the error message, the PBS job ID, and the approximate time it occurred.

Updated February 3, 2006

Interactive Run Dies Mysteriously

Interactive runs on the application processors have a 10-minute time limit. Previously, when a run reached this limit, no error message indicated that it had been killed. Instead, users saw messages such as segmentation fault or memory fault, which usually suggest a problem in the code.

This is now fixed, and reaching the limit produces an error such as the following:

./a.out.[13]: 101087 Exceeded CPU time limit

Note, however, that this message may be mixed in with other messages, such as segmentation fault or memory fault, or even with a stack trace (if you set TRACEBK).

If you need more than 10 minutes, use an interactive batch job. For example,

qsub -I -lwalltime=2:00:00,mppe=<n>

where <n> is the number of MSPs.

Added November 18, 2004; resolved March 2005

CC -hlist Fails with Long Path Names

Versions 5.2.0.* and 5.3.0.1 of the Cray C compiler may fail when -hlist is used with very long path names.

The problem is fixed in version 5.3.0.2 and later. With the older compilers, the workaround is to omit -hlist.

Added February 18, 2005

Multistreaming with PrgEnv 5.2

Version 5.2 of the Fortran compiler does not always produce correct multistreaming code. Symptoms include incorrect answers and MSYNC errors. The problem appears in versions 5.2.0.1 through 5.2.0.5; version 5.2.0.6 seems to behave better.

Added September 13, 2004; updated February 17, 2005

PrgEnv 5.1 Fails in 4 TB Scratch File System

PrgEnv 5.1 cannot handle the large inode numbers found on large file systems, such as the multi-TB scratch area. This may prevent you from compiling or linking your code, especially if you include files from more than one path. It may also affect pat_build and pat_report.

This problem was fixed in PrgEnv 5.2.

Added February 16, 2004; updated May 4, 2004

-Omodinline Dumps Core

There have been multiple reports of core dumps when the -Omodinline compiler option is used. The only workaround at this time is to reduce the optimization to -Oinline0.

This problem was fixed in PrgEnv 5.2.

Added October 28, 2003; updated May 4, 2004