Open Issues
Can’t Read in Denormalized Numbers
Fortran programs crash when they read denormalized numbers (e.g., 1.0E-315), unless the READ statement uses the IOSTAT option, because the Cray X1E hardware does not support denormals. However, a program can still write denormalized numbers, for example when it writes out uninitialized data. As a result, the X1 may be unable to read data that it wrote itself. At this time, the solution is to initialize every array before writing it out.
This is Cray SPR #733088. The likely fix is for the I/O system to read denormals in as zeros.
Can’t Find Fortran Modules when Loading HDF5 or NetCDF Modules
Although Phoenix has HDF5 and NetCDF modules, the ftn compiler cannot find the Fortran modules (.mod files) included with those libraries. This does not affect you unless your code uses those Fortran modules.
The only workaround at this time is to add the -em -p<path> option to your ftn command line, where <path> depends on the module. Run module display <module> to see information about the module you are using, including that path.
Despite this issue, we encourage you to load these modules if you use these libraries, because the compiler then automatically links against the appropriate SSP or MSP libraries for your compilation options (such as -Ossp or -h ssp). The modules also set the appropriate environment variables.
Added March 24, 2005
Large Procedure-Call Overhead
Procedure calls on Phoenix can carry a relatively large overhead. If your code calls a procedure inside a loop, you may see a significant performance penalty.
The best remedy is to have the compiler inline the procedure, if possible, or to inline it by hand. Alternatively, push the loop down into the procedure, so that a single call processes all of the iterations.
Added October 28, 2003
MPI I/O Bug with external32 Format
The external32 data representation in MPI I/O does not work correctly on Phoenix. As a workaround, use the native representation.
Added October 28, 2003
Updated October 10, 2006: This problem will not be corrected on the X1, so please use the workaround.
Bus Errors
Occasionally, a recompiled code fails at run time with a bus error. To fix this, remove the current executable and relink or recompile the code. To avoid the error altogether, always remove the existing executable before recompiling or relinking.
Resolved Issues
Recursive rm Fails in Scratch Directory
Running a recursive rm command in the scratch directory on Robin returns the following error message:
$ mkdir test; touch test/1; rm -rf test
rm: reading directory `test/.': Unknown error 525
The file in the test directory is removed, but the directory isn’t. It can be removed by a second rm -rf command or the rmdir command. This issue is currently being investigated. Note that this problem does not occur in home directories.
Updated October 10, 2006: This problem was fixed in the UNICOS/mp 3.1.11 release.
mppe Usage xxx Exceeded Limit yyy (case 7)
From roughly November 2005 through January 2006, various X1E users may have seen an error message such as the following:
=>> PBS: job killed: mppe usage 384 exceeded limit 192 (case 7)
Terminated
The reported usage is always exactly twice the limit. We believe the problem is caused by an as-yet-unidentified interaction between the kernel and PBS. It seems to occur randomly, and resubmitting the job usually works.
As of February 1, 2006, a potential fix was put in place on the X1E (Phoenix). If you see this message now, please contact help@nccs.gov with the error message, PBS job id, and approximate time it happened.
Updated February 3, 2006
Interactive Run Dies Mysteriously
Interactive runs on application processors have a 10-minute time limit. Previously, a run that reached this limit was killed without any message saying so. Instead, it produced errors such as segmentation fault or memory fault, which usually suggest a bug in the code.
This is now fixed, and reaching the limit produces an error such as the following:
./a.out.[13]: 101087 Exceeded CPU time limit
Note, however, that this message may be mixed in with other messages, such as segmentation fault or memory fault, or even with a stack trace (if you set TRACEBK).
If you need more than 10 minutes, use an interactive batch job. For example:
qsub -I -lwalltime=2:00:00,mppe=<n>
where <n> is the number of MSPs.
Added November 18, 2004; resolved March 2005
CC -hlist Fails with Long Path Names
Versions 5.2.0.* and 5.3.0.1 of the Cray C compiler may fail when -hlist is used with very long path names.
The problem is fixed in version 5.3.0.2 and later. With the older compilers, you can work around it by not using -hlist.
Added February 18, 2005
Multistreaming with PrgEnv 5.2
Version 5.2 of the Fortran compiler does not always produce correct multistreaming code. Symptoms include incorrect answers and MSYNC errors. Versions 5.2.0.1 through 5.2.0.5 are affected; version 5.2.0.6 appears to behave better.
Added September 13, 2004; updated February 17, 2005
PrgEnv 5.1 Fails in 4 TB Scratch File System
PrgEnv 5.1 cannot handle the large inode numbers found on large file systems, such as the multi-terabyte scratch area. This may prevent you from compiling or linking your code, especially if you include files from more than one path. It may also affect pat_build and pat_report.
This problem was fixed in PrgEnv 5.2.
Added February 16, 2004; updated May 4, 2004
-Omodinline Dumps Core
Core dumps have been reported in multiple cases when the -Omodinline compiler option is used. The only workaround at this time is to reduce the optimization level to -Oinline0.
This problem was fixed in PrgEnv 5.2.
Added October 28, 2003; updated May 4, 2004
