[mpich-discuss] MPI file writes fail on non-parallel filesystem

Linda Sugiyama sugiyama at MIT.EDU
Tue Aug 10 10:49:52 CDT 2010


We're porting a fairly large code, which runs well on several hundred
processors on Cray XT-4/5 machines with Lustre or an equivalent parallel
file system, to a local cluster with a 'non-parallel' file system.
The code uses the PETSc MPI libraries, but writes checkpoint files
via standard MPI-IO calls.
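
For concreteness, here is a minimal sketch of the kind of MPI-IO
checkpoint write I mean (simplified, not our actual code; the file
name, buffer size, and layout are just placeholders):

    /* Minimal sketch of a shared-file checkpoint write: each rank
       writes its own contiguous block with a collective call. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int n = 1 << 20;              /* doubles per rank (placeholder) */
        double *buf;
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        buf = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) buf[i] = rank;   /* dummy checkpoint data */

        offset = (MPI_Offset)rank * n * sizeof(double);

        MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* Collective write: this is where ROMIO's two-phase collective
           buffering runs, and where file-system-dependent behavior shows up. */
        MPI_File_write_at_all(fh, offset, buf, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        free(buf);
        MPI_Finalize();
        return 0;
    }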

On our cluster, the code itself runs fine, but the checkpoint
write crashes with a segmentation fault for a 48-processor job.
Checkpoint writes do work on 32 processors, although very slowly.
HDF5 writes of similar amounts of data work fine.  (The cluster has
InfiniBand.)  We would like to run on a couple hundred processors.

Someone suggested setting the environment variable MPICH_MPIIO_CB_ALIGN
to 0 or 1 for non-Lustre file systems, but it doesn't seem to have any
effect.
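
Besides the environment variable, I have also been thinking of passing
ROMIO hints directly through an MPI_Info object at file-open time,
roughly as in the sketch below.  The hint keys (romio_cb_write,
romio_ds_write, cb_buffer_size) are standard ROMIO hints, but the
particular values are just guesses to experiment with, not settings we
know to be right for this cluster:

    /* Sketch: passing ROMIO hints at file-open time instead of relying
       only on environment variables.  Values are experimental guesses. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "disable");   /* try enable/disable */
        MPI_Info_set(info, "romio_ds_write", "disable");   /* turn off data sieving */
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB collective buffer */

        MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        /* ... checkpoint writes as in the earlier sketch ... */
        MPI_File_close(&fh);
        MPI_Info_free(&info);

        MPI_Finalize();
        return 0;
    }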


I seem to recall that one of the original Cray XT-4 systems
also had a problem with extremely slow checkpoint writes and reads
before the Lustre file system was installed.
The code has run successfully on a number of different computers,
but I don't know what kind of file systems they had.


Any suggestions?
The local systems people don't know much about MPI.


Linda Sugiyama



