[mpich-discuss] MPI file writes fail on non-parallel filesystem

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Aug 10 13:56:40 CDT 2010


I think the MPICH_MPIIO_CB_ALIGN environment variable only takes
effect on Cray's Lustre.

A little more information would help pinpoint the problem.
1. Is there a core dump? What are the error messages?
2. The following commands report information about your environment:
   "mount | grep PATH"  -- reports the type of the file system mounted at PATH
   "mpicc -v" or "mpich2version" -- reports the MPICH/MVAPICH version
   "h5stat -V" or "h5dump -V" -- reports the HDF5 version

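The commands in item 2 could be gathered into one script so that the
output can be pasted into a reply. This is only a sketch: CKPT_DIR and
run_if_present are names made up here for illustration, and it assumes
a POSIX shell with df, awk, and mount available.

```shell
#!/bin/sh
# Sketch only: collect the environment details asked for above.
# CKPT_DIR is a hypothetical variable; point it at the checkpoint directory.
CKPT_DIR="${CKPT_DIR:-.}"

# Hypothetical helper: run a command if it is installed, otherwise say so.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@" 2>&1
    else
        echo "== $1: not found in PATH"
    fi
}

# File system type: find the mount point of CKPT_DIR, then its mount entry.
# Assumes the mount point contains no spaces.
mount_point=$(df -P "$CKPT_DIR" 2>/dev/null | awk 'NR==2 {print $6}')
echo "== file system holding $CKPT_DIR (mount point: ${mount_point:-unknown})"
mount 2>/dev/null | grep -F " $mount_point " || echo "no matching mount entry found"

# MPI and HDF5 versions; any tool that is missing is reported as such.
run_if_present mpicc -v
run_if_present mpich2version
run_if_present h5stat -V
run_if_present h5dump -V
```

It can be run as, for example, "CKPT_DIR=/path/to/checkpoints sh envinfo.sh"
(the script name is arbitrary).
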
Wei-keng

On Aug 10, 2010, at 1:25 PM, Rob Ross wrote:

> If the cluster doesn't have a parallel file system, what does it have? NFS volume?
> 
> Rob
> 
> On Aug 10, 2010, at 10:49 AM, Linda Sugiyama wrote:
> 
>> 
>> We're porting a fairly large code that runs well on several hundred
>> processors on Cray XT-4/5 computers with a Lustre or equivalent file
>> system to a local cluster with a 'non-parallel' file system.
>> The code uses the PETSc MPI libraries, but writes checkpoint files
>> via standard MPI commands.
>> 
>> On our cluster, the code itself runs fine, but the
>> checkpoint write crashes for a 48 processor job,
>> with a segmentation fault.  Checkpoint writes work on 32
>> processors, although very slowly.  HDF5 file writes for
>> similar amounts of data work. (The cluster has Infiniband.)
>> We would like to run on a couple hundred processors.
>> 
>> Someone suggested setting the environment variable MPICH_MPIIO_CB_ALIGN
>> to 0 or 1 for non-Lustre file systems, but it doesn't seem to have any effect.
>> 
>> 
>> I seem to recall that one of the original Cray XT-4 systems
>> also had a problem with extremely slow checkpoint writes and reads
>> before the Lustre file system was installed.
>> The code has run successfully on a number of different computers,
>> but I don't know what kind of file systems they had.
>> 
>> 
>> Any suggestions?
>> The local systems people don't know much about MPI.
>> 
>> 
>> Linda Sugiyama
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss