[mpich-discuss] MPI file writes fail on non-parallel filesystem
Wei-keng Liao
wkliao at ece.northwestern.edu
Tue Aug 10 13:56:40 CDT 2010
I think the MPICH_MPIIO_CB_ALIGN environment variable only takes
effect on Cray's Lustre.
A bit more information would help pinpoint the problem.
1. Is there a core dump? What are the error messages?
2. The following commands report information about your environment:
"mount | grep PATH" -- reports the type of the file system mounted at PATH
"mpicc -v" or "mpich2version" -- reports the MPICH/MVAPICH version
"h5stat -V" or "h5dump -V" -- reports the HDF5 version
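If it is easier, the checks above can be gathered in one pass. What follows is only a minimal sketch, assuming a POSIX shell: CKPT_DIR and the report helper are hypothetical names made up for illustration, not part of MPICH or HDF5, and the "ulimit -c" line is an extra check related to question 1.

```shell
#!/bin/sh
# Sketch only: CKPT_DIR and report() are hypothetical names for illustration.
# CKPT_DIR should point at the directory the checkpoint file is written to.
CKPT_DIR="${CKPT_DIR:-.}"

# Print a header, then run the command if it is installed.
report() {
    echo "== $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(not found: $1)"
    fi
}

# Core dumps: a limit of 0 means no core file will be written.
echo "== core file size limit"
ulimit -c

# File system type: find the mount point that holds CKPT_DIR,
# then look it up in the mount table.
mp=$(df -P "$CKPT_DIR" | awk 'NR==2 {print $6}')
echo "== mount entry for $CKPT_DIR (mount point: $mp)"
mount | grep -F " $mp " || echo "(no matching mount entry)"

# Library versions.
report mpicc -v
report mpich2version
report h5dump -V
```

Run it with CKPT_DIR set to the checkpoint directory and send the output along with any error messages from the failing job.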
Wei-keng
On Aug 10, 2010, at 1:25 PM, Rob Ross wrote:
> If the cluster doesn't have a parallel file system, what does it have? NFS volume?
>
> Rob
>
> On Aug 10, 2010, at 10:49 AM, Linda Sugiyama wrote:
>
>>
>> We're porting a fairly large code that runs well on several hundred
>> processors on Cray XT-4/5 computers with a Lustre or equivalent file
>> system to a local cluster with a 'non-parallel' file system.
>> The code uses the PETSc MPI libraries, but writes checkpoint files
>> via standard MPI commands.
>>
>> On our cluster, the code itself runs fine, but the
>> checkpoint write crashes for a 48 processor job,
>> with a segmentation fault. Checkpoint writes work on 32
>> processors, although very slowly. HDF5 file writes for
>> similar amounts of data work. (The cluster has Infiniband.)
>> We would like to run on a couple hundred processors.
>>
>> Someone suggested setting the environment variable MPICH_MPIIO_CB_ALIGN
>> to 0 or 1 for non-Lustre file systems, but it doesn't seem to have any effect.
>>
>>
>> I seem to recall that one of the original Cray XT-4 systems
>> also had a problem with extremely slow checkpoint writes and reads
>> before the Lustre file system was installed.
>> The code has run successfully on a number of different computers,
>> but I don't know what kind of file systems they had.
>>
>>
>> Any suggestions?
>> The local systems people don't know much about MPI.
>>
>>
>> Linda Sugiyama
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>