[ROMIO Req #897] [MPICH] MPI_File_read_all hanging
Robert Latham
robl at mcs.anl.gov
Tue Feb 5 14:28:41 CST 2008
On Fri, Feb 01, 2008 at 11:57:09PM -0600, Wei-keng Liao wrote:
>
> I have an I/O program hanging on MPI_File_read_all. The code is the
> attached C file. It writes 20 3D block-block-block partitioned arrays,
> closes the file, re-opens it, and reads the 20 arrays back, also in the
> same 3D block pattern. It is similar to the ROMIO 3D test code,
> coll_test.c
>
> The error occurred when I ran on 64 processes, but not on fewer (the
> machine I ran on has 2 processors per node). The first 20 writes are OK,
> but the program hangs at around the 10th read. Tracing it down in the
> source, it hangs on
> MPI_Waitall(nprocs_recv, requests, statuses);
> in function ADIOI_R_Exchange_data(), file ad_read_coll.c.
>
> I am using mpich2-1.0.6p1 on a Linux cluster
> 2.6.9-42.0.10.EL_lustre-1.4.10.1smp #1 SMP x86_64 x86_64 x86_64 GNU/Linux

I'm looking at the diff between 1.0.5p2 and 1.0.6p1. Here's what has
changed that touches the MPI_File_read_all and MPI_File_write_all
code path:

- convert two mallocs into one larger malloc in ADIOI_Calc_others_req,
  and wait for offsets and lens in a single waitall
- another reworking for the John Bent test case (ROMIO req #835).

That's basically it. We also added a lot of MPE logging, which does
clutter up the diff a bit.
==rob
--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B