[ROMIO Req #897] [MPICH] MPI_File_read_all hanging

Robert Latham robl at mcs.anl.gov
Tue Feb 5 14:28:41 CST 2008


On Fri, Feb 01, 2008 at 11:57:09PM -0600, Wei-keng Liao wrote:
> 
> I have an I/O program hanging on MPI_File_read_all. The code is the 
> attached C file. It writes 20 3D block-block-block partitioned arrays, 
> closes the file, re-opens it, and reads the 20 arrays back, also in the 
> same 3D block pattern. It is similar to the ROMIO 3D test code, 
> coll_test.c.
> 
> The error occurred when I ran on 64 processes, but not with fewer (the 
> machine I ran on has 2 processors per node). The first 20 writes are OK, 
> but the program hangs at around the 10th read. After tracing it down in 
> the source, I found it hangs on
>            MPI_Waitall(nprocs_recv, requests, statuses);
> in function ADIOI_R_Exchange_data(), file ad_read_coll.c .
> 
> I am using mpich2-1.0.6p1 on a Linux cluster running
> 2.6.9-42.0.10.EL_lustre-1.4.10.1smp #1 SMP x86_64 x86_64 x86_64 GNU/Linux
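The block-block-block access pattern described in the quoted report can be sketched as follows. This is a minimal sketch, not the attached test program (which isn't reproduced here); the helper name block3d and the row-major mapping of ranks to grid coordinates are assumptions. For 64 processes, MPI_Dims_create would produce a 4 x 4 x 4 grid, and each rank's starts/subsizes are the kind of values one could pass to MPI_Type_create_subarray before MPI_File_set_view and MPI_File_read_all.

```c
/* Hypothetical helper: block-block-block decomposition of a global
 * gsizes[0] x gsizes[1] x gsizes[2] array over a
 * psizes[0] x psizes[1] x psizes[2] process grid.
 * Ranks map to grid coordinates in row-major order (last dimension
 * varies fastest). Leftover elements go to the lower-indexed
 * processes in each dimension. */
static void block3d(const int gsizes[3], const int psizes[3], int rank,
                    int starts[3], int subsizes[3])
{
    int coords[3];

    /* Convert the linear rank into (i, j, k) grid coordinates. */
    coords[2] = rank % psizes[2];
    coords[1] = (rank / psizes[2]) % psizes[1];
    coords[0] = rank / (psizes[1] * psizes[2]);

    for (int d = 0; d < 3; d++) {
        int base = gsizes[d] / psizes[d];   /* minimum block length */
        int rem  = gsizes[d] % psizes[d];   /* blocks that get one extra */
        int c    = coords[d];

        subsizes[d] = base + (c < rem ? 1 : 0);
        /* Offset = full blocks before us plus the extras handed out so far. */
        starts[d] = c * base + (c < rem ? c : rem);
    }
}
```

For example, with a 64 x 64 x 64 global array on a 4 x 4 x 4 grid, rank 5 has coordinates (0, 1, 1), so it gets starts {0, 16, 16} and subsizes {16, 16, 16}.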

I'm looking at the diff between 1.0.5p2 and 1.0.6p1.  Here's what
changed in the MPI_File_read_all and MPI_File_write_all code paths:

- in ADIOI_Calc_others_req, convert two mallocs into one larger malloc,
  and wait for the offsets and lens in a single waitall

- another reworking for the John Bent test case (ROMIO req #835).
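The first change in that list can be illustrated with a small sketch: one allocation backs both the offsets and lens arrays, so a single free releases them. The bullet suggests those arrays are then filled by nonblocking messages completed in one MPI_Waitall; that MPI part is omitted here. The names req_arrays and offset_t, and the use of int for lengths, are hypothetical stand-ins, not ROMIO's actual code.

```c
#include <stdlib.h>

/* Stand-in for ROMIO's ADIO_Offset; the exact type is an assumption. */
typedef long long offset_t;

/* Hypothetical pair of per-process request arrays backed by one allocation. */
typedef struct {
    offset_t *offsets; /* `count` file offsets */
    int      *lens;    /* `count` lengths, stored right after the offsets */
} req_arrays;

/* Allocate both arrays with a single malloc. Offsets go first so the
 * 8-byte values sit at malloc's suitably aligned base address; the
 * int lengths that follow are then aligned as well. */
static int req_arrays_alloc(req_arrays *r, size_t count)
{
    r->offsets = NULL;
    r->lens = NULL;
    if (count == 0)
        return 0;
    r->offsets = malloc(count * (sizeof(offset_t) + sizeof(int)));
    if (r->offsets == NULL)
        return -1;
    r->lens = (int *)(r->offsets + count);
    return 0;
}

/* A single free releases both arrays, since lens points into the same block. */
static void req_arrays_free(req_arrays *r)
{
    free(r->offsets);
    r->offsets = NULL;
    r->lens = NULL;
}
```

The trade-off is that the two arrays can no longer be freed or resized independently, which is usually fine for short-lived per-call bookkeeping like this.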

That's basically it.  We also added a lot of MPE logging, which does
clutter up the diff a bit.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B




More information about the mpich-discuss mailing list