[mpich-discuss] MPI-IO of multidimensional arrays - issue reported by valgrind

Ashley Pittman ashley at pittman.co.uk
Tue Feb 23 08:20:51 CST 2010


On 23 Feb 2010, at 14:09, Rob Latham wrote:

> On Tue, Feb 23, 2010 at 11:54:24AM +0000, Turlough Downes wrote:
>> To do this I'm using MPI_Type_create_subarray(), MPI_File_open(),
>> MPI_File_set_view() and MPI_File_write_all().  My problem is that my
>> code is occasionally hanging on output on a Blue Gene/P system.  
> 
> What file system are you using on this BlueGene system?  If you can
> get your program to dump core, you can print a backtrace.  That
> backtrace will help us understand if this is a true hang, or if
> processes are just progressing very slowly.

An alternative would be to use padb to attach to it and report a backtrace whilst it's hung.
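For reference, the per-rank arguments that feed MPI_Type_create_subarray() in the write pattern described above can be sketched in plain C. This is a minimal sketch, not the original poster's code, and it rests on several assumptions. The helper names block_range() and subarray_params() are hypothetical. The layout is a 2-D block decomposition in which any remainder goes to the lowest coordinates. The ordering is C (row-major). The MPI standard requires every subsize to be at least 1 and every block to lie inside the global array, so each dimension is assumed to have at least as many elements as processes along it.

```c
#include <assert.h>

/* Hypothetical helper: split n elements over nprocs processes along one
 * dimension; the first (n % nprocs) coordinates get one extra element. */
void block_range(int n, int nprocs, int coord, int *size, int *start)
{
    /* Assumption: every block is non-empty (MPI requires subsizes >= 1). */
    assert(nprocs > 0 && n >= nprocs && coord >= 0 && coord < nprocs);
    int base = n / nprocs;
    int rem = n % nprocs;
    *size = base + (coord < rem ? 1 : 0);
    *start = coord * base + (coord < rem ? coord : rem);
}

/* Hypothetical helper: compute the subsizes/starts for one rank of a
 * dims[0] x dims[1] process grid over a gsizes[0] x gsizes[1] array,
 * in C (row-major) order as used with MPI_ORDER_C. */
void subarray_params(const int gsizes[2], const int dims[2],
                     const int coords[2], int subsizes[2], int starts[2])
{
    for (int d = 0; d < 2; d++)
        block_range(gsizes[d], dims[d], coords[d], &subsizes[d], &starts[d]);
}
```

For a 10 x 7 array on a 3 x 2 process grid, the rank at grid coordinates (1,1) gets subsizes {3,3} and starts {4,4}. Assuming double-precision data, those arrays would then be passed as MPI_Type_create_subarray(2, gsizes, subsizes, starts, MPI_ORDER_C, MPI_DOUBLE, &filetype), followed by MPI_Type_commit() and MPI_File_set_view(), before every rank calls MPI_File_write_all().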

>> ==10315== Conditional jump or move depends on uninitialised value(s)
>> ==10315==    at 0x4C86DE8: (within /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315==    by 0x4C88F0E: ADIOI_GEN_WriteStridedColl (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315==    by 0x4D678C6: MPIOI_File_write_all (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315==    by 0x4D679B8: PMPI_File_write_all (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315==    by 0x41B6B4: hydra_write_single_binary_file (out_binary_single_file.c:101)
> 
> We should probably be more careful with the 'conditional jump or move'
> warnings, but they rarely if ever point to an actual defect. 


That's a bold statement. It's true that they often appear to have no effect, but if you are seeing such a warning in combination with a hang, then I'd certainly look at it closely.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
