[mpich-discuss] MPI-IO of multidimensional arrays - issue reported by valgrind
Ashley Pittman
ashley at pittman.co.uk
Tue Feb 23 08:20:51 CST 2010
On 23 Feb 2010, at 14:09, Rob Latham wrote:
> On Tue, Feb 23, 2010 at 11:54:24AM +0000, Turlough Downes wrote:
>> To do this I'm using MPI_Type_create_subarray(), MPI_File_open(),
>> MPI_File_set_view() and MPI_File_write_all(). My problem is that my
>> code is occasionally hanging on output on a Blue Gene/P system.
>
> What file system are you using on this BlueGene system? If you can
> get your program to dump core, you can print a backtrace. That
> backtrace will help us understand if this is a true hang, or if
> processes are just progressing very slowly.
An alternative would be to use padb to attach to it and report a backtrace whilst it's hung.
>> ==10315== Conditional jump or move depends on uninitialised value(s)
>> ==10315== at 0x4C86DE8: (within /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315== by 0x4C88F0E: ADIOI_GEN_WriteStridedColl (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315== by 0x4D678C6: MPIOI_File_write_all (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315== by 0x4D679B8: PMPI_File_write_all (in /usr/lib64/mpich2/lib/libmpich.so.1.2)
>> ==10315== by 0x41B6B4: hydra_write_single_binary_file (out_binary_single_file.c:101)
>
> We should probably be more careful with the 'conditional jump or move'
> warnings, but they rarely if ever point to an actual defect.
That's a bold statement. It's true that these warnings often appear to have no effect, but if you are seeing one in combination with a hang then I'd certainly look at it closely.
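For anyone unfamiliar with what that warning means, here is a minimal sketch of the kind of code that produces it. This is purely illustrative: make_buffer() and count_positive() are hypothetical names of mine, not taken from your program or from MPICH2, and I'm not suggesting this is what is happening inside ADIOI_GEN_WriteStridedColl. The point is only that valgrind complains when a branch is decided by memory that was never written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n-element buffer and fill only the
 * first `filled` elements. With zero_rest == 0 the tail is left
 * uninitialised, which is the state valgrind tracks. With zero_rest != 0
 * the buffer comes from calloc(), so the tail is zeroed. */
double *make_buffer(size_t n, size_t filled, int zero_rest)
{
    assert(filled <= n);
    double *buf = zero_rest ? calloc(n, sizeof *buf)
                            : malloc(n * sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < filled; i++)
        buf[i] = (double)i;
    return buf;
}

/* Count the positive entries. The `if` is a conditional jump on buf[i],
 * so running this over an uninitialised tail is the sort of access that
 * valgrind reports as "Conditional jump or move depends on
 * uninitialised value(s)". */
size_t count_positive(const double *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] > 0.0)
            count++;
    return count;
}
```

Calling count_positive() on a buffer from make_buffer(8, 4, 0) in an unoptimised build should get the branch flagged under valgrind, whereas the same call with zero_rest set to 1 should run clean. In the first case the result depends on whatever happened to be in memory, which is why I wouldn't dismiss the warning out of hand.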
Ashley,
--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk