[MPICH] failure before MPI_Init and other newbie questions on Windows XP

Lists Hammersley hammersleylists at googlemail.com
Thu Sep 13 10:42:19 CDT 2007


I'm new to programming with MPI, and I'm having a couple of problems
which I would much appreciate some help with.

Platform: Windows XP x64 Edition (Version 2003) SP1, on dual-core and
multi-processor machines, with mpiexec running on a single machine.
Build: the binary distribution of 1.0.5p2 and a build from source of 1.0.5p4.
Similar problems are seen on a 32-bit Windows XP machine.

The vast majority of the MPI code consists of non-blocking sends and
blocking receives.
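
To make that concrete, the exchanges look roughly like this (an
illustrative sketch only; the buffer names, counts, and the partner
rank are placeholders, not the actual code):

    MPI_Request req;
    double out[100], in[100];
    /* non-blocking send, then a blocking receive from the same partner */
    MPI_Isend(out, 100, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(in, 100, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* send buffer reusable only after this */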

- Occasionally (about 1 in 7 runs) MPICH2 fails at or before the call
to MPI_Init on one of the processes. There is minimal work performed
before MPI_Init is reached.

[01:3128].....ERROR:result command received but the wait_list is empty.
[01:3128]...ERROR:unable to handle the command: "cmd=result src=0
dest=1 tag=7 cmd_tag=0 ctx_key=2 result=SUCCESS "
[01:3128].ERROR:error closing the unknown context socket: generic
socket failure, error stack:
MPIDU_Sock_wait(2589): The I/O operation has been aborted because of
either a thread exit or an application request. (errno 995)
[01:3128]..ERROR:sock_op_close returned while unknown context is in
state: unknown state -17891602
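
For what it's worth, the startup is essentially this minimal pattern
(a sketch, not the actual source; only trivial local setup precedes
MPI_Init):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        /* nothing beyond ordinary local setup happens before this call */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d initialised\n", rank);
        /* ... rest of the application ... */
        MPI_Finalize();
        return 0;
    }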

- MPICH2 also fails occasionally in MPI_Alltoall, in what looks like a
similar manner to the MPI_Init failure. It only seems to happen on the
first call.

[01:4044].....ERROR:result command received but the wait_list is empty.
[01:4044]...ERROR:unable to handle the command: "cmd=result src=1
dest=1 tag=23 cmd_tag=5 cmd_orig=dbget ctx_key=2 value="port=4214
description=roxaroxfrph.roxardomain.roxar.com ifname=193.1.1.47 "
result=DBS_SUCCESS "
[01:4044].ERROR:error closing the unknown context socket: generic
socket failure, error stack:
MPIDU_Sock_wait(2589): The I/O operation has been aborted because of
either a thread exit or an application request. (errno 995)
[01:4044]..ERROR:sock_op_close returned while unknown context is in
state: unknown state -17891602
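
The call itself is nothing exotic; it is of this form (a sketch with
illustrative counts and datatypes, and it assumes <stdlib.h> for
malloc):

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int *sendbuf = malloc(nprocs * sizeof(int));
    int *recvbuf = malloc(nprocs * sizeof(int));
    /* one int to and from every rank; the first such call is the one
       that fails */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);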

- As a double-check on code validity, an MPI_Iprobe is done after each
MPI_Barrier to see whether there are still buffers left to receive.
Invariably, there are a few. Assuming there are no bugs in my code and
that all the buffers have been received on the local process before
MPI_Barrier is reached, is this possible?
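
Concretely, the check is of this form (sketch):

    MPI_Barrier(MPI_COMM_WORLD);
    int pending = 0;
    MPI_Status status;
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
               &pending, &status);
    if (pending)  /* this fires more often than I expect */
        fprintf(stderr, "message still pending from rank %d, tag %d\n",
                status.MPI_SOURCE, status.MPI_TAG);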

- At MPI_Finalize, what happens if a non-blocking send has not been
completed on that process? Is it prudent to put an MPI_Barrier before
MPI_Finalize?
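
In other words, is something like the following the prudent shutdown
order (sketch; reqs and nreqs stand for whatever send requests are
still outstanding)?

    /* complete every outstanding non-blocking send first ... */
    MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);
    /* ... then synchronise so no process finalizes while messages are
       still in flight */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();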

Thank you for your time and interest.

Cheers,
Richard



