[MPICH] failure before MPI_Init and other newbie questions on Windows XP

Jayesh Krishna jayesh at mcs.anl.gov
Thu Sep 13 11:17:22 CDT 2007


 Hi,
  Can you send us your MPI program and let us know how to recreate the
problem (any particular setup required...)?

Regards,
Jayesh

-----Original Message-----
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Lists Hammersley
Sent: Thursday, September 13, 2007 10:42 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] failure before MPI_Init and other newbie questions on
Windows XP

I'm new to programming with MPI, and I'm having a couple of problems which I
would much appreciate some help with.

Platform: Windows XP x64 Edition (Version 2003) SP1, dual-core and
multi-processor machines, with mpiexec running on a single machine.
Build: the binary distribution of MPICH2 1.0.5p2 and a build from source
of 1.0.5p4.
Similar problems are seen on a 32-bit Windows XP machine.

The vast majority of the MPI code uses non-blocking sends paired with
blocking receives.
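In outline the pattern is the following. This is a minimal sketch only:
the buffer size, tag, and peer rank are illustrative placeholders, not
the actual application values.

    #include <mpi.h>

    /* Minimal sketch of the dominant pattern: a non-blocking send
       paired with a blocking receive. Buffer size, tag, and peer rank
       are illustrative only. */
    void exchange(int peer, MPI_Comm comm)
    {
        double sendbuf[64], recvbuf[64];
        MPI_Request req;
        MPI_Status  status;

        MPI_Isend(sendbuf, 64, MPI_DOUBLE, peer, 0, comm, &req);
        MPI_Recv(recvbuf, 64, MPI_DOUBLE, peer, 0, comm, &status);
        MPI_Wait(&req, &status); /* complete the send before reusing sendbuf */
    }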

- Occasionally (about 1 run in 7) MPICH2 fails at or before the call to
MPI_Init on one of the processes. Minimal work is performed before
MPI_Init is reached.

[01:3128].....ERROR:result command received but the wait_list is empty.
[01:3128]...ERROR:unable to handle the command: "cmd=result src=0
dest=1 tag=7 cmd_tag=0 ctx_key=2 result=SUCCESS "
[01:3128].ERROR:error closing the unknown context socket: generic socket
failure, error stack:
MPIDU_Sock_wait(2589): The I/O operation has been aborted because of either
a thread exit or an application request. (errno 995)
[01:3128]..ERROR:sock_op_close returned while unknown context is in
state: unknown state -17891602

- MPICH2 occasionally fails in MPI_Alltoall in what looks like a similar
manner to the MPI_Init failure. It only seems to happen on the first call.

[01:4044].....ERROR:result command received but the wait_list is empty.
[01:4044]...ERROR:unable to handle the command: "cmd=result src=1
dest=1 tag=23 cmd_tag=5 cmd_orig=dbget ctx_key=2 value="port=4214
description=roxaroxfrph.roxardomain.roxar.com ifname=193.1.1.47 "
result=DBS_SUCCESS "
[01:4044].ERROR:error closing the unknown context socket: generic socket
failure, error stack:
MPIDU_Sock_wait(2589): The I/O operation has been aborted because of either
a thread exit or an application request. (errno 995)
[01:4044]..ERROR:sock_op_close returned while unknown context is in
state: unknown state -17891602

- As a double-check on code validity, after each MPI_Barrier an
MPI_Iprobe is done to see whether there are still buffers to receive.
Invariably, there are a few. Assuming there are no bugs in my code, and
that all the buffers have been received on the local process before
MPI_Barrier is reached, is this possible?
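The check is roughly the following sketch; the real code is more
involved, and the function name is just for illustration.

    #include <stdio.h>
    #include <mpi.h>

    /* After the barrier, probe for any message that should already
       have been received. A non-zero flag means a matching message is
       still pending. */
    void check_no_pending(MPI_Comm comm)
    {
        int flag;
        MPI_Status status;

        MPI_Barrier(comm);
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &flag, &status);
        if (flag)
            printf("pending message from rank %d, tag %d\n",
                   status.MPI_SOURCE, status.MPI_TAG);
    }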

- What happens at MPI_Finalize if a non-blocking send has not yet
completed on the process? Is it prudent to put an MPI_Barrier before
MPI_Finalize?
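For concreteness, the defensive shutdown I have in mind is along these
lines (a sketch; requests and nreq stand in for however the application
tracks its outstanding send requests):

    #include <mpi.h>

    /* Complete any outstanding non-blocking sends, synchronize, then
       finalize. requests/nreq are placeholders for the application's
       pending send requests. */
    void shutdown_mpi(MPI_Request *requests, int nreq, MPI_Comm comm)
    {
        MPI_Waitall(nreq, requests, MPI_STATUSES_IGNORE); /* drain sends */
        MPI_Barrier(comm); /* the extra synchronization in question */
        MPI_Finalize();
    }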

Thank you for your time and interest.

Cheers,
Richard




