[MPICH2-dev] problem with multithreaded version of sock mpich2-1.06 under Windows

Ryzhykh, Alexey alexey.ryzhykh at intel.com
Mon Sep 24 08:41:31 CDT 2007


Hi everybody,

We have run into problems using the multithreaded version of mpich2-1.06
built with the ch3:sock channel under Windows on IA32.

The same problems may exist on the other Intel platforms under Windows
(Intel 64 and IA64), but I was able to build a working Windows mpich2-1.06
only for IA32.

Simple MT tests such as mpich2/threaded_sr work fine, but under stress
testing we see problems.

For our stress testing I used a special version of IMB that supports
running several threads.

I got intermittent failures running 8 processes on 4 nodes with 8
threads.

Sometimes the test finishes successfully, sometimes it hangs, and
sometimes the following error appears:

job aborted:
rank: node: exit code[: error message]
0: svsmpiw03: 1
1: svsmpiw03: 1: Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=03BC0040, count=262144, MPI_BYTE, src=2, tag=MPI_ANY_TAG, comm=0x84000007, status=01B4FE78) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Sock_wait(2602).....................: The specified network name is no longer available. (errno 64)
2: svsmpiw04: 1: Fatal error in MPI_Waitall: Other MPI error, error stack:
MPI_Waitall(258)..........................: MPI_Waitall(count=2, req_array=01B4
E68, status_array=01B4FE78) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Sock_wait(2464).....................: Unable to re-post an aborted readv operation
MPIDU_Sock_post_readv(1655)...............: An existing connection was forcibly closed by the remote host. (errno 10054)
3: svsmpiw04: 1: Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(186).............................: MPI_Recv(buf=03BC0040, count=262144, MPI_BYTE, src=2, tag=MPI_ANY_TAG, comm=0x84000005, status=01B4FE78) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Sock_wait(2602).....................: The specified network name is no longer available. (errno 64)

Could you please help us solve this problem?

I can send you this MT version of IMB in a separate email. I have not
been able to reproduce the problem with small test cases.

 

With best regards,

Alexey Ryzhykh,

---

Intel, Sarov

 

 
