<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16640" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=007254417-11042008><FONT face=Arial
color=#0000ff size=2>Which version of MPICH2 are you using? Can you try with the
latest version, 1.0.7?</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=007254417-11042008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=007254417-11042008><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=007254417-11042008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Quentin
Bossard<BR><B>Sent:</B> Friday, April 11, 2008 2:33 AM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [mpich-discuss] Finalize
error<BR></FONT><BR></DIV>
<DIV></DIV>
<H4 style="FONT-WEIGHT: normal; FONT-FAMILY: arial,sans-serif">Hi
everyone,</H4><SPAN style="FONT-FAMILY: arial,sans-serif">I am trying to run a
program I wrote myself using MPI. The basic idea is to dispatch the program's
tasks across several cores/computers. It works fine (i.e. the results of the
tasks are correct and properly collected). However, I get an error after the
finalize call (or possibly during it). In any case, the "Exiting program" message
is printed after the finalize call (and only by the master).</SPAN><BR
style="FONT-FAMILY: arial,sans-serif"><SPAN
style="FONT-FAMILY: arial,sans-serif">I have not been able to find what is
causing this error. The message is below. Note that the error is not
deterministic (i.e. it does not happen every time). If anyone has even the
beginning of an idea, I would be grateful to hear it.<BR><BR>Another question: is
there a user-friendly GPL (or at least free) MPI debugger?<BR><BR>Thanks in
advance for your help.<BR><BR>Quentin<BR
style="FONT-FAMILY: verdana,sans-serif"></SPAN><BR
style="FONT-FAMILY: arial,sans-serif"><BR>
<H4 style="FONT-WEIGHT: normal; FONT-FAMILY: courier new,monospace">0 :
Exiting program<BR>Assertion failed in file ch3u_connect_sock.c at line 805:
vcch->conn == conn<BR>[cli_5]: aborting job:<BR>internal ABORT - process
5<BR>[cli_4]: aborting job:<BR>Fatal error in MPI_Finalize: Other MPI error,
error stack:<BR>MPI_Finalize(255).........................: MPI_Finalize
failed<BR>MPI_Finalize(154).........................:<BR>MPID_Finalize(129)........................:<BR>MPIDI_CH3U_VC_WaitForClose(339)...........:
an error occurred while the device was waiting for all open connections to
close<BR>MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(420):<BR>MPIDU_Socki_handle_read(633)..............:
connection failure (set=0,sock=4,errno=54:(strerror() not found))<BR>Assertion
failed in file ch3u_connect_sock.c at line 805: vcch->conn ==
conn<BR>[cli_6]: aborting job:<BR>internal ABORT - process 6<BR>[cli_2]:
aborting job:<BR>Fatal error in MPI_Finalize: Other MPI error, error
stack:<BR>MPI_Finalize(255).........................: MPI_Finalize
failed<BR>MPI_Finalize(154).........................:<BR>MPID_Finalize(129)........................:<BR>MPIDI_CH3U_VC_WaitForClose(339)...........:
an error occurred while the device was waiting for all open connections to
close<BR>MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(420):<BR>MPIDU_Socki_handle_read(633)..............:
connection failure (set=0,sock=4,errno=54:(strerror() not found))<BR>[cli_3]:
aborting job:<BR>Fatal error in MPI_Finalize: Other MPI error, error
stack:<BR>MPI_Finalize(255).........................: MPI_Finalize
failed<BR>MPI_Finalize(154).........................:<BR>MPID_Finalize(129)........................:<BR>MPIDI_CH3U_VC_WaitForClose(339)...........:
an error occurred while the device was waiting for all open connections to
close<BR>MPIDI_CH3i_Progress_wait(215).............: an error occurred while
handling an event returned by
MPIDU_Sock_Wait()<BR>MPIDI_CH3I_Progress_handle_sock_event(420):<BR>MPIDU_Socki_handle_read(633)..............:
connection failure (set=0,sock=2,errno=54:(strerror() not found))<BR>rank 5 in
job 1741 hercules.arbitragis_64602 caused collective abort
of all ranks<BR> exit status of rank 5: killed by signal 9<BR>rank 4 in
job 1741 hercules.arbitragis_64602 caused collective abort
of all ranks<BR> exit status of rank 4: killed by signal 9<BR>rank 3 in
job 1741 hercules.arbitragis_64602 caused collective abort
of all ranks<BR> exit status of rank 3: killed by signal 9<BR>rank 2 in
job 1741 hercules.arbitragis_64602 caused collective abort
of all ranks<BR> exit status of rank 2: killed by signal 9<BR>Exit
137<BR><BR></H4></BLOCKQUOTE></BODY></HTML>