[mpich-discuss] Finalize error
Rajeev Thakur
thakur at mcs.anl.gov
Fri Apr 11 12:45:13 CDT 2008
Which version of MPICH2 are you using? Can you try with the latest version,
1.0.7?
Rajeev
_____
From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Quentin Bossard
Sent: Friday, April 11, 2008 2:33 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] Finalize error
Hi everyone,
I am trying to run a program I wrote myself using MPI. The basic idea is to
dispatch the program's tasks across several cores/computers. It works fine
(i.e. the results of the tasks are correct and properly collected). However, I
get an error after the finalize (or during it?). In any case, the "Exiting
program" message is printed after the finalize call (and only by the master).
I have not been able to find what is causing this error. The message is
below. Note that the error is not deterministic (i.e. it does not happen
every time). If anyone has even the beginning of an idea, I would be grateful
to hear it.
Another question: is there a user-friendly GPL (or at least free) MPI debugger?
Thanks in advance for your help
Quentin
0 : Exiting program
Assertion failed in file ch3u_connect_sock.c at line 805: vcch->conn == conn
[cli_5]: aborting job:
internal ABORT - process 5
[cli_4]: aborting job:
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255).........................: MPI_Finalize failed
MPI_Finalize(154).........................:
MPID_Finalize(129)........................:
MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the
device was waiting for all open connections to close
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling
an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=4,errno=54:(strerror() not found))
Assertion failed in file ch3u_connect_sock.c at line 805: vcch->conn == conn
[cli_6]: aborting job:
internal ABORT - process 6
[cli_2]: aborting job:
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255).........................: MPI_Finalize failed
MPI_Finalize(154).........................:
MPID_Finalize(129)........................:
MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the
device was waiting for all open connections to close
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling
an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=4,errno=54:(strerror() not found))
[cli_3]: aborting job:
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255).........................: MPI_Finalize failed
MPI_Finalize(154).........................:
MPID_Finalize(129)........................:
MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the
device was waiting for all open connections to close
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling
an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(633)..............: connection failure
(set=0,sock=2,errno=54:(strerror() not found))
rank 5 in job 1741 hercules.arbitragis_64602 caused collective abort of
all ranks
exit status of rank 5: killed by signal 9
rank 4 in job 1741 hercules.arbitragis_64602 caused collective abort of
all ranks
exit status of rank 4: killed by signal 9
rank 3 in job 1741 hercules.arbitragis_64602 caused collective abort of
all ranks
exit status of rank 3: killed by signal 9
rank 2 in job 1741 hercules.arbitragis_64602 caused collective abort of
all ranks
exit status of rank 2: killed by signal 9
Exit 137