[mpich-discuss] Finalize error
Quentin Bossard
quentin.bossard at gmail.com
Mon Apr 14 02:33:46 CDT 2008
Hi,
Thank you for your answer. I am using version 1.0.6 and cannot try 1.0.7
at the moment. Is this a known bug in 1.0.6 that was fixed in 1.0.7?
Best regards,
Quentin
On Fri, Apr 11, 2008 at 7:45 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> Which version of MPICH2 are you using? Can you try with the latest
> version, 1.0.7?
>
> Rajeev
>
>
> ------------------------------
> *From:* owner-mpich-discuss at mcs.anl.gov [mailto:
> owner-mpich-discuss at mcs.anl.gov] *On Behalf Of *Quentin Bossard
> *Sent:* Friday, April 11, 2008 2:33 AM
> *To:* mpich-discuss at mcs.anl.gov
> *Subject:* [mpich-discuss] Finalize error
>
> Hi everyone,
>
> I am trying to run a program I wrote myself using MPI. The basic idea is
> to dispatch the program's tasks across several cores/computers. It works
> fine (i.e. the results of the tasks are correct and properly collected).
> However, I get an error after (or perhaps during) the finalize call. In
> any case, the "Exiting program" message is printed after the finalize
> instruction (and only by the master).
>
> I have not been able to find what is causing this error. The message is
> below. Note that the error is not deterministic (i.e. it does not happen
> every time). If someone has even the beginning of an idea, I would be
> grateful to hear it.
>
> Another question: is there a user-friendly GPL (or at least free) MPI
> debugger?
>
> Thanks in advance for your help
>
> Quentin
>
>
> 0 : Exiting program
> Assertion failed in file ch3u_connect_sock.c at line 805: vcch->conn == conn
> [cli_5]: aborting job:
> internal ABORT - process 5
> [cli_4]: aborting job:
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255).........................: MPI_Finalize failed
> MPI_Finalize(154).........................:
> MPID_Finalize(129)........................:
> MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the device was waiting for all open connections to close
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=4,errno=54:(strerror() not found))
> Assertion failed in file ch3u_connect_sock.c at line 805: vcch->conn == conn
> [cli_6]: aborting job:
> internal ABORT - process 6
> [cli_2]: aborting job:
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255).........................: MPI_Finalize failed
> MPI_Finalize(154).........................:
> MPID_Finalize(129)........................:
> MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the device was waiting for all open connections to close
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=4,errno=54:(strerror() not found))
> [cli_3]: aborting job:
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255).........................: MPI_Finalize failed
> MPI_Finalize(154).........................:
> MPID_Finalize(129)........................:
> MPIDI_CH3U_VC_WaitForClose(339)...........: an error occurred while the device was waiting for all open connections to close
> MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=2,errno=54:(strerror() not found))
> rank 5 in job 1741 hercules.arbitragis_64602 caused collective abort of all ranks
> exit status of rank 5: killed by signal 9
> rank 4 in job 1741 hercules.arbitragis_64602 caused collective abort of all ranks
> exit status of rank 4: killed by signal 9
> rank 3 in job 1741 hercules.arbitragis_64602 caused collective abort of all ranks
> exit status of rank 3: killed by signal 9
> rank 2 in job 1741 hercules.arbitragis_64602 caused collective abort of all ranks
> exit status of rank 2: killed by signal 9
> Exit 137
>
>
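A side note on the last lines of the quoted log: the "Exit 137" reported by the shell is consistent with the "killed by signal 9" lines above it, since most shells report a process terminated by signal n as exit status 128 + n. The sketch below only illustrates that arithmetic. The names decode_exit_status and signal_name are hypothetical helpers, not part of MPICH2 or of the original program, and the 128 + n rule is a shell convention assumed here rather than something the log states.

```python
import signal


def decode_exit_status(status):
    """Interpret a shell-reported exit status in the range 0-255.

    Assumes the common shell convention that a process killed by
    signal n is reported as 128 + n. Returns ("signal", n) in that
    case and ("exit", status) otherwise.
    """
    if status > 128:
        return ("signal", status - 128)
    return ("exit", status)


def signal_name(number):
    """Return the platform's symbolic name for a signal number, if known."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a named signal on this platform.
        return "unknown"


if __name__ == "__main__":
    # 137 is the value shown on the "Exit 137" line of the log.
    kind, value = decode_exit_status(137)
    if kind == "signal":
        print("killed by signal", value, "(" + signal_name(value) + ")")
    else:
        print("exited normally with code", value)
```

This says nothing about why the ranks were killed. It only shows that the exit code and the per-rank messages describe the same event: the job was torn down after the collective abort.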