[MPICH] Error handling issue

Jayesh Krishna jayesh at mcs.anl.gov
Mon Nov 12 10:29:52 CST 2007


Hi,
 This is probably an error message generated by the process manager.
 How are you terminating the process?
 
Regards,
Jayesh


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of AGPX
Sent: Sunday, November 11, 2007 6:37 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] Error handling issue


Hi,

I have written the following code to keep my main process from aborting on
an MPI error:

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &MPIId);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

but when I terminate a job process on another machine (pcamd3000 is
the main machine, pcamd2600 is the other; both run Windows XP Pro), the
main process aborts. Here is the error message:

job aborted:
rank: node: exit code[: error message]
0: pcamd3000: 1: Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(173).............................: MPI_Send(buf=00B458B0, count=1, MPI_INT, dest=1, tag=0, comm=0x84000000) failed
MPIDI_CH3I_Progress(148)..................: handle_sock_op failed
MPIDI_CH3I_Progress_handle_sock_event(497):
MPIDU_Sock_wait(2603).....................: Il nome di rete specificato non è più disponibile. (errno 64)
1: pcamd2600: 1: process 1 exited without calling finalize
2: pcamd2600: 1

(Note that the message 'Il nome di rete specificato non è più
disponibile.' means, in English, 'The specified network name is no longer
available.')

What am I missing? I have more than one communicator, but I have also used
MPI_Comm_set_errhandler to set each of their error handlers to
MPI_ERRORS_RETURN. The code is:

...
MPI_Group_incl(worldGroup, nRanks, ranks, &handle.group);
MPI_Comm_create(MPI_COMM_WORLD, handle.group, &handle.comm);
MPI_Comm_set_errhandler(handle.comm, MPI_ERRORS_RETURN);
...

I have also tried MPI_Errhandler_set, but that doesn't help either:

MPI_Errhandler_set(..., MPI_ERRORS_RETURN);

Any suggestions?

Thanks,

- AGPX



