<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:courier,monaco,monospace,sans-serif;font-size:12pt"><div style="font-family: courier,monaco,monospace,sans-serif; font-size: 12pt;"><span style="font-family: times new roman,new york,times,serif;">Hi,<br><br>I abort the process by killing it from Task Manager. My application runs on the so-called 'main machine' (ID = 0) and distributes its calculations across several other machines, the 'evaluators'. When an evaluator aborts for any reason (it could even be a power outage), I need to handle the situation by delegating its calculation to another evaluator, so that I do not lose the calculations already completed by the other evaluators. At present, when an evaluator aborts (I tested this by killing its process), the main process (ID = 0) also aborts with the error message described below, and this is a serious problem for me.<br></span><br style="font-family: times new roman,new york,times,serif;"><span style="font-family: times new roman,new york,times,serif;">Best regards,</span><br style="font-family: times new roman,new york,times,serif;"><br style="font-family: times new roman,new york,times,serif;"><span style="font-family: times new roman,new york,times,serif;">Gianluca Arcidiacono</span><br><br><br><div style="font-family: times new roman,new york,times,serif; font-size: 12pt;">----- Original message -----<br>From: Jayesh Krishna &lt;jayesh@mcs.anl.gov&gt;<br>To: AGPX &lt;agpxnet@yahoo.it&gt;<br>Cc: mpich-discuss@mcs.anl.gov<br>Sent: Monday, 12 November 2007, 17:29:52<br>Subject: RE: [MPICH] Error handling issue<br><br>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial" size="2"><span class="830242716-12112007">Hi,</span></font></div>
<div dir="ltr" align="left"><font color="#0000ff" face="Arial" size="2"><span class="830242716-12112007">This is probably an error message issued by the process manager.</span></font></div>
<div><font color="#0000ff" face="Arial" size="2"><span class="830242716-12112007"> How are you aborting the
process?</span></font></div>
<div><font color="#0000ff" face="Arial" size="2"><span class="830242716-12112007"></span></font> </div>
<div><span class="830242716-12112007"><font color="#0000ff" face="Arial" size="2">Regards,</font></span></div>
<div><span class="830242716-12112007"><font color="#0000ff" face="Arial" size="2">Jayesh</font></span></div><font size="2"></font><br>
<div class="OutlookMessageHeader" dir="ltr" align="left" lang="en-us">
<hr tabindex="-1">
<font face="Tahoma" size="2"><b>From:</b> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <b>On Behalf Of
</b>AGPX<br><b>Sent:</b> Sunday, November 11, 2007 6:37 AM<br><b>To:</b>
mpich-discuss@mcs.anl.gov<br><b>Subject:</b> [MPICH] Error handling
issue<br></font><br></div>
<div></div>
<div style="font-size: 12pt; font-family: times new roman,new york,times,serif;">
<div>Hi,<br><br>I have written the following code, hoping to prevent my main process
from aborting on an MPI error:<br><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Init(&argc,
&argv);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Comm_rank(MPI_COMM_WORLD,
&MPIId);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Comm_size(MPI_COMM_WORLD,
&numprocs);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Comm_set_errhandler(MPI_COMM_WORLD,
<span style="font-weight: bold;">MPI_ERRORS_RETURN</span>);</span><br><br>but
when I terminate a job process on another machine (pcamd3000 is the main
machine and pcamd2600 is the other; both run Windows XP Pro), the main
process aborts. Here is the error message:<br><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">job
aborted:</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">rank:
node: exit code[: error message]</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">0:
pcamd3000: 1: Fatal error in MPI_Send: Other MPI error, error stack:</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Send(173).............................:
MPI_Send(buf=00B458B0, count=1, MPI_INT, dest=1, tag=0, comm=0x84000000) failed</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPIDI_CH3I_Progress(148)..................:
handle_sock_op failed</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPIDI_CH3I_Progress_handle_sock_event(497):</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPIDU_Sock_wait(2603).....................:
Il nome di rete specificato non è più</span><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">
disponibile. (errno 64)</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">1:
pcamd2600: 1: process 1 exited without calling finalize</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">2:
pcamd2600: 1</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><br>(note
that the message '<span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">Il
nome di rete specificato non è più</span><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">
disponibile.' </span>means, in English, 'The specified network name is no longer
available'.)<br><br>What am I missing? I have more than one communicator, but I have
also used MPI_Comm_set_errhandler to set their error handlers to
MPI_ERRORS_RETURN. The code is:<br><br><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">...</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Group_incl(worldGroup,
nRanks, ranks, &handle.group);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Comm_create(MPI_COMM_WORLD,
handle.group, &handle.comm);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Comm_set_errhandler(handle.comm,
<span style="font-weight: bold;">MPI_ERRORS_RETURN</span>);</span><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">...</span><br><br>I
have also tried MPI_Errhandler_set, but that does not help either:<br><br style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;"><span style="color: rgb(0, 0, 255); font-family: courier,monaco,monospace,sans-serif;">MPI_Errhandler_set(...,
MPI_ERRORS_RETURN);</span><br><br>Any suggestions?<br><br>Thanks,<br><br>-
AGPX<br><br><br></div></div><br>
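<div>A note on the recovery scheme this thread is aiming for: once the main process can observe that an evaluator has failed (for example, through an error code returned by a send or receive instead of an abort), it still needs bookkeeping to hand the unfinished work to another evaluator without discarding results already received. The plain C sketch below illustrates only that bookkeeping. It makes no MPI calls, and every name in it (Schedule, schedule_init, schedule_mark_done, schedule_handle_failure, MAX_TASKS, MAX_EVALUATORS) is hypothetical rather than part of MPI or MPICH. How the failure is detected is left as an assumption.</div>

```c
#include <assert.h>

#define MAX_TASKS      64
#define MAX_EVALUATORS 16
#define NO_EVALUATOR   (-1)

/* Bookkeeping held by the main process (ID = 0).
   Evaluators are numbered 1..n_evaluators; index 0 is unused.
   All names are hypothetical; nothing here is an MPI API. */
typedef struct {
    int n_tasks;
    int n_evaluators;
    int owner[MAX_TASKS];            /* evaluator assigned to each task  */
    int done[MAX_TASKS];             /* 1 once the task's result arrived */
    int alive[MAX_EVALUATORS + 1];   /* 1 while the evaluator is usable  */
} Schedule;

/* Return the first live evaluator after `after`, wrapping around once;
   `after` itself is checked last. NO_EVALUATOR if none is alive. */
static int next_alive(const Schedule *s, int after)
{
    for (int step = 1; step <= s->n_evaluators; ++step) {
        int e = (after - 1 + step) % s->n_evaluators + 1;
        if (s->alive[e]) {
            return e;
        }
    }
    return NO_EVALUATOR;
}

/* Spread the tasks over the evaluators round-robin. */
void schedule_init(Schedule *s, int n_tasks, int n_evaluators)
{
    assert(n_tasks >= 0 && n_tasks <= MAX_TASKS);
    assert(n_evaluators >= 1 && n_evaluators <= MAX_EVALUATORS);
    s->n_tasks = n_tasks;
    s->n_evaluators = n_evaluators;
    s->alive[0] = 0;
    for (int e = 1; e <= n_evaluators; ++e) {
        s->alive[e] = 1;
    }
    for (int t = 0; t < n_tasks; ++t) {
        s->owner[t] = t % n_evaluators + 1;
        s->done[t] = 0;
    }
}

/* Record that the result of `task` has been received. */
void schedule_mark_done(Schedule *s, int task)
{
    assert(task >= 0 && task < s->n_tasks);
    s->done[task] = 1;
}

/* Mark `failed` as dead and move its unfinished tasks to live
   evaluators, round-robin. Finished tasks keep their results.
   Returns the number of tasks moved, or -1 if no evaluator is left. */
int schedule_handle_failure(Schedule *s, int failed)
{
    assert(failed >= 1 && failed <= s->n_evaluators);
    s->alive[failed] = 0;

    int moved = 0;
    int target = failed;
    for (int t = 0; t < s->n_tasks; ++t) {
        if (s->owner[t] != failed || s->done[t]) {
            continue;
        }
        target = next_alive(s, target);
        if (target == NO_EVALUATOR) {
            return -1;
        }
        s->owner[t] = target;
        ++moved;
    }
    return moved;
}
```

<div>With 6 tasks spread over 3 evaluators, a failure of evaluator 2 after task 1 has completed moves only task 4 to a surviving evaluator; the finished result of task 1 is kept. A return value of -1 signals that no evaluator is left, in which case the main process would have to run the remaining tasks itself or stop.</div>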
</div><br></div></div><br>
</body></html>