<div dir="ltr"><div>Compilation performed by my boss.<div>Configuration:</div><div><p class="MsoNormal"><span style="font-size: 10pt; font-family: Arial, sans-serif; ">./configure --with-device=ch3:sock --enable-<span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">debuginfo</span> --prefix=/space/local/mvapich2 <span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">CFLAGS</span>=-<span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">fPIC</span> --enable-shared --enable-threads --enable-<span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">sharedlibs</span>=<span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">gcc</span> --with-pm=<span class="J-JK9eJ-PJVNOc" style="background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: yellow; background-position: initial initial; background-repeat: initial initial; ">mpd</span>:hydra</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Arial, sans-serif; ">mvapich2-1.7rc2</span></p></div></div><div><br></div>Anatoly.<br><br><div class="gmail_quote">On Tue, Oct 25, 2011 at 4:17 PM, Darius Buntinas <span dir="ltr">&lt;<a href="mailto:buntinas@mcs.anl.gov">buntinas@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br>
Did you configure and compile MPICH2 yourself? If you did, please send us the command you used to configure it (e.g., ./configure --prefix=...).

If you didn't compile it yourself, you'll need to talk to the person who did to get that information.

Also, what version of MPICH2 are you using?

-d

On Oct 25, 2011, at 2:30 AM, Anatoly G wrote:

> Initialization lines are:
> MPI::Init(argc, argv);
> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>
> Execution command:
> mpiexec.hydra -disable-auto-cleanup -launcher rsh -launcher-exec /usr/bin/rsh -f machines.txt -n 11 mpi_send_sync
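>
> For reference, a minimal sketch of how those two initialization lines sit in a program; the actual mpi_send_sync source isn't shown in this thread, so the surrounding code (the barrier and the error message) is only an illustrative assumption:
>
>     #include <mpi.h>
>     #include <cstdio>
>
>     int main(int argc, char **argv) {
>         MPI::Init(argc, argv);
>         // With MPI_ERRORS_RETURN, failures surface as return codes on
>         // each MPI call instead of terminating the job on the spot.
>         MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>
>         int rank;
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>         // Every subsequent call's return code must then be checked explicitly.
>         int rc = MPI_Barrier(MPI_COMM_WORLD);
>         if (rc != MPI_SUCCESS)
>             std::fprintf(stderr, "rank %d: barrier failed, rc=%d\n", rank, rc);
>
>         MPI::Finalize();
>         return 0;
>     }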
>
> Anatoly.
>
>
> On Mon, Oct 24, 2011 at 10:17 PM, Darius Buntinas <buntinas@mcs.anl.gov> wrote:
>
> In MPI_Init, the signal handler should be installed, so SIGUSR1 shouldn't kill the process.
>
> Can you send us the configure line you used?
>
> -d
>
> On Oct 23, 2011, at 1:54 AM, Anatoly G wrote:
>
> > Sorry, I still don't understand.
> > When a remote process fails, the remaining processes receive SIGUSR1 and, by default, terminate because they have no handler installed for it.
> > If I install my own SIGUSR1 handler, I still can't detect that one of the remote/local processes has died. How can I recognize which remote process died? The signal carries only local-host process information.
> >
> > Anatoly.
> >
> >
> > On Mon, Oct 17, 2011 at 7:40 PM, Darius Buntinas <buntinas@mcs.anl.gov> wrote:
> >
> > On Oct 15, 2011, at 4:47 AM, Pavan Balaji wrote:
> >
> > >
> > > On 10/11/2011 02:35 PM, Darius Buntinas wrote:
> > >> I took a look at your code.  Mpiexec will send a SIGUSR1 signal to
> > >> each process to notify it of a failed process (Oops, I forgot about
> > >> that when I responded to your previous email).  If you need a signal
> > >> for your application, you'll need to choose another one.  The signal
> > >> handler you installed replaced MPICH's signal handler, so the library
> > >> wasn't able to detect that the process had failed.
> > >
> > > Anatoly: In stacked libraries, you are supposed to chain signal handlers. Replacing another library's signal handlers can lead to unexpected behavior.
> >
> > If you set the signal handler before calling MPI_Init, MPICH will chain your signal handler.
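> >
> > A minimal sketch of that ordering (the handler name and body are illustrative assumptions; a real handler should only do async-signal-safe work, e.g. set a flag):
> >
> >     #include <csignal>
> >     #include <mpi.h>
> >
> >     // Application-side SIGUSR1 handler (illustrative).
> >     static volatile std::sig_atomic_t got_usr1 = 0;
> >     static void app_usr1_handler(int) { got_usr1 = 1; }
> >
> >     int main(int argc, char **argv) {
> >         // Install the application handler *before* MPI_Init, so that when
> >         // MPICH installs its own SIGUSR1 handler during initialization it
> >         // can chain back to this one instead of silently replacing it.
> >         std::signal(SIGUSR1, app_usr1_handler);
> >
> >         MPI::Init(argc, argv);
> >         // ... application code ...
> >         MPI::Finalize();
> >         return 0;
> >     }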
> >
> > >
> > >> Another problem is that MPI_Abort() isn't killing all processes, so
> > >> when I commented out CreateOwnSignalHandler(), the master detected
> > >> the failure and called MPI_Abort(), but some slave processes were
> > >> still hanging in MPI_Barrier().  We'll need to fix that.
> > >
> > > Darius: What's the expected behavior here? Should a regular exit look at whether the user asked for a cleanup or not, and an abort kill all processes?
> >
> > That's what I think it should do.  MPI_Abort should kill all processes in the specified communicator.  If you can't kill only the processes in the communicator, then it should kill all connected processes (i.e., the job, plus any dynamic procs).
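> >
> > A compact sketch of that scenario (ranks, tags, and message pattern are assumptions, not the actual test code): the master hits a failed receive, calls MPI_Abort, and the open question above is whether slaves still blocked in MPI_Barrier get cleaned up.
> >
> >     #include <mpi.h>
> >
> >     int main(int argc, char **argv) {
> >         MPI::Init(argc, argv);
> >         MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> >
> >         int rank, size;
> >         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >         MPI_Comm_size(MPI_COMM_WORLD, &size);
> >
> >         if (rank == 0) {
> >             // Master: on the first failed receive, ask for the whole job
> >             // to be torn down.
> >             for (int src = 1; src < size; ++src) {
> >                 int v;
> >                 if (MPI_Recv(&v, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
> >                              MPI_STATUS_IGNORE) != MPI_SUCCESS)
> >                     MPI_Abort(MPI_COMM_WORLD, 1);
> >             }
> >         } else {
> >             int v = rank;
> >             MPI_Send(&v, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
> >         }
> >
> >         // If MPI_Abort does not reach every process, slaves can be left
> >         // blocked here.
> >         MPI_Barrier(MPI_COMM_WORLD);
> >
> >         MPI::Finalize();
> >         return 0;
> >     }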

> >
> > -d
> >
> > > -- Pavan
> > >
> > > --
> > > Pavan Balaji
> > > http://www.mcs.anl.gov/~balaji
> >
_______________________________________________
mpich-discuss mailing list     mpich-discuss@mcs.anl.gov
To manage subscription options or unsubscribe:
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss