<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16608" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008> These error messages come from the job
launcher/process manager, which uses sockets for communication.
</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008> If you would like *localonly* jobs to tolerate
network failures, you can modify the process manager code to use pipes (or
another IPC mechanism) instead of sockets for communicating with the local MPI
processes and the job launcher.</SPAN></FONT></DIV>
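<DIV>As a rough illustration of the pipe-based approach, here is a minimal Python sketch. It assumes a hypothetical manager that sends one-line text commands to a local child process and reads one reply per command. It is not the MPICH2 process manager code: the command names, the "ack:" replies, and the function run_manager are made up for this example. The only channel is the child's stdin/stdout pipes, so the exchange does not depend on any network interface.</DIV>
<PRE>
```python
# Hypothetical sketch (not MPICH2/SMPD code): a "manager" talks to a local
# child process over anonymous pipes (the child's stdin/stdout). Pipes are
# local kernel objects, so unplugging a network cable does not affect them.
import subprocess
import sys

# Hypothetical child: reads one command per line, acknowledges it, and
# stops after the "exit" command (or when its stdin is closed).
CHILD_CODE = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    cmd = line.strip()
    if cmd == "exit":
        print("bye", flush=True)
        break
    print("ack:" + cmd, flush=True)
"""


def run_manager(commands):
    """Send each command to the child over a pipe and collect its replies."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        for cmd in list(commands) + ["exit"]:
            child.stdin.write(cmd + "\n")
            child.stdin.flush()
            # One reply line per command; blocks until the child answers.
            replies.append(child.stdout.readline().strip())
    return replies


if __name__ == "__main__":
    print(run_manager(["launch", "status"]))
```
</PRE>
<DIV>For example, run_manager(["launch", "status"]) returns ['ack:launch', 'ack:status', 'bye'].</DIV>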
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008>(PS: The purpose of using "shm" as the channel is to
improve the performance of communication between the MPI processes, not to
avoid using sockets altogether.)</SPAN></FONT></DIV>
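<DIV>To illustrate the kind of data path the shm channel provides (as opposed to the process manager's control traffic), here is a minimal Python sketch. It assumes a file-backed memory mapping as a stand-in for a shared memory region: a child process writes a message into the region, and the parent reads it back through its own mapping, with no socket involved. This is not how MPICH2 implements its shm channel; the function name exchange_via_shared_memory, the 64-byte region size, and the use of a temporary file are assumptions for this example. Launching and managing the processes is a separate concern that this sketch does not cover, which is why the process manager's sockets can still fail when the MPI data itself travels through shared memory.</DIV>
<PRE>
```python
# Hypothetical sketch (not MPICH2's shm channel): two local processes
# exchange a message through a shared, file-backed memory mapping.
# The data never passes through a socket or the network stack.
import mmap
import os
import subprocess
import sys
import tempfile

SIZE = 64  # bytes in the shared region (arbitrary; the message must fit)

# Hypothetical "sender" process: maps the same region and writes a message.
SENDER_CODE = r"""
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    with mmap.mmap(f.fileno(), int(sys.argv[2])) as region:
        msg = sys.argv[3].encode()
        region[0:len(msg)] = msg
        region.flush()
"""


def exchange_via_shared_memory(message):
    """Have a child process write `message` into shared memory; read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * SIZE)  # give the backing file its full size
        os.close(fd)
        with open(path, "r+b") as f:
            with mmap.mmap(f.fileno(), SIZE) as region:
                subprocess.run(
                    [sys.executable, "-c", SENDER_CODE, path, str(SIZE), message],
                    check=True,
                )
                # The child's write is visible through the parent's own mapping.
                return region[:].rstrip(b"\0").decode()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(exchange_via_shared_memory("iter=0, cpuid=1, ncpu=4"))
```
</PRE>
<DIV>For example, exchange_via_shared_memory("iter=0, cpuid=1, ncpu=4") returns the same string, read back from the shared region.</DIV>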
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008>Regards,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=944544014-29052008>Jayesh</SPAN></FONT></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Seifer Lin [mailto:seiferlin@gmail.com]
<BR><B>Sent:</B> Wednesday, May 28, 2008 8:35 PM<BR><B>To:</B> Jayesh
Krishna<BR><B>Cc:</B> mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> Re:
[mpich-discuss]network failure during the execution of parallel
program<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV>Hi,</DIV>
<DIV> </DIV>
<DIV>I ran the following test:</DIV>
<DIV> </DIV>
<DIV>D:\test_mpi\release>mpiexec -channel shm -n 4 test_mpich2.exe<BR>
iter=0, cpuid=1, ncpu=4<BR>
iter=0, cpuid=2, ncpu=4<BR>
iter=0, cpuid=3, ncpu=4<BR>
iter=0, cpuid=0, ncpu=4<BR>
iter=1, cpuid=2, ncpu=4<BR>
iter=1, cpuid=1, ncpu=4<BR>
iter=1, cpuid=0, ncpu=4<BR>
iter=1, cpuid=3, ncpu=4<BR>
iter=2, cpuid=2, ncpu=4<BR>
iter=2, cpuid=3, ncpu=4<BR>
iter=2, cpuid=1, ncpu=4<BR>
iter=2, cpuid=0, ncpu=4<BR>
op_read error on left context: generic socket failure, error stack:<BR>
MPIDU_Sock_wait(2533): The specified network name is no longer available. (errno 64)<BR>
unable to read the cmd header on the left context, generic socket failure, error stack:<BR>
MPIDU_Sock_wait(2533): The specified network name is no longer available. (errno 64).</DIV>
<DIV> </DIV>
<DIV>I unplugged the network cable while iter=1 was being displayed.</DIV>
<DIV> </DIV>
<DIV>Thank you very much.</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV class=gmail_quote>2008/5/28 Jayesh Krishna &lt;<A
href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</A>&gt;:<BR>
<BLOCKQUOTE class=gmail_quote
style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<DIV>
<P><FONT size=2> Hi,<BR> Specifying "shm" as the channel ensures
that all MPI communication (between the MPI processes) is done using shared
memory. The error messages that you see could be from the process launcher or
the process manager.<BR> Do you really need to use the "-localonly"
option? (When you specify this option, you might see some error messages
that are handled within the library and do not affect the MPI job.) You can
run your job as "mpiexec -channel shm -n 4 myapp.exe". Let us know if you
still see the error messages (if so, please copy and paste the error messages
into your email).<BR><BR>Regards,<BR><FONT color=#888888>Jayesh</FONT>
<DIV>
<DIV></DIV>
<DIV class=Wj3C7c><BR><BR>-----Original Message-----<BR>From: <A
href="mailto:owner-mpich-discuss@mcs.anl.gov"
target=_blank>owner-mpich-discuss@mcs.anl.gov</A> [<A
href="mailto:owner-mpich-discuss@mcs.anl.gov"
target=_blank>mailto:owner-mpich-discuss@mcs.anl.gov</A>] On Behalf Of Seifer
Lin<BR>Sent: Wednesday, May 28, 2008 2:32 AM<BR>To: <A
href="mailto:mpich-discuss@mcs.anl.gov"
target=_blank>mpich-discuss@mcs.anl.gov</A><BR>Subject: [mpich-discuss]network
failure during the execution of parallel program<BR><BR>Hi all:<BR><BR>I am
testing a parallel program on a single machine with 4 processes.<BR>The program
only outputs ncpu and cpuid every 5 seconds.<BR>I use mpiexec -localonly
4 myapp.exe<BR>During execution, I unplug the network cable, and the
program shows error messages such as "generic socket failure".<BR><BR>If I use
mpiexec -channel shm -n 4 myapp.exe and also unplug the network cable, the
same error messages are shown again.<BR>If I run the program again after the
network is unplugged, it doesn't show any error messages.<BR><BR>It seems
that mpiexec detects the network status at runtime even when the shm
channel is selected.<BR><BR>My question: "-channel shm" means shared memory, so
shouldn't changes in the network state have no effect on a program that uses
shared memory?<BR><BR>I am really
confused.<BR><BR>thanks,<BR><BR>Seifer Lin<BR><BR><BR></DIV></DIV></FONT>
<P></P>
<P></P></DIV></BLOCKQUOTE></DIV><BR></BODY></HTML>