[mpich-discuss] [MPICH] network failure during the execution of parallel program

林智仁 seiferlin at gmail.com
Wed May 28 02:28:58 CDT 2008


Hi all:

I am testing a parallel program on a single machine with 4 processes.
The program only prints ncpu and cpuid every 5 seconds.
I run it with:

    mpiexec -localonly 4 myapp.exe
During execution, I unplugged the network cable, and the program printed
error messages such as
"generic socket failure".

If I instead run mpiexec -channel shm -n 4 myapp.exe and unplug the network
cable, the same error messages appear.
If I start the program again while the cable is still unplugged, it runs
without any error messages.

It seems that mpiexec detects changes in the network status at runtime even
when the shm channel is selected.

My question: since -channel shm means shared memory, shouldn't a change in
the network state have no effect on a program that uses shared memory?

I am really confused.

thanks,

Seifer Lin