[mpich-discuss] Network failure during the execution of a parallel program

Seifer Lin seiferlin at gmail.com
Wed May 28 02:32:13 CDT 2008


Hi all:

I am testing a parallel program on a single machine with 4 processes.
The program only prints ncpu and cpuid every 5 seconds.
I launch it with: mpiexec -localonly 4 myapp.exe
During the run, I unplugged the network cable, and the program printed
error messages such as:
generic socket failure.

If I use mpiexec -channel shm -n 4 myapp.exe and unplug the network
cable during the run, the same error messages appear.
However, if I start the program again after the network is already
unplugged, it runs without any error messages.

It seems that mpiexec monitors the network status at runtime, even
when the shm channel is selected.

My question: -channel shm means shared memory, so shouldn't a program
that communicates only through shared memory be unaffected by changes
in the network state?

I am really confused.

Thanks,

Seifer Lin



