[mpich-discuss]network failure during the execution of parallel program

Seifer Lin seiferlin at gmail.com
Wed May 28 20:34:45 CDT 2008


Hi,

I ran the following test:

D:\test_mpi\release>mpiexec -channel shm -n 4 test_mpich2.exe
iter=0, cpuid=1, ncpu=4
iter=0, cpuid=2, ncpu=4
iter=0, cpuid=3, ncpu=4
iter=0, cpuid=0, ncpu=4
iter=1, cpuid=2, ncpu=4
iter=1, cpuid=1, ncpu=4
iter=1, cpuid=0, ncpu=4
iter=1, cpuid=3, ncpu=4
iter=2, cpuid=2, ncpu=4
iter=2, cpuid=3, ncpu=4
iter=2, cpuid=1, ncpu=4
iter=2, cpuid=0, ncpu=4
op_read error on left context: generic socket failure, error stack:
MPIDU_Sock_wait(2533): The specified network name is no longer available. (errno 64)
unable to read the cmd header on the left context, generic socket failure, error stack:
MPIDU_Sock_wait(2533): The specified network name is no longer available. (errno 64).

I unplugged the network cable while iter=1 was being displayed.
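
One way to check whether the job itself was affected by the socket errors is to confirm that every rank still reported in every iteration. Below is a minimal Python sketch, assuming the "iter=..., cpuid=..., ncpu=..." line format shown in the output above. The function name missing_ranks is hypothetical and is not part of MPICH2 or of the test program.

```python
import re
from collections import defaultdict

# Matches status lines such as "iter=2, cpuid=3, ncpu=4".
# Assumption: the test program prints exactly this format.
LINE_RE = re.compile(r"iter=(\d+), cpuid=(\d+), ncpu=(\d+)")


def missing_ranks(output):
    """Return {iteration: sorted list of ranks that never reported}.

    Iterations in which every rank reported are left out of the result,
    so an empty dict means no status line is missing.
    """
    seen = defaultdict(set)
    ncpu = 0
    for line in output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip error messages and other non-status lines
        iteration, rank, n = (int(g) for g in m.groups())
        seen[iteration].add(rank)
        ncpu = max(ncpu, n)

    expected = set(range(ncpu))
    return {
        iteration: sorted(expected - ranks)
        for iteration, ranks in sorted(seen.items())
        if expected - ranks
    }
```

For the captured output above, this returns an empty dict: all four ranks reported for iter=0 through iter=2, including the iterations printed after the cable was pulled.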

Thank you very much.



2008/5/28 Jayesh Krishna <jayesh at mcs.anl.gov>:

>   Hi,
>   Specifying "shm" as the channel ensures that all MPI communication (between
> the MPI processes) is done using shared memory. The error messages that you
> see could be from the process launcher or the process manager.
>   Do you really need to use the "-localonly" option? (With that option
> you might end up seeing some error messages that are handled within the
> library and do not affect the MPI job.) You can run your job as "mpiexec
> -channel shm -n 4 myapp.exe". Let us know if you still see the error
> messages (if so, please copy and paste the error messages into your email).
>
> Regards,
> Jayesh
>
>
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov On Behalf Of Seifer Lin
> Sent: Wednesday, May 28, 2008 2:32 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss]network failure during the execution of parallel
> program
>
> Hi all:
>
> I am testing a parallel program on a single machine with 4 processes.
> The program only outputs ncpu and cpuid every 5 seconds.
> I run it with   mpiexec -localonly 4 myapp.exe
> During the execution, I unplug the network cable, and the program shows some
> error messages such as "generic socket failure".
>
> If I use mpiexec -channel shm -n 4 myapp.exe and also unplug the network
> cable, the same error messages are shown again.
> If I start the program again after the network is unplugged, it doesn't
> show any error messages.
>
> It seems that mpiexec detects the network status at runtime even when
> the shm channel is selected.
>
> My question: since -channel shm means shared memory, shouldn't the program
> be unaffected by any change in network state?
>
> I am really confused.
>
> thanks,
>
> Seifer Lin
>
>
>