[mpich-discuss] assert (!closed) failed

Bernard Chambon bernard.chambon at cc.in2p3.fr
Mon Jan 16 07:18:41 CST 2012


Hello,

I confirm a failure when specifying -iface together with a high number of tasks.
I am running Hydra version 1.4.1p1 with a shared-memory patch (seg_sz.patch).

I tested by hand (that is, not through the batch system) between the following two machines:
>more /tmp/machines 
ccwpge0061:128
ccwpge0062:128

1/ Without specifying -iface, it works (more than 10 tries)

mpiexec -f /tmp/machines -n 150 bin/advance_test
bchambon at ccwpge0062's password: 

I am there 
Running MPI version 2, subversion 2 
ref_message is ready 
I am the master task 0 sur ccwpge0061, for 149 slaves tasks, we will exchange a buffer of 1 MB

slave number 1, iteration = 1
slave number 2, iteration = 1
slave number 3, iteration = 1
…

>echo $status
0

2/ When specifying -iface eth0, I always get an assertion failure (as before, more than 10 tries)

>mpiexec -iface eth0 -f /tmp/machines -n 150 bin/advance_test
bchambon at ccwpge0062's password: 

Segmentation fault
[mpiexec at ccwpge0061] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec at ccwpge0061] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at ccwpge0061] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at ccwpge0061] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion


I'm fairly sure that the failure only appears as the number of tasks increases.
With a smaller machine file such as:
ccwpge0061:8
ccwpge0062:8

the following run, still with -iface eth0, seems to be OK:

 >mpiexec -verbose -iface eth0 -f /tmp/machines -n 16 bin/advance_test
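
To narrow down where between 16 and 150 tasks the failure starts, something like the following could be used. It is only a sketch: the helper names (build_cmd, run_ok, first_failing) are made up for illustration, it assumes that a run with -n 16 also passes with the 128-slot machine file (I only tried it with the 8-slot one), and it assumes that once a task count fails, every larger count fails too, which I have not verified. It uses only the mpiexec options shown above. Each run will still prompt for the ssh password.

```python
import subprocess


def build_cmd(n, iface="eth0", machinefile="/tmp/machines",
              prog="bin/advance_test"):
    """Build the same mpiexec command line as in the report, for n tasks."""
    cmd = ["mpiexec"]
    if iface:
        cmd += ["-iface", iface]
    cmd += ["-f", machinefile, "-n", str(n), prog]
    return cmd


def run_ok(n):
    """Return True when mpiexec exits with status 0 for n tasks."""
    return subprocess.run(build_cmd(n)).returncode == 0


def first_failing(lo, hi, ok=run_ok):
    """Binary search for the smallest failing task count in (lo, hi].

    Assumes lo passes, hi fails, and failures are monotonic in n
    (an assumption, not something established in this report).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid   # mid passes, so the threshold is above mid
        else:
            hi = mid   # mid fails, so the threshold is at or below mid
    return hi


if __name__ == "__main__":
    # 16 passed and 150 failed in the tests described above.
    print("first failing task count:", first_failing(16, 150))
```

Since the failure was reproducible on every try here, one run per task count should be enough; for an intermittent failure, run_ok would need to repeat each count several times.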

Best regards.

PS: here are my shell resource limits:

 >limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize unlimited
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1000000 
memorylocked unlimited
maxproc      409600 


---------------
Bernard CHAMBON
IN2P3 / CNRS
04 72 69 42 18
