[mpich-discuss] assert (!closed) failed
Bernard Chambon
bernard.chambon at cc.in2p3.fr
Mon Jan 16 07:18:41 CST 2012
Hello,
I can confirm a failure when specifying -iface together with a high number of tasks.
I am running hydra version 1.4.1p1 with a shared-memory patch (seg_sz.patch).
The test was run by hand (i.e., not through the batch system) between the following two machines:
>more /tmp/machines
ccwpge0061:128
ccwpge0062:128
1/ Without specifying -iface, it is OK (more than 10 tries):
mpiexec -f /tmp/machines -n 150 bin/advance_test
bchambon at ccwpge0062's password:
I am there
Running MPI version 2, subversion 2
ref_message is ready
I am the master task 0 on ccwpge0061, with 149 slave tasks; we will exchange a buffer of 1 MB
slave number 1, iteration = 1
slave number 2, iteration = 1
slave number 3, iteration = 1
…
>echo $status
0
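
For reference, bin/advance_test is essentially a master/slave buffer exchange. Its actual source is not attached; the following is only a minimal sketch of that kind of test, with the 1 MB buffer size taken from the output above and everything else (tags, iteration count, message flow) assumed:

/* Minimal sketch of a master/slave exchange similar to advance_test.
 * NOT the actual test source; rank 0 = master, 1 MB buffer as reported. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (1024 * 1024)   /* 1 MB buffer, as printed by the test */

int main(int argc, char **argv)
{
    int rank, size, version, subversion, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];
    char *buf = malloc(BUF_SIZE);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_version(&version, &subversion);
    MPI_Get_processor_name(host, &hostlen);

    if (rank == 0) {
        printf("Running MPI version %d, subversion %d\n", version, subversion);
        printf("I am the master task 0 on %s, with %d slave tasks, "
               "we will exchange a buffer of 1 MB\n", host, size - 1);
        for (int i = 1; i < size; i++) {
            MPI_Recv(buf, BUF_SIZE, MPI_BYTE, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("slave number %d, iteration = 1\n", i);
        }
    } else {
        memset(buf, rank & 0xff, BUF_SIZE);
        MPI_Send(buf, BUF_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}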
2/ When specifying -iface eth0, I always get an assertion failure:
>mpiexec -iface eth0 -f /tmp/machines -n 150 bin/advance_test    (as before, more than 10 tries)
bchambon at ccwpge0062's password:
Segmentation fault
[mpiexec at ccwpge0061] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec at ccwpge0061] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at ccwpge0061] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at ccwpge0061] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
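
In case it helps to narrow this down, I can also double-check on both nodes what address eth0 actually resolves to, since -iface tells Hydra to use that interface. A quick standalone check (plain C, nothing to do with the test itself, just a sketch):

/* Print the IPv4 address that eth0 resolves to on this node. */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct ifaddrs *ifap, *ifa;
    char addr[INET_ADDRSTRLEN];

    if (getifaddrs(&ifap) != 0) {
        perror("getifaddrs");
        return 1;
    }
    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET &&
            strcmp(ifa->ifa_name, "eth0") == 0) {
            struct sockaddr_in *sin = (struct sockaddr_in *) ifa->ifa_addr;
            inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr));
            printf("eth0 -> %s\n", addr);
        }
    }
    freeifaddrs(ifap);
    return 0;
}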
I'm quite sure that the failure is related to increasing the number of tasks,
because with a machine file like:
ccwpge0061:8
ccwpge0062:8
>mpiexec -verbose -iface eth0 -f /tmp/machines -n 16 bin/advance_test
it seems to be OK!
Best regards.
PS:
>limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize unlimited
coredumpsize unlimited
memoryuse unlimited
vmemoryuse unlimited
descriptors 1000000
memorylocked unlimited
maxproc 409600
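
These limits are from my interactive shell; in case the processes started by mpiexec see something different, a small check of the descriptor limit from inside a launched process would look like this (just a sketch):

/* Confirm the descriptor limit as seen by a launched process,
 * in case it differs from the interactive shell shown above. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
               (unsigned long long) rl.rlim_cur,
               (unsigned long long) rl.rlim_max);
    return 0;
}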
---------------
Bernard CHAMBON
IN2P3 / CNRS
04 72 69 42 18