[mpich-discuss] assert (!closed) failed

Pavan Balaji balaji at mcs.anl.gov
Fri Mar 16 16:30:02 CDT 2012


Bernard,

Can you give us a test program that shows this?  I'm not able to 
reproduce this problem.

  -- Pavan

On 01/16/2012 07:18 AM, Bernard Chambon wrote:
> Hello,
>
> I confirm a failure when specifying -iface together with a high number
> of tasks. I am running Hydra version 1.4.1p1 with a shared-memory patch
> (seg_sz.patch).
>
> I tested by hand (i.e. not through the batch system) between the
> following two machines:
> >more /tmp/machines
> ccwpge0061:128
> ccwpge0062:128
>
> 1/ Without specifying -iface, it's OK (more than 10 tries)
>
> mpiexec -f /tmp/machines -n 150 bin/advance_test
> bchambon at ccwpge0062's password:
>
> I am there
> Running MPI version 2, subversion 2
> ref_message is ready
> I am the master task 0 sur ccwpge0061, for 149 slaves tasks, we will
> exchange a buffer of 1 MB
>
> slave number 1, iteration = 1
> slave number 2, iteration = 1
> slave number 3, iteration = 1
>>
>  >echo $status
> 0
>
> 2/ When specifying -iface eth0, I _always_ get an assertion failure
> (as before, more than 10 tries)
>
>  >mpiexec -iface eth0 -f /tmp/machines -n 150 bin/advance_test
> bchambon at ccwpge0062's password:
>
> Segmentation fault
> [mpiexec at ccwpge0061] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert
> (!closed) failed
> [mpiexec at ccwpge0061] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at ccwpge0061] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> [mpiexec at ccwpge0061] main (./ui/mpich/mpiexec.c:405): process manager
> error waiting for completion
>
>
> I'm quite sure that the failure occurs when the number of tasks is
> increased. With a machine file like:
> ccwpge0061:8
> ccwpge0062:8
>
> the following run seems to be OK:
>
> >mpiexec -verbose -iface eth0 -f /tmp/machines -n 16 bin/advance_test
>
> Best regards.
>
> PS :
>
> >limit
> cputime unlimited
> filesize unlimited
> datasize unlimited
> stacksize unlimited
> coredumpsize unlimited
> memoryuse unlimited
> vmemoryuse unlimited
> descriptors 1000000
> memorylocked unlimited
> maxproc 409600
>
>
> ---------------
> Bernard CHAMBON
> IN2P3 / CNRS
> 04 72 69 42 18
>
>
>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

