[mpich-discuss] Unable to run program in parallel on cluster... It's running properly on single machine...

Darius Buntinas buntinas at mcs.anl.gov
Mon May 21 14:36:44 CDT 2012


It may be that one of your processes is failing. Also check that every process calls MPI_Finalize before exiting.

-d

On May 22, 2012, at 2:42 AM, Albert Spade wrote:

> Hi everybody,
>  
> I am using mpich2-1.4.1p1 and mpiexec from hydra-1.5b1.
> I have a cluster of 5 machines.
> When I run the parallel fast Fourier transform program on a single machine, it runs correctly, but on the cluster it gives an error.
> Can you please tell me why this is happening?
>  
> Thanks.
>  
> Here is my sample output:
> ---------------------------------------------------------------------------------------
>  
> [root@beowulf programs]# mpiexec -n 1 ./Radix2
> Time taken for 16 elements using 1 processors = 2.7895e-05 seconds
> [root@beowulf programs]#
> [root@beowulf programs]# mpiexec -n 4 ./Radix2
> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197): assert (!closed) failed
> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process manager error waiting for completion
> [root@beowulf programs]# mpiexec -n 2 ./Radix2
> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197): assert (!closed) failed
> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process manager error waiting for completion
> [root@beowulf programs]# mpiexec -n 4 ./Radix2
> [mpiexec@beowulf.master] control_cb (./pm/pmiserv/pmiserv_cb.c:197): assert (!closed) failed
> [mpiexec@beowulf.master] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec@beowulf.master] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:205): error waiting for event
> [mpiexec@beowulf.master] main (./ui/mpich/mpiexec.c:437): process manager error waiting for completion
> [root@beowulf programs]#


