[mpich-discuss] Problem with host file

Pavan Balaji balaji at mcs.anl.gov
Tue Oct 26 13:19:44 CDT 2010


Hi Luca,

Thanks for the report. It looks like there is a minor bug in the PMI-1.1 
support within Hydra. I'll be committing a patch for this soon.

In the meantime, can you use the following as a workaround:

% mpiexec -genv PMI_SUBVERSION=0 -f hosts -n 4 ./a.out
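
To compare the two hostfiles quoted below, it can help to total the slots each one declares. The following is only a minimal sketch: it assumes the `host:count` format shown in the report, treats a bare hostname as one slot and `#` as a comment marker (both assumptions for illustration), and uses hypothetical helper names (`parse_hostfile`, `total_slots`) that are not part of MPICH or Hydra.

```python
def parse_hostfile(text):
    """Return a list of (host, slots) pairs from 'host:count' lines."""
    entries = []
    for raw in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are skipped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, count = line.partition(":")
        # Assumption: a bare hostname with no ':count' means one slot.
        entries.append((host.strip(), int(count) if sep else 1))
    return entries


def total_slots(entries):
    """Sum the slot counts declared across all hosts."""
    return sum(slots for _, slots in entries)


# The two hostfiles from the report, inlined as strings.
working = parse_hostfile("fl06:1\nnextgen01:2\naiolos:2")
failing = parse_hostfile("fl06:1\nnextgen01:2\naiolos:1")

for name, entries in (("working", working), ("failing", failing)):
    print(name, entries, "total slots:", total_slots(entries))
```

With -n 4, the first hostfile declares one more slot than requested, while the second declares exactly four. This is only an observation about the inputs, not a diagnosis of the bug.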

  -- Pavan

On 10/26/2010 08:42 AM, Luca Ratto wrote:
> Dear Support,
> I'm writing to you because I've run into a problem running a parallel
> code through mpiexec.
>
> We have 3 different computers on a LAN and we would like to run C++
> codes across them. We installed the latest version of MPICH 1.3 from
> source. The OS is Ubuntu 9.04 on every node.
> The nodes are as follows:
>
> "fl06" has 2 cores
> "nextgen01" has 8 cores
> "aiolos" has 4 cores
>
>
> The code has been compiled with mpic++.
>
> I created a hostfile as follows:
>
> fl06:1
> nextgen01:2
> aiolos:2
>
> Then running the command
>
> mpiexec -f hosts -n 4 ./a.out
>
> the code works properly. However, if I edit the hostfile to change the
> load on the nodes, as follows,
>
> fl06:1
> nextgen01:2
> aiolos:1
>
> I obtain this error message:
>
>
> mpiexec -f hosts -n 4 ./eMBL.out
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(394).................: Initialization failed
> MPID_Init(135)........................: channel initialization failed
> MPIDI_CH3_Init(43)....................:
> MPID_nem_init(202)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(358)........:
> MPIU_SHMW_Seg_open(897)...............:
> MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or
> directory
> Fatal error in MPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(394).................: Initialization failed
> MPID_Init(135)........................: channel initialization failed
> MPIDI_CH3_Init(43)....................:
> MPID_nem_init(202)....................:
> MPIDI_CH3I_Seg_commit(366)............:
> MPIU_SHMW_Hnd_deserialize(358)........:
> MPIU_SHMW_Seg_open(897)...............:
> MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or
> directory
> [proxy:0:0@nextgen01] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN
> & ~POLLOUT & ~POLLHUP)) failed
> [proxy:0:0@nextgen01] main (./pm/pmiserv/pmip.c:221): demux engine error
> waiting for event
> [mpiexec@fl06] HYDT_bscu_wait_for_completion
> (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes
> terminated badly; aborting
> [mpiexec@fl06] HYDT_bsci_wait_for_completion
> (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error
> waiting for completion
> [mpiexec@fl06] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error
> waiting for completion
> [mpiexec@fl06] main (./ui/mpich/mpiexec.c:294): process manager error
> waiting for completion
>
>
>
> Could you please tell me what I'm doing wrong?
>
> Thanks a lot in advance.
>
> Luca Ratto

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

