[mpich-discuss] Problem with host file

Luca Ratto ratto at cfd-engineering.it
Tue Oct 26 08:42:32 CDT 2010


Dear Support,
I'm writing because I've run into a problem running a parallel code
through mpiexec.

We have 3 different computers on a LAN and we would like to run C++
codes across them. We installed the latest version of MPICH 1.3 from source.
The OS is Ubuntu 9.04 on every node.
The nodes are configured as follows:

"fl06" has 2 cores
"nextgen01" has 8 cores
"aiolos" has 4 cores


The code has been compiled with mpic++.

I created a hostfile as follows:

fl06:1
nextgen01:2
aiolos:2

When I run the command

mpiexec -f hosts -n 4 ./a.out

the code works properly. However, if I edit the hostfile to change
the load on the nodes, as follows,

fl06:1
nextgen01:2
aiolos:1

I obtain this error message:


mpiexec -f hosts -n 4 ./eMBL.out
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394).................: Initialization failed
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(43)....................:
MPID_nem_init(202)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(358)........:
MPIU_SHMW_Seg_open(897)...............:
MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394).................: Initialization failed
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(43)....................:
MPID_nem_init(202)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(358)........:
MPIU_SHMW_Seg_open(897)...............:
MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory
[proxy:0:0@nextgen01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:0@nextgen01] main (./pm/pmiserv/pmip.c:221): demux engine error waiting for event
[mpiexec@fl06] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes terminated badly; aborting
[mpiexec@fl06] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@fl06] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
[mpiexec@fl06] main (./ui/mpich/mpiexec.c:294): process manager error waiting for completion
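
For reference, here is a small Python sketch of how I understand the two
hostfiles to be interpreted. It assumes that mpiexec fills each host up to
its "host:n" count in hostfile order and wraps around if more processes are
requested; this is only my assumption about the placement policy, not
something I have checked in the sources. The helper names parse_hostfile and
place_ranks are made up for illustration.

```python
def parse_hostfile(text):
    """Parse 'host:count' lines into a list of (host, count) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, _, count = line.partition(":")
        # A host listed without ':n' is treated as one slot (assumption).
        entries.append((host, int(count) if count else 1))
    return entries


def place_ranks(entries, nprocs):
    """Assign ranks block-wise in hostfile order, wrapping around.

    This mirrors the ASSUMED placement policy, not a verified one.
    """
    slots = [host for host, count in entries for _ in range(count)]
    return [slots[rank % len(slots)] for rank in range(nprocs)]


first = parse_hostfile("fl06:1\nnextgen01:2\naiolos:2")
second = parse_hostfile("fl06:1\nnextgen01:2\naiolos:1")

print(place_ranks(first, 4))   # placement for the hostfile that works
print(place_ranks(second, 4))  # placement for the hostfile that fails
```

Under this assumption, both hostfiles give the same placement for "-n 4"
(one process on fl06, two on nextgen01, one on aiolos).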



Could you please tell me what I'm doing wrong?

Thanks a lot in advance.

Luca Ratto