[mpich-discuss] Problem with host file
Luca Ratto
ratto at cfd-engineering.it
Tue Oct 26 08:42:32 CDT 2010
Dear Support,
I'm writing because I've run into a problem running a parallel code
through mpiexec.
We have 3 different computers on a LAN and we would like to run C++
codes across them. We installed the latest version of MPICH 1.3 from
source. The OS is Ubuntu 9.04 on every node.
The nodes are configured as follows:
"fl06" has 2 cores
"nextgen01" has 8 cores
"aiolos" has 4 cores
The code has been compiled with MPIC++.
I created a hostfile with the following contents:
fl06:1
nextgen01:2
aiolos:2
When I then run the command
mpiexec -f hosts -n 4 ./a.out
the code works properly. However, if I edit the hostfile to change
the load distribution across the nodes, as follows,
fl06:1
nextgen01:2
aiolos:1
I obtain this error message:
mpiexec -f hosts -n 4 ./eMBL.out
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394).................: Initialization failed
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(43)....................:
MPID_nem_init(202)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(358)........:
MPIU_SHMW_Seg_open(897)...............:
MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394).................: Initialization failed
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(43)....................:
MPID_nem_init(202)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(358)........:
MPIU_SHMW_Seg_open(897)...............:
MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory
[proxy:0:0@nextgen01] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:0@nextgen01] main (./pm/pmiserv/pmip.c:221): demux engine error waiting for event
[mpiexec@fl06] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes terminated badly; aborting
[mpiexec@fl06] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@fl06] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error waiting for completion
[mpiexec@fl06] main (./ui/mpich/mpiexec.c:294): process manager error waiting for completion
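As a sanity check on the two hostfiles, the slot counts can be tallied with a short script. This is only a sketch: the function names are hypothetical, each line is assumed to have the form host:count, and the placement rule (fill the hosts in file order, wrapping around if the slots run out) is an assumption about the process manager, not something confirmed here.

```python
def parse_hostfile(text):
    """Parse 'host:count' lines into a list of (host, count) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, _, count = line.partition(":")
        # A host listed without a count is treated as one slot (assumption).
        entries.append((host, int(count) if count else 1))
    return entries


def total_slots(entries):
    """Sum the per-host process counts."""
    return sum(count for _, count in entries)


def place_ranks(entries, nprocs):
    """Assumed rule: fill hosts in file order, wrapping around if needed."""
    slots = [host for host, count in entries for _ in range(count)]
    return [slots[rank % len(slots)] for rank in range(nprocs)]


# The two hostfiles from this message.
FIRST = "fl06:1\nnextgen01:2\naiolos:2\n"
SECOND = "fl06:1\nnextgen01:2\naiolos:1\n"

if __name__ == "__main__":
    for name, text in (("first", FIRST), ("second", SECOND)):
        entries = parse_hostfile(text)
        print(name, "total slots:", total_slots(entries))
        print(name, "placement for -n 4:", place_ranks(entries, 4))
```

The first hostfile provides 5 slots and the second exactly 4. Under the assumed placement rule, both would put the 4 ranks on the same hosts.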
Could you please tell me what I'm doing wrong?
Thanks a lot in advance.
Luca Ratto