[mpich-discuss] Problem with host file
Pavan Balaji
balaji at mcs.anl.gov
Tue Oct 26 13:38:01 CDT 2010
FYI, this has been fixed in the trunk:
https://trac.mcs.anl.gov/projects/mpich2/changeset/7376
You can apply this patch to your working copy, or wait for the nightly
snapshot.
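If you prefer to apply it by hand, something along these lines should work
against a source checkout (the diff filename here is hypothetical, and the
-p level depends on how the diff was generated):

% cd mpich2/
% patch -p0 < 7376.diff
% make && make install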
-- Pavan
On 10/26/2010 01:19 PM, Pavan Balaji wrote:
> Hi Luca,
>
> Thanks for the report. It looks like there is a minor bug in the PMI-1.1
> support within Hydra. I'll be committing a patch for this soon.
>
> In the meantime, can you use the following as a workaround? It forces the
> client back to PMI-1.0, which avoids the buggy PMI-1.1 code path:
>
> % mpiexec -genv PMI_SUBVERSION=0 -f hosts -n 4 ./a.out
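>
> (If the NAME=VALUE form gives you trouble, -genv's two-argument form
> should be equivalent:
>
> % mpiexec -genv PMI_SUBVERSION 0 -f hosts -n 4 ./a.out
>
> Either way, the variable is propagated to all launched processes.)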
>
> -- Pavan
>
> On 10/26/2010 08:42 AM, Luca Ratto wrote:
>> Dear Support,
>> I'm writing you since I've experienced a problem running a parallel code
>> through mpiexec.
>>
>> We have 3 different computers on a LAN, and we would like to run C++
>> codes across them. We installed the latest version of MPICH, 1.3, from
>> source.
>> The OS is Ubuntu 9.04 for every node.
>> The nodes are distributed as follows:
>>
>> "fl06" has 2 cores
>> "nextgen01" has 8 cores
>> "aiolos" has 4 cores
>>
>>
>> The code has been compiled with mpic++.
>>
>> I created a hostfile like the following:
>>
>> fl06:1
>> nextgen01:2
>> aiolos:2
>>
>> Then running the command
>>
>> mpiexec -f hosts -n 4 ./a.out
>>
>> the code works properly. However, if I edit the hostfile to change the
>> process counts across the nodes, as follows:
>>
>> fl06:1
>> nextgen01:2
>> aiolos:1
>>
>> I obtain this error message:
>>
>>
>> mpiexec -f hosts -n 4 ./eMBL.out
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(394).................: Initialization failed
>> MPID_Init(135)........................: channel initialization failed
>> MPIDI_CH3_Init(43)....................:
>> MPID_nem_init(202)....................:
>> MPIDI_CH3I_Seg_commit(366)............:
>> MPIU_SHMW_Hnd_deserialize(358)........:
>> MPIU_SHMW_Seg_open(897)...............:
>> MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or
>> directory
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(394).................: Initialization failed
>> MPID_Init(135)........................: channel initialization failed
>> MPIDI_CH3_Init(43)....................:
>> MPID_nem_init(202)....................:
>> MPIDI_CH3I_Seg_commit(366)............:
>> MPIU_SHMW_Hnd_deserialize(358)........:
>> MPIU_SHMW_Seg_open(897)...............:
>> MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or
>> directory
>> [proxy:0:0 at nextgen01] HYDT_dmxu_poll_wait_for_event
>> (./tools/demux/demux_poll.c:70): assert (!(pollfds[i].revents & ~POLLIN
>> & ~POLLOUT & ~POLLHUP)) failed
>> [proxy:0:0 at nextgen01] main (./pm/pmiserv/pmip.c:221): demux engine error
>> waiting for event
>> [mpiexec at fl06] HYDT_bscu_wait_for_completion
>> (./tools/bootstrap/utils/bscu_wait.c:99): one of the processes
>> terminated badly; aborting
>> [mpiexec at fl06] HYDT_bsci_wait_for_completion
>> (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error
>> waiting for completion
>> [mpiexec at fl06] HYD_pmci_wait_for_completion
>> (./pm/pmiserv/pmiserv_pmci.c:352): bootstrap server returned error
>> waiting for completion
>> [mpiexec at fl06] main (./ui/mpich/mpiexec.c:294): process manager error
>> waiting for completion
>>
>>
>>
>> Could you please tell me where I'm going wrong?
>>
>> Thanks a lot in advance.
>>
>> Luca Ratto
>>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji