[mpich-discuss] mpich2-1.3 problems
Robert Graves
rwgraves at usgs.gov
Wed Oct 27 17:04:53 CDT 2010
Hello-
We have just installed mpich2-1.3 on a cluster of 18 nodes. The nodes are all running Fedora 13
and are 64-bit HP machines of various vintages, with 2 to 12 cores per node.
I have created a hostfile (named mpi.machinefile) with the following entries:
% cat mpi.machinefile
aki18:4
aki17:4
aki16:4
aki15:4
aki14:1
aki13:1
aki12:1
aki11:1
aki10:1
aki09:1
aki08:1
aki07:1
aki06:1
aki05:1
aki04:1
aki03:1
aki02:1
aki01:1
where my nodes are named aki01 ... aki18 (also resolved as aki01.urscorp.com ... aki18.urscorp.com).
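As a quick sanity check on placement (a sketch, assuming mpiexec fills each host up to its listed
count before moving to the next, which matches the cpi output below), launching a trivial non-MPI
command across the same machinefile shows where each rank would land:
% mpiexec -f mpi.machinefile -n 17 hostname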
Executing the following appears to work correctly:
% mpiexec -f mpi.machinefile -n 12 /opt/mpich2-1.3/examples/cpi
and gives the output:
Process 9 of 12 is on aki16.urscorp.com
Process 10 of 12 is on aki16.urscorp.com
Process 11 of 12 is on aki16.urscorp.com
Process 8 of 12 is on aki16.urscorp.com
Process 6 of 12 is on aki17.urscorp.com
Process 4 of 12 is on aki17.urscorp.com
Process 5 of 12 is on aki17.urscorp.com
Process 7 of 12 is on aki17.urscorp.com
Process 0 of 12 is on aki18.urscorp.com
Process 1 of 12 is on aki18.urscorp.com
Process 2 of 12 is on aki18.urscorp.com
Process 3 of 12 is on aki18.urscorp.com
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.004010
However, changing the requested number of processes to 17 (which, if I am reading the placement correctly, is the first count that puts a rank on one of the single-slot nodes, aki14) causes a fatal error:
% mpiexec -f mpi.machinefile -n 17 /opt/mpich2-1.3/examples/cpi
and gives the output:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(385).................:
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(38)....................:
MPID_nem_init(196)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(324)........:
MPIU_SHMW_Seg_open(863)...............:
MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
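If I am reading the stack correctly, the failure is in creating/attaching a shared-memory segment
(MPIU_SHMW_Seg_create_attach_templ). Purely as a guess, one thing worth confirming would be that
every node actually has a writable shared-memory area mounted; for example (a hypothetical check,
just a plain ssh loop over the node names above):
% for n in `seq -w 1 18` ; do ssh aki$n df /dev/shm ; done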
I also tried setting MPI_NO_LOCAL=1 but that did not help.
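For completeness, this is roughly how the variable was passed; a sketch, assuming Hydra's -genv
flag is the right way to hand an environment variable to every process:
% mpiexec -f mpi.machinefile -n 17 -genv MPI_NO_LOCAL 1 /opt/mpich2-1.3/examples/cpi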
Any help you can provide is greatly appreciated.
Thanks,
Rob Graves
Research Geophysicist
US Geological Survey
Pasadena, CA