[MPICH] FreeBSD and the ch3:ssm channel?
Rajeev Thakur
thakur at mcs.anl.gov
Tue Jan 30 18:02:18 CST 2007
Can you try the ch3:nemesis channel? That will also do shared memory within
a node and TCP across nodes.
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Steve Kargl
> Sent: Tuesday, January 30, 2007 5:17 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] FreeBSD and the ch3:ssm channel?
>
> I have a 6 node cluster with each node containing 2 dual-core
> opterons. The OS is FreeBSD 6.2-stable. Thus, I have the
> cluster of SMP systems configuration where the docs suggest
> that ch3:ssm may be an appropriate device.
>
> First, I have to apply the attached patch to get MPICH2
> to build. Once it is built and installed, "make testing" yields
> numerous failures of the form (long lines wrapped):
>
> node10:kargl[374] make testing
> (cd test && make testing)
> (NOXMLCLOSE=YES && export NOXMLCLOSE && cd mpi && make testing)
> ./runtests -srcdir=. -tests=testlist
> -mpiexec=/usr/local/bin/mpiexec \
> -xmlfile=summary.xml
> Looking in ./testlist
> Processing directory attr
> Looking in ./attr/testlist
> Unexpected output in attrt: [cli_0]: aborting job:
> Unexpected output in attrt: Fatal error in MPI_Init: Other
> MPI error, \
> error stack:
> Unexpected output in attrt: MPIR_Init_thread(247)..................:
> Initialization failed
> Unexpected output in attrt: MPID_Init(82)..........................:
> channel initialization failed
> Unexpected output in attrt: MPIDI_CH3_Init(108)....................:
> Unexpected output in attrt: MPIDI_CH3U_Init_sshm(241)..............:
> unable to create a bootstrap message queue
> Unexpected output in attrt: MPIDI_CH3I_BootstrapQ_create_named(341):
> failed to create a shared memory message queue
> Unexpected output in attrt: MPIDI_CH3I_mqshm_create(97)............:
> Out of memory
> Unexpected output in attrt: MPIDI_CH3I_SHM_Get_mem_named(573)......:
> unable to open shared memory object
> /mpich2q2729273E73AA241D14EB89E545BFD0CA (errno 13)
> Unexpected output in attrt: rank 0 in job 34 node10.cimu.org_53882
> caused collective abort of all ranks
> Unexpected output in attrt: exit status of rank 0: return code 1
> Program attrt exited without No Errors
>
> Is there some further tuning that is needed? Checking the docs
> doesn't reveal anything (at least the ones I've checked didn't).
>
> Other testing shows:
> node10:kargl[375] mpdtrace -l
> node10.cimu.org_53882 (192.168.0.10)
> node14.cimu.org_64173 (192.168.0.14)
> node13.cimu.org_60277 (192.168.0.13)
> node12.cimu.org_51621 (192.168.0.12)
> node11.cimu.org_54128 (192.168.0.11)
> node15.cimu.org_61948 (192.168.0.15)
> node10:kargl[376] mpdringtest 24
> time for 24 loops = 2.30105090141 seconds
>
> --
> Steve
>
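
The most specific clue in the error stack quoted above is the "(errno 13)" on its last frame. As a minimal sketch, written in Python purely for illustration, the snippet below pulls that number out of the line and maps it to its symbolic name. The helper name decode_errno is hypothetical and is not part of MPICH2, and the line is the one from the trace with its wrapped parts rejoined.

```python
import errno
import os
import re

# Last frame of the error stack quoted above, with the wrapped parts rejoined.
LINE = (
    "MPIDI_CH3I_SHM_Get_mem_named(573)......: unable to open shared "
    "memory object /mpich2q2729273E73AA241D14EB89E545BFD0CA (errno 13)"
)


def decode_errno(line):
    """Return (number, symbolic name, message) for an '(errno N)' tag, or None.

    Hypothetical helper for illustration; not an MPICH2 function.
    """
    match = re.search(r"\(errno (\d+)\)", line)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps numeric codes to names on the local platform.
    name = errno.errorcode.get(number, "unknown")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    print(decode_errno(LINE))
```

On FreeBSD and Linux, 13 is EACCES, so the underlying failure looks like a permission problem when opening the shared memory object, not the memory shortage that the "Out of memory" frame higher in the stack might suggest.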