[mpich-discuss] halt after mpiexec

Gao, Yi gaoyi.cn at gmail.com
Thu Jan 14 00:00:06 CST 2010


Dear all,

I'm new here and have run into a problem at the very beginning of learning MPI.

Basically,
mpiexec -n i /bin/hostname
works for every i >= 1 I've tested,

but
mpiexec -n i /path-to-example-dir/cpi
fails for any i >= 2.

The details are:

I have 3 machines, all running Ubuntu 9.10 with gcc/g++ 4.4.1.
One has two cores, and the other two have one core each.
(machine names: rome, 2 cores;
                julia, 1 core;
                meg, 1 core)


On this minimal test bed for learning MPI, I built mpich2-1.2.1
using the default configure options from the "installation guide".

Then on "rome", I put an mpd.hosts file in my home directory with the content:
julia
meg

Then I ran
mpdboot -n 3  # works
mpdtrace -l # works, shows the three machine names and port numbers
mpiexec -l -n 3 /bin/hostname # works! shows the three machine names
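
For reference, the 3 passed to mpdboot here is the two hosts listed in mpd.hosts plus the local machine. A minimal sketch of that arithmetic in shell, assuming one host per non-blank line in the file; the count_hosts helper and the temporary demo file are made up for illustration and are not part of MPICH2:

```shell
#!/bin/sh
# Sketch only: count_hosts is a hypothetical helper, not an MPICH2 tool.

# count_hosts FILE: print the number of non-blank lines in FILE
# (assumes one host name per line, as in the mpd.hosts shown in the message).
count_hosts() {
    grep -c '[^[:space:]]' "$1"
}

# Demo file holding the two hosts from the message.
demo_hosts=$(mktemp)
printf 'julia\nmeg\n' > "$demo_hosts"

# The hosts in the file plus the local machine (rome).
max_n=$(( $(count_hosts "$demo_hosts") + 1 ))
echo "mpdboot -n can be at most $max_n"

rm -f "$demo_hosts"
```

With the real file, the call would be count_hosts "$HOME/mpd.hosts".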

but

mpiexec -l -n 3 /tmp/gth818n/mpich2-1.2.1/example/cpi # hangs here

Then I tried:
mpiexec -l -n 1 /tmp/gth818n/mpich2-1.2.1/example/cpi # works: runs on rome only and returns the result
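
Because the -n 1 run stays on rome, it does not show whether the cpi binary is also present at the same path on julia and meg. A minimal sketch of a per-machine check, assuming /tmp is local to each machine (the message does not say either way); check_exec is a hypothetical helper, and a missing binary is only one possible cause, not a diagnosis:

```shell
#!/bin/sh
# Sketch only: check_exec is a hypothetical helper, not an MPICH2 tool.

# check_exec PATH: report whether PATH is an executable file on this machine.
check_exec() {
    if [ -x "$1" ]; then
        echo "ok: $1 on $(uname -n)"
    else
        echo "missing: $1 on $(uname -n)"
    fi
}

# Path taken from the message; run this on each of rome, julia and meg.
check_exec /tmp/gth818n/mpich2-1.2.1/example/cpi
```

Running it on each of the three machines in turn would show whether all of them report "ok" for the same path.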

But any -n greater than or equal to 2 causes it to hang, or produces
errors such as these (with -n 4):
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(394).................: Initialization failed
MPID_Init(135)........................: channel initialization failed
MPIDI_CH3_Init(43)....................:
MPID_nem_init(202)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(358)........:
MPIU_SHMW_Seg_open(897)...............:
MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory
rank 3 in job 12  rome_39209   caused collective abort of all ranks
  exit status of rank 3: return code 1


Then I rebuilt mpich2 on rome (since it's SMP) with --with-device=ch3:ssm

but got the same error.

Could anyone give me some direction on where to go from here?

Thanks in advance!

Best,
yi

