[mpich-discuss] What do these errors mean?

Gauri Kulkarni gaurivk at gmail.com
Wed Apr 1 08:56:12 CDT 2009


Hi,

I am using MPICH2-1.0.7 (I cannot move to 1.0.8 right now), which is configured
to be used with SLURM. That is, the process manager is SLURM and NOT mpd.
When I submit my job through bsub (bsub [options] srun ./helloworld.mympi),
it works perfectly. In that setup I cannot use mpiexec, since it is not the one
spawning the processes; I must use srun. My question is: can I still use mpiexec
from the command line? Well, I tried. Here is the output (a sketch of the test
program is at the end of this message):

mpiexec -n 2 ./helloworld.mympi
mpiexec_n53: cannot connect to local mpd (/tmp/mpd2.console_cgaurik);
possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.

Then:

mpd &
mpiexec -n 2 ./helloworld.mympi

*Hello world! I'm 0 of 2 on n53*
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)...................: MPI_Finalize failed
MPI_Finalize(154)...................:
MPID_Finalize(94)...................:
MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77)....................:
MPIC_Sendrecv(120)..................:
MPID_Isend(103).....................: failure occurred while attempting to
send an eager message
MPIDI_CH3_iSend(172)................:
MPIDI_CH3I_VC_post_sockconnect(1090):
MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failed
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(255)...................: MPI_Finalize failed
MPI_Finalize(154)...................:
MPID_Finalize(94)...................:
MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
MPIR_Barrier(77)....................:
MPIC_Sendrecv(120)..................:
MPID_Isend(103).....................: failure occurred while attempting to
send an eager message
*Hello world! I'm 1 of 2 on n53*
MPIDI_CH3_iSend(172)................:
MPIDI_CH3I_VC_post_sockconnect(1090):
MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failed

The bold text shows that the job does get executed, but there is a lot of
other garbage. It seems to me that I can configure MPICH2 either to be used
with the cluster job scheduler or to be used from the command line, but not both.

Am I right?
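
For reference, helloworld.mympi is essentially the standard MPI hello-world
test; below is a minimal sketch of what such a program looks like (assuming C;
the actual file may differ slightly). As the error stack shows, the PMI_KVS_Get
failure surfaces in the barrier that MPI_Finalize performs internally.

/* Minimal MPI hello world -- a sketch of helloworld.mympi (assumed). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("Hello world! I'm %d of %d on %s\n", rank, size, name);

    /* The error stack above reports the failure inside the barrier
       done by MPI_Finalize (MPID_Finalize -> MPI_Barrier -> ...). */
    MPI_Finalize();
    return 0;
}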

-Gauri.