[mpich-discuss] What do these errors mean?

Gauri Kulkarni gaurivk at gmail.com
Thu Apr 2 00:30:45 CDT 2009


I cannot, because I get the SAME errors. Following is the LSF script I used
to launch the job.

#!/bin/bash
#BSUB -L /bin/bash
#BSUB -n 8
#BSUB -N
#BSUB -o /data1/visitor/cgaurik/testmpi/helloworld.mympi.mpiexec.%J.out

cd /data1/visitor/cgaurik/testmpi
/data1/visitor/cgaurik/mympi/bin/mpiexec -np 8 ./helloworld.mympi

The job is NOT parallelized, i.e. every process is rank 0, and the errors are
the same. Of course, if I change the last line of the script to srun
./helloworld.mympi (as, I think, Dave pointed out), everything is all rosy.
My question is (maybe it's obvious...): if my MPICH2 is configured with the
options "--with-pmi=slurm --with-pm=no --with-slurm=/path/to/slurm/lib",
can I still use mpiexec?
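
For reference, here is a minimal sketch of the srun-based variant described
above. It assumes the same paths and 8-slot allocation as the script I posted;
the output file name is changed only to keep the two runs apart, and otherwise
only the last line differs:

#!/bin/bash
#BSUB -L /bin/bash
#BSUB -n 8
#BSUB -N
#BSUB -o /data1/visitor/cgaurik/testmpi/helloworld.mympi.srun.%J.out

cd /data1/visitor/cgaurik/testmpi
# srun starts the ranks itself and supplies the PMI that this
# SLURM-configured MPICH2 build expects, so no mpd or mpiexec is involved
srun ./helloworld.mympi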

Gauri.
---------


On Wed, Apr 1, 2009 at 11:42 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

>  You need to use the mpicc and mpiexec from the MPICH2 installation that
> was built to use MPD.
>
> Rajeev
>
>
>  ------------------------------
> *From:* mpich-discuss-bounces at mcs.anl.gov [mailto:
> mpich-discuss-bounces at mcs.anl.gov] *On Behalf Of *Gauri Kulkarni
> *Sent:* Wednesday, April 01, 2009 8:56 AM
> *To:* mpich-discuss at mcs.anl.gov
> *Subject:* [mpich-discuss] What do these errors mean?
>
> Hi,
>
> I am using MPICH2-1.0.7 (I cannot go to 1.0.8 right now), which is
> configured to be used with SLURM. That is, the process manager is SLURM and
> NOT mpd. When I submit my job through bsub (bsub [options] srun
> ./helloworld.mympi), it works perfectly. I cannot use mpiexec, as it is not
> the one spawning the jobs; I must use srun. My question is: can I still use
> mpiexec from the command line? Well... I tried. Here is the output:
>
> mpiexec -n 2 ./helloworld.mympi
> mpiexec_n53: cannot connect to local mpd (/tmp/mpd2.console_cgaurik);
> possible causes:
>   1. no mpd is running on this host
>   2. an mpd is running but was started without a "console" (-n option)
> In case 1, you can start an mpd on this host with:
>     mpd &
> and you will be able to run jobs just on this host.
> For more details on starting mpds on a set of hosts, see
> the MPICH2 Installation Guide.
>
> Then:
>
> mpd &
> mpiexec -n 2 ./helloworld.mympi
>
> *Hello world! I'm 0 of 2 on n53*
> Fatal error in MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255)...................: MPI_Finalize failed
> MPI_Finalize(154)...................:
> MPID_Finalize(94)...................:
> MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
> MPIR_Barrier(77)....................:
> MPIC_Sendrecv(120)..................:
> MPID_Isend(103).....................: failure occurred while attempting to
> send an eager message
> MPIDI_CH3_iSend(172)................:
> MPIDI_CH3I_VC_post_sockconnect(1090):
> MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failedFatal error in
> MPI_Finalize: Other MPI error, error stack:
> MPI_Finalize(255)...................: MPI_Finalize failed
> MPI_Finalize(154)...................:
> MPID_Finalize(94)...................:
> MPI_Barrier(406)....................: MPI_Barrier(comm=0x44000002) failed
> MPIR_Barrier(77)....................:
> MPIC_Sendrecv(120)..................:
> MPID_Isend(103).....................: failure occurred while attempting to
> send an eager message
> MP*Hello world! I'm 1 of 2 on n53*
> IDI_CH3_iSend(172)................:
> MPIDI_CH3I_VC_post_sockconnect(1090):
> MPIDI_PG_SetConnInfo(615)...........: PMI_KVS_Get failed
>
> The bold text shows that the job gets executed, but there is a lot of other
> garbage. It seems to me that I can either configure MPICH2 to be used with
> the cluster job scheduler or to be used from the command line. I cannot have
> both.
>
> Am I right?
>
> -Gauri.
> ----------
>
>