[MPICH] Problem with MPICH2 mpiexec with different executables
Ralph Butler
rbutler at mtsu.edu
Thu May 4 22:41:53 CDT 2006
Hi Matthew:
If you run something like this:
mpiexec -n 1 master_mpi_pgm : -n 8 slave_mpi_pgm
you are saying that you have a 9-process MPI job that happens
to run 2 different programs.
If you were to call MPI_Comm_size in either program, you would get 9.
When running MPI programs, mpd assumes that the entire set of
processes is part of the computation.
So with mpd, you cannot run a mixed bag of MPI and non-MPI
processes in a single job.
If you run this:
mpiexec -n 1 mpi_pgm : -n 1 non_mpi_pgm
The mpi_pgm will hang at MPI_Init trying to set up the communicator
for what it believes is a 2 process job.
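
To make the counting concrete, here is a minimal sketch that adds up the
-n values of a colon-separated mpiexec argument string. The function name
expected_world_size is made up for illustration; this is not how mpiexec
or mpd is implemented, and it assumes every segment gives an explicit -n.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of MPICH2: sums the values that follow
// each "-n" token in an mpiexec argument string. The result is the size
// that every MPI process in the job would expect from MPI_Comm_size.
int expected_world_size(const std::string& args) {
    std::istringstream in(args);
    std::string tok;
    int total = 0;
    bool want_count = false;  // true right after a "-n" token
    while (in >> tok) {
        if (want_count) {
            total += std::stoi(tok);  // process count for this segment
            want_count = false;
        } else if (tok == "-n") {
            want_count = true;
        }
        // Executable names and the ":" separators are simply skipped.
    }
    return total;
}
```

Passing "-n 1 master_mpi_pgm : -n 8 slave_mpi_pgm" gives 9, and passing
"-n 1 mpi_pgm : -n 1 non_mpi_pgm" gives 2, which is why mpi_pgm waits in
MPI_Init for a second process that never joins.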
--ralph
On May 4, 2006, at 8:55 PM, Matthew Siegel wrote:
> Hi all,
>
> Thanks for the help in advance. Here's a detailed explanation of
> the problem I'm running into.
>
> A little background . . . I am running the latest MPICH2. I am
> using Rocks, running on Xeon EM64T processors with Gig-E IP between
> the compute nodes. Not completely relevant, but want to be complete.
>
> I have written an app that has the following source code. It is
> very, very simple because my real program, which is quite complicated,
> was not working, and I figured that this app would work just fine.
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char** argv) {
>     MPI::Init(argc, argv);
>     printf("My app is running!!!!\n");
>     MPI::Finalize();
>     return 0;
> }
>
> I compiled it like this:
> mpicxx -o myapp my_app.cpp
>
> I start an mpd daemon on the head node with 'mpd &', and verify with
> mpdtrace. So far so good.
>
> I then execute the following:
> mpiexec -l -n 1 ./myapp
>
> and I get:
> 0: My app is running!!!!
>
> and it quits. I then run:
> mpiexec -l -n 4 ./myapp
>
> and I get (as expected):
> 0: My app is running!!!!
> 1: My app is running!!!!
> 3: My app is running!!!!
> 2: My app is running!!!!
>
> OK, so this is good so far. Here's where things go awry...
>
> I then run:
> mpiexec -l -n 1 hostname : -n 1 date
>
> and I get (again as expected):
> 0: <hostname>
> 1: <date>
>
> Then I run (and here's where the problem is):
> mpiexec -l -n 1 ./myapp : -n 1 hostname
>
> and I get:
> 1: <hostname>
>
> And NOTHING else! Just hangs FOREVER . . . have to hit CTRL-C to
> quit.
>
> This happens regardless of where I list my app on the command line,
> whether it's first or second; it does not matter.
> Also, this is true regardless of the "other" app that I am running,
> not necessarily just 'hostname' or 'date', and it does not matter how
> many nodes I am using. The real apps that I am trying to run
> both use MPI, and I am trying to pass data between the two
> processes. I have rebuilt MPICH2 multiple times and tried various
> configure options with zero luck.
>
> Please help, this has become a showstopper for me.
>
> Thanks!
>
> Matt