[MPICH] Problem with MPICH2 mpiexec with different executables

Ralph Butler rbutler at mtsu.edu
Thu May 4 22:44:11 CDT 2006


> Hi Matthew:
>
> If you run something like this:
>     mpiexec -n 1 master_mpi_pgm : -n 8 slave_mpi_pgm
> you are saying that you have a 9-process MPI job that
> happens to run 2 different programs.
> If you were to call MPI_Comm_size in this program, you would get 21.

     Correction: you would get 9 (1 master process plus 8 slave processes).
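
     To make the arithmetic concrete, here is a minimal sketch that adds up
     the -n counts across the colon-separated segments of an mpiexec
     argument string. The helper name world_size is hypothetical and is not
     part of MPICH2. It also assumes every "-n" token is a process count,
     which is a simplification of how a real launcher parses its arguments.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of MPICH2.
// Sums the integer that follows each "-n" token. The ":" separators and the
// program names are skipped, so the result is the total number of processes
// that the whole command line asks for.
int world_size(const std::string& mpiexec_args) {
    std::istringstream in(mpiexec_args);
    std::string token;
    int total = 0;
    while (in >> token) {
        if (token == "-n") {
            int n = 0;
            if (in >> n) {
                total += n;  // one segment contributes n processes
            }
        }
    }
    return total;
}
```

     Called with "-n 1 master_mpi_pgm : -n 8 slave_mpi_pgm" this returns 9,
     the value MPI_Comm_size would report in either program.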

> When running MPI programs, mpd assumes that the entire set of
> processes is part of the computation.
> So with mpd, you cannot run a mixed bag of MPI and non-MPI
> processes in a single job.
> If you run this:
>     mpiexec -n 1 mpi_pgm : -n 1 non_mpi_pgm
> The mpi_pgm will hang at MPI_Init, trying to set up the communicator
> for what it believes is a 2-process job.
>
> --ralph
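
     The hang can be pictured with a toy model, which is only an
     illustration and not how mpd or MPICH2 actually implement startup.
     Threads stand in for processes. Each one that reaches the init step
     waits until the announced number of peers has checked in. The function
     name startup_completes and the timeout are assumptions made for this
     sketch; the real MPI_Init has no such timeout, which is why the job
     above waits forever.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Toy model only: NOT the mpd or MPICH2 startup protocol.
// expected: process count announced by the launcher (sum of the -n values).
// joining:  how many of those processes actually reach the init step.
// Returns true only if every announced process checked in before the timeout.
bool startup_completes(int expected, int joining,
                       std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    int checked_in = 0;
    int passed = 0;

    auto init_step = [&] {
        std::unique_lock<std::mutex> lock(m);
        ++checked_in;
        cv.notify_all();
        // Wait for all expected peers. The timeout exists only so that the
        // model terminates instead of hanging.
        if (cv.wait_for(lock, timeout,
                        [&] { return checked_in == expected; })) {
            ++passed;
        }
    };

    std::vector<std::thread> procs;
    for (int i = 0; i < joining; ++i) {
        procs.emplace_back(init_step);
    }
    for (auto& t : procs) {
        t.join();
    }
    return passed == joining && joining == expected;
}
```

     With expected = 2 and joining = 2 the rendezvous finishes. With
     expected = 2 and joining = 1, which corresponds to pairing mpi_pgm with
     a program that never calls MPI_Init, the single waiter gives up at the
     timeout and the function returns false.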
>
> On May 4, 2006, at 8:55 PM, Matthew Siegel wrote:
>
>> Hi all,
>>
>> Thanks for the help in advance.  Here's a detailed explanation of
>> the problem I'm running into.
>>
>> A little background . . . I am running the latest MPICH2.  I am  
>> using Rocks, running on Xeon EM64T processors with Gig-E IP  
>> between the compute nodes.  Not completely relevant, but want to  
>> be complete.
>>
>> I have written an app with the following source code.  It is
>> very simple: my real program, which is quite complicated, was
>> not working, and I figured that this app would work just fine.
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char** argv) {
>>     MPI::Init(argc, argv);
>>     printf("My app is running!!!!\n");
>>     MPI::Finalize();
>>     return 0;
>> }
>>
>> I compiled it like this:
>>     mpicxx -o myapp my_app.cpp
>>
>> I start an mpd daemon on the head node with 'mpd &', and verify
>> with mpdtrace.  So far so good.
>>
>> I then execute the following:
>>     mpiexec -l -n 1 ./myapp
>>
>> and I get:
>>     0: My app is running!!!!
>>
>> and it quits.  I then run:
>>     mpiexec -l -n 4 ./myapp
>>
>> and I get (as expected):
>>     0: My app is running!!!!
>>     1: My app is running!!!!
>>     3: My app is running!!!!
>>     2: My app is running!!!!
>>
>> OK, so this is good so far.   Here's where things go awry...
>>
>> I then run:
>>     mpiexec -l -n 1 hostname : -n 1 date
>>
>> and I get (again as expected):
>>     0: <hostname>
>>     1: <date>
>>
>> Then I run (and here's where the problem is):
>>     mpiexec -l -n 1 ./myapp : -n 1 hostname
>>
>> and I get:
>>     1: <hostname>
>>
>> And NOTHING else!  Just hangs FOREVER . . . have to hit CTRL-C to  
>> quit.
>>
>> This happens regardless of where my app appears on the command
>> line, first or second.  It is also true regardless of the "other"
>> app that I am running, not necessarily just 'hostname' or 'date',
>> and it does not matter how many nodes I am using.  The real apps
>> that I am trying to run both use MPI, and I am trying to pass data
>> between the two processes.  I have rebuilt MPICH2 multiple times
>> and tried various configure options with zero luck.
>>
>> Please help, this has become a showstopper for me.
>>
>> Thanks!
>>
>> Matt
>



