[mpich-discuss] MPI_Comm_spawn_multiple() fails

Jayesh Krishna jayesh at mcs.anl.gov
Tue Apr 6 14:48:08 CDT 2010


Hi,
 Are you setting any MPICH-specific environment variables before running your program (e.g., MPICH_INTERFACE_HOSTNAME)?
 How are you compiling your code (Visual Studio / Cygwin gcc ...)?

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 2:19:13 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

I do realize that it is strange. It is even stranger that it works
without setting the "path" key on MPI_Info, but I am getting the same result
on two computers. Anyway, the log files are attached.

Thank you.

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: Tuesday, April 06, 2010 11:29 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Hmmm... This is strange. I just compiled and ran your code (mpiexec -n
1 -channel mt mpi_spawn.exe) on my XP machine, and it ran without a hang
(printed the hello messages and exited normally).
 Did you recompile your code after installing the newer version of
MPICH2? Try re-compiling your code and see if it works.
 If that does not work, please do the following:

1) Stop any instances of the process manager, smpd.exe, running in your
system (smpd -stop)
2) Start smpd in the debug mode and redirect output to a log file (smpd
-d > smpd.log)
3) Launch your job by running mpiexec in the verbose mode and redirect
output to a log file (mpiexec -verbose -n 1 -channel mt mpi_spawn.exe >
mpiexec.log)
   If the application hangs, wait for some time before killing (Ctrl-C)
the mpiexec command, to give stdout time to be written to the log
file.
4) Provide us with the log files
5) You can stop the debug session of smpd with Ctrl-C and restart smpd
using the "smpd -start" command.

 Let us know the results.

Regards,
Jayesh
 
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>, mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 2:42:25 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

With the "-channel mt" switch it behaves exactly as with a direct call to
mpi_spawn.exe: it hangs, with three instances (one calling and two spawned)
of mpi_spawn.exe visible in the Task Manager.

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Monday, April 05, 2010 12:30 PM
To: mpich-discuss at mcs.anl.gov
Cc: Bakkal, Alex
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Since your program requires threaded support try using the
multithreaded channel (mpiexec -n 1 -channel mt mpi_spawn.exe).

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 12:27:00 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails



Nope. Screen output: 






-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
Sent: Monday, April 05, 2010 10:17 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi,
Does it work if you launch it using mpiexec (mpiexec -n 1
mpi_spawn.exe)? 

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 12:02:13 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi, 

The mpi_spawn.cpp is attached. I am calling my app directly: 

D:\Projects\mpi_spawn\Release>mpi_spawn.exe 

Thank you for your help. 

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Monday, April 05, 2010 7:16 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi, 
Can you send us the modified test program (attach it in your email) ? 
Also let us know how you ran the test (the complete mpiexec command). 

Regards, 
Jayesh 
----- Original Message ----- 
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com> 
To: "Jayesh Krishna" <jayesh at mcs.anl.gov> 
Cc: mpich-discuss at mcs.anl.gov 
Sent: Friday, April 2, 2010 7:22:21 PM GMT -06:00 US/Canada Central 
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

I am very sorry for misleading you in my previous response. The app does
NOT run with

char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\mpi_spawn" };

It hangs, but at least it spawns two instances of mpi_spawn.exe.

-----Original Message----- 
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: Friday, April 02, 2010 3:20 PM 
To: Bakkal, Alex 
Cc: mpich-discuss at mcs.anl.gov 
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 


Does 

char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\mpi_spawn" }; 

work for you? 

Regards, 
Jayesh 
----- Original Message ----- 
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com> 
To: "Jayesh Krishna" <jayesh at mcs.anl.gov> 
Cc: mpich-discuss at mcs.anl.gov 
Sent: Friday, April 2, 2010 4:04:11 PM GMT -06:00 US/Canada Central 
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Yes, spawning one instance (if that is what you mean by a single program)
works fine, but spawning two instances of the same program (even located
in two separate folders) hangs. 

char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\tm\\mpi_spawn" }; 

Thank you. 

Alex 

-----Original Message----- 
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: Friday, April 02, 2010 1:59 PM 
To: Bakkal, Alex 
Cc: mpich-discuss at mcs.anl.gov 
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi, 
I tried spawn() with the host info set to "localhost" and it worked fine
for me (however, I spawned using a single program; does that work for
you?). 

Regards, 
Jayesh 
----- Original Message ----- 
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com> 
To: jayesh at mcs.anl.gov, mpich-discuss at mcs.anl.gov 
Sent: Friday, April 2, 2010 3:24:21 PM GMT -06:00 US/Canada Central 
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi Jayesh, 

Thank you for your response. My test app, as well as the examples that you
referred me to, works fine locally until I attempt to set the "host" 
key in MPI_Info. Unfortunately, I need that key in order to spawn
remotely. That is my ultimate goal. 

Thank you anyway. 

Alex 

-----Original Message----- 
From: jayesh at mcs.anl.gov [mailto:jayesh at mcs.anl.gov] 
Sent: Friday, April 02, 2010 8:07 AM 
To: mpich-discuss at mcs.anl.gov 
Cc: Bakkal, Alex 
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi, 
A couple of suggestions: 

# Does it work if you use the same command (same program) for all
instances of the spawned procs? 
# Did you try setting the "path" info for the commands? 

You can find some examples at
https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/spawn (look at
spawnminfo1.c). Let us know the results. 
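For the second suggestion, attaching a "path" info key to each command might look like the sketch below. This is only an illustration, not code from the thread: the directory values are hypothetical placeholders, and the helper name set_spawn_infos is made up for the example.

```c
/* Sketch: attaching a "path" info key to each spawn command.
 * The directories in dirs[] are hypothetical placeholders. */
#include <mpi.h>

#define NHOST 2

void set_spawn_infos(MPI_Info infos[NHOST])
{
    const char *dirs[NHOST] = { "c:\\", "c:\\tm" };  /* hypothetical */
    int i;
    for (i = 0; i < NHOST; i++) {
        MPI_Info_create(&infos[i]);
        /* "path" tells the process manager where to look for the
         * spawned executable, instead of relying on the full command. */
        MPI_Info_set(infos[i], "path", (char *)dirs[i]);
    }
}
```

The infos array produced this way would then be passed as the fifth argument of MPI_Comm_spawn_multiple(), exactly like the "host" infos in the original test program.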

Regards, 
Jayesh 
----- Original Message ----- 
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com> 
To: mpich-discuss at mcs.anl.gov 
Sent: Thursday, April 1, 2010 4:58:07 PM GMT -06:00 US/Canada Central 
Subject: [mpich-discuss] MPI_Comm_spawn_multiple() fails 



Hello, 

I have installed MPICH2 v1.2.1 on Windows XP Pro machines and was
trying to run the following test app on the "mt" channel: 



#include <stdio.h> 
#include <stdlib.h> 
#include <mpi.h> 

#define NHOST 2 

int main(int argc, char *argv[]) 
{ 
    int supported; 
    int rank, size; 
    MPI_Comm parent, intercomm; 

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &supported); 
    if (supported != MPI_THREAD_MULTIPLE) { 
        printf("The library does not support MPI_THREAD_MULTIPLE\n"); 
        exit(-1); 
    } 

    MPI_Comm_get_parent(&parent); 
    if (parent == MPI_COMM_NULL) { 
        int i; 
        int nproc[NHOST] = { 1, 1 }; 
        char *progs[NHOST] = { "c:\\mpi_spawn", "c:\\tm\\mpi_spawn" }; 
        MPI_Info infos[NHOST]; 

        for (i = 0; i < NHOST; i++) { 
            MPI_Info_create(&infos[i]); 
            MPI_Info_set(infos[i], "host", "localhost"); 
        } 

        MPI_Comm_spawn_multiple(NHOST, progs, MPI_ARGVS_NULL, nproc, infos, 
                                0, MPI_COMM_WORLD, &intercomm, 
                                MPI_ERRCODES_IGNORE); 

        for (i = 0; i < NHOST; i++) { 
            MPI_Info_free(&infos[i]); 
        } 
    } 
    else { 
        intercomm = parent; 
    } 

    MPI_Comm_rank(intercomm, &rank); 
    MPI_Comm_size(intercomm, &size); 
    printf("[%d/%d] Hello world\n", rank, size); 
    fflush(stdout); 

    MPI_Comm_free(&intercomm); 
    MPI_Finalize(); 
    return 0; 
} 

The app hangs. I can see three (as expected) instances of the program in
the Task Manager, but MPI_Comm_spawn_multiple() never returns control to
my app. 

I reduced the number of spawned hosts to one, and it ran
successfully. Also, it successfully spawned two hosts with the following
line commented out: 

//MPI_Info_set(infos[i], "host" , "localhost" ); 

I have tried it on two machines with the same result. 



Also, I tried to spawn just one instance, but on a remote host, and
got the following error messages: 

ERROR:unable to read the cmd header on the pmi context, Error = -1 

ERROR:Error posting ready, An existing connection was forcibly closed by
the remote host. 

I am not sure whether that is related to the first issue. 



Any insight would be greatly appreciated. 

Thank you. 

Alex 

_______________________________________________ 
mpich-discuss mailing list 
mpich-discuss at mcs.anl.gov 
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss 


