[mpich-discuss] MPI_Comm_spawn_multiple() fails

Jayesh Krishna jayesh at mcs.anl.gov
Tue Apr 6 17:17:35 CDT 2010


Hi,
 Also, if the suggestion below does not work, send us the new smpd and mpiexec logs (see my earlier mail on how to create these logs) so that we can debug the problem further.

Regards,
Jayesh
----- Original Message -----
From: "Jayesh Krishna" <jayesh at mcs.anl.gov>
To: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 5:10:27 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 I looked into your logs again and found that you are launching "c:\mpi_spawn.exe" from "D:\Projects\mpi_spawn\Release\mpi_spawn.exe". Can you try launching the same program (that is, set progs[0] and progs[1] in your program to "D:\\projects\\mpi_spawn\\release\\mpi_spawn.exe" instead of "c:\\mpi_spawn.exe") and let us know if it works?

(PS: It is possible that c:\mpi_spawn.exe was not recompiled after you installed a newer version of MPICH2.)
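
The backslashes in the quoted path have to be doubled because progs[] holds C string literals, where "\\" stands for one backslash and a lone "\m" is not a valid escape sequence. A minimal sketch in C, assuming nothing beyond the path quoted in this mail; count_backslashes() and spawn_path() are hypothetical names used only for illustration and are not part of MPICH2 or of the attached mpi_spawn.cpp:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MPICH2): counts the backslash
 * characters that actually end up in the compiled string. */
int count_backslashes(const char *path)
{
    int n = 0;
    for (; *path != '\0'; ++path) {
        if (*path == '\\')
            ++n;
    }
    return n;
}

/* Each "\\" in the source below becomes a single '\' at run time, so
 * the program is looked up at
 * D:\projects\mpi_spawn\release\mpi_spawn.exe */
const char *spawn_path(void)
{
    return "D:\\projects\\mpi_spawn\\release\\mpi_spawn.exe";
}
```

Printing spawn_path() with printf("%s\n", ...) before the spawn call is a cheap way to check that the string passed in progs[] is the path you intended.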
Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: jayesh at mcs.anl.gov
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 4:17:42 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

Unfortunately, neither 1) nor 2) makes a difference.

Alex 

-----Original Message-----
From: jayesh at mcs.anl.gov [mailto:jayesh at mcs.anl.gov] 
Sent: Tuesday, April 06, 2010 1:37 PM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Can you try setting MPICH_INTERFACE_HOSTNAME to localhost before
running your code and see if it works (I can see that this var is set to
some invalid value in your logs - I cannot reproduce it here in the lab
though. This could be a bug in MPICH2)?
 Try the following,

1) mpiexec -n 1 -channel mt -env MPICH_INTERFACE_HOSTNAME "localhost"
mpi_spawn.exe

2) set MPICH_INTERFACE_HOSTNAME=localhost
   mpi_spawn.exe

 It looks like since this var is set to an invalid value MPI processes
are not able to connect to each other (hang).
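
One way to confirm which value each process actually sees is to print the variable at startup. The following is a minimal sketch in C; env_or() and print_interface_hostname() are hypothetical helper names, and the code only reads the environment. It does not show how MPICH2 itself interprets the value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns the value of an environment variable, or the given
 * fallback string when the variable is not set. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Prints the setting discussed above. Calling this at the top of
 * main() in both the parent and the spawned program shows whether
 * they see the same value. */
void print_interface_hostname(void)
{
    printf("MPICH_INTERFACE_HOSTNAME=%s\n",
           env_or("MPICH_INTERFACE_HOSTNAME", "(not set)"));
    fflush(stdout);
}
```

Flushing stdout matters here for the same reason as in the test app: if the program hangs later, unflushed output may never reach the console or the log file.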

Regards,
Jayesh

----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: jayesh at mcs.anl.gov
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 3:18:58 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

I am a novice in MPI. Never used any flavor before. Providing full path
to mpiexec does not change anything. It still hangs.

Thank you.

Alex 

-----Original Message-----
From: jayesh at mcs.anl.gov [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, April 06, 2010 1:06 PM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Do you have any other flavor of MPI installed (eg: MSMPI) in your
machine ? Do you have multiple versions of MPICH2 installed on the
machine ?
 Can you try running your job by specifying the complete path to the
mpiexec command (c:\progra~1\mpich2\bin\mpiexec -n 1 -channel mt
mpi_spawn.exe)?

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 2:50:57 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

The only variable that I have set on a global level is
MPICH2_CHANNEL=mt. I am compiling  with VS 2008 SP1.

Thank you.

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, April 06, 2010 12:48 PM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Are you setting any MPICH specific environment before running your
program (eg: MPICH_INTERFACE_HOSTNAME)?
 How are you compiling your code (Visual Studio / Cygwin gcc ...) ?

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Tuesday, April 6, 2010 2:19:13 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

I do realize that it is strange. It is even stranger that it works
without setting the "path" key on MPI_Info, but I am getting the same
result on two computers. Anyway, the log files are attached.

Thank you.

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Tuesday, April 06, 2010 11:29 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Hmmm... This is strange. I just compiled and ran your code (mpiexec -n
1 -channel mt mpi_spawn.exe) on my XP machine and it ran without a hang
(printed the hello mesgs and exited normally).
 Did you recompile your code after installing a newer version of MPICH2
? Try re-compiling your code and see if it works.
 If that does not work, please do the following,

1) Stop any instances of the process manager, smpd.exe, running in your
system (smpd -stop)
2) Start smpd in the debug mode and redirect output to a log file (smpd
-d > smpd.log)
3) Launch your job by running mpiexec in the verbose mode and redirect
output to a log file (mpiexec -verbose -n 1 -channel mt mpi_spawn.exe >
mpiexec.log)
   If the application hangs, wait for some time before killing (Ctrl-C)
the mpiexec command, to give time for stdout to be written to the log
file.
4) Provide us with the log files
5) You can stop the debug session of smpd by Ctrl-C and start smpd using
the "smpd -start" command.

 Let us know the results.

Regards,
Jayesh
 
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>, mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 2:42:25 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,

With "-channel mt" switch it behaves exactly as with direct call to
mpi_spawn.exe: it hangs - three instances (one calling and two spawned)
mpi_spawn.exe seen in the Task Manager.

Alex 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: Monday, April 05, 2010 12:30 PM
To: mpich-discuss at mcs.anl.gov
Cc: Bakkal, Alex
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails

Hi,
 Since your program requires threaded support try using the
multithreaded channel (mpiexec -n 1 -channel mt mpi_spawn.exe).

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 12:27:00 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails



Nope. Screen output: 






-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
Sent: Monday, April 05, 2010 10:17 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi,
Does it work if you launch it using mpiexec (mpiexec -n 1
mpi_spawn.exe)? 

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Monday, April 5, 2010 12:02:13 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi, 

The mpi_spawn.cpp is attached. I am calling my app directly: 

D:\Projects\mpi_spawn\Release>mpi_spawn.exe 

Thank you for your help. 

Alex 

-----Original Message-----
From: Jayesh Krishna [ mailto:jayesh at mcs.anl.gov ]
Sent: Monday, April 05, 2010 7:16 AM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi,
Can you send us the modified test program (attach it in your email) ? 
Also let us know how you ran the test (the complete mpiexec command). 

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Friday, April 2, 2010 7:22:21 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

I am very sorry for misleading you in my previous response. The app does
NOT run with
char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\mpi_spawn" }; 

It hangs but at least it spawned two instances of mpi_spawn.exe 

-----Original Message-----
From: Jayesh Krishna [ mailto:jayesh at mcs.anl.gov ]
Sent: Friday, April 02, 2010 3:20 PM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 


Does, 

char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\mpi_spawn" }; 

work for you ? 

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: mpich-discuss at mcs.anl.gov
Sent: Friday, April 2, 2010 4:04:11 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Yes, spawning one instance (if you mean that by single program) works
fine but spawning two instances of the same program (even located in two
separate folders) hangs. 

char * progs[NHOST] = { "c:\\mpi_spawn" , "c:\\tm\\mpi_spawn" }; 

Thank you. 

Alex 

-----Original Message-----
From: Jayesh Krishna [ mailto:jayesh at mcs.anl.gov ]
Sent: Friday, April 02, 2010 1:59 PM
To: Bakkal, Alex
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi,
I tried spawn() with the host info set to "localhost" and it worked fine
for me (However I spawned using a single program. Does that work for you
?). 

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: jayesh at mcs.anl.gov, mpich-discuss at mcs.anl.gov
Sent: Friday, April 2, 2010 3:24:21 PM GMT -06:00 US/Canada Central
Subject: RE: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi Jayesh, 

Thank you for response. My test app, as well as the examples that you
have referred me to, works fine locally until I attempt to set "host" 
key in MPI_Info. Unfortunately, I need that key in order to spawn
remotely. That is my ultimate goal. 

Thank you anyway. 

Alex 

-----Original Message-----
From: jayesh at mcs.anl.gov [ mailto:jayesh at mcs.anl.gov ]
Sent: Friday, April 02, 2010 8:07 AM
To: mpich-discuss at mcs.anl.gov
Cc: Bakkal, Alex
Subject: Re: [mpich-discuss] MPI_Comm_spawn_multiple() fails 

Hi,
Couple of suggestions, 

# Does it work if you use the same command (same program) for all
instances of the spawned procs ? 
# Did you try setting the "path" infos for the commands ? 

You can find some examples at
https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/test/mpi/spawn (Look at
spawnminfo1.c) . Let us know the results. 
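
"path" is one of the info keys the MPI standard reserves for the spawn calls; it names the directory in which to look for the command. A minimal sketch in C of separating a full program name into those two parts; split_command() is a hypothetical helper, not an MPICH2 function, and whether MPICH2 on Windows resolves the command this way is an assumption that the thread does not confirm.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits a Windows-style program name such as
 * "c:\tm\mpi_spawn" into a directory ("c:\tm") and a command name
 * ("mpi_spawn"). Returns 0 on success, or -1 if the name contains no
 * backslash or one of the output buffers is too small. */
int split_command(const char *full, char *dir, size_t dir_len,
                  char *cmd, size_t cmd_len)
{
    const char *sep = strrchr(full, '\\');
    size_t n;

    if (sep == NULL)
        return -1;

    n = (size_t)(sep - full);
    if (n > 0 && full[n - 1] == ':')
        n++;  /* keep "c:\" rather than "c:" for a file in a drive root */

    if (n + 1 > dir_len || strlen(sep + 1) + 1 > cmd_len)
        return -1;

    memcpy(dir, full, n);
    dir[n] = '\0';
    strcpy(cmd, sep + 1);
    return 0;
}
```

The directory would then be passed with MPI_Info_set(info, "path", dir) and the command name placed in the array of commands given to MPI_Comm_spawn_multiple().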

Regards,
Jayesh
----- Original Message -----
From: "Alex Bakkal" <Abakkal at LOANPERFORMANCE.com>
To: mpich-discuss at mcs.anl.gov
Sent: Thursday, April 1, 2010 4:58:07 PM GMT -06:00 US/Canada Central
Subject: [mpich-discuss] MPI_Comm_spawn_multiple() fails 



Hello, 

I have installed MPICH2 v.1.2.1 on Windows XP Pro machines and was
trying to run the following test app on "mt" channel: 



#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NHOST 2

int main(int argc, char *argv[]) {
    int supported;
    int rank, size;
    MPI_Comm parent, intercomm;

    /* Request full thread support and stop if the library cannot provide it. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &supported);
    if (supported != MPI_THREAD_MULTIPLE) {
        printf("The library does not support MPI_THREAD_MULTIPLE\n");
        exit(-1);
    }

    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) {
        /* No parent: this is the original process, so it spawns the children. */
        int i;
        int nproc[NHOST] = {1, 1};
        char *progs[NHOST] = {"c:\\mpi_spawn", "c:\\tm\\mpi_spawn"};
        MPI_Info infos[NHOST];

        /* One info object per command, each asking for "localhost". */
        for (i = 0; i < NHOST; i++) {
            MPI_Info_create(&infos[i]);
            MPI_Info_set(infos[i], "host", "localhost");
        }

        MPI_Comm_spawn_multiple(NHOST,
            progs, MPI_ARGVS_NULL, nproc, infos,
            0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

        for (i = 0; i < NHOST; i++) {
            MPI_Info_free(&infos[i]);
        }
    } else {
        /* Spawned child: use the intercommunicator to the parent. */
        intercomm = parent;
    }

    MPI_Comm_rank(intercomm, &rank);
    MPI_Comm_size(intercomm, &size);
    printf("[%d/%d] Hello world\n", rank, size); fflush(stdout);

    MPI_Comm_free(&intercomm);
    MPI_Finalize();
    return 0;
}

The app hangs. I can see three (as expected) instances of the program in
the Task Manager but MPI_Comm_spawn_multiple() never returns control to
my app. 

I have reduced the number of spawned hosts to one and it ran
successfully. Also, it successfully spawned two hosts with the following
line commented out: 

//MPI_Info_set(infos[i], "host" , "localhost" ); 

I have tried it on two machines with the same result. 



Also, I have tried to spawn just one instance but on the remote host and
got the following error messages: 

ERROR:unable to read the cmd header on the pmi context, Error = -1 

ERROR:Error posting ready, An existing connection was forcibly closed by
the remote host. 

I am not sure whether that is related to the first issue? 



Any insight would be greatly appreciated. 

Thank you. 

Alex 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss