<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>RE: [mpich-discuss] mpiexec kills the remote login shell</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16735" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009> Can you try running (without mpiexec) a simple C
program that calls exit(-1) on Korebot ?</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>========================================</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>#include &lt;stdlib.h&gt;</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=921291219-04022009>int
main(int argc, char *argv[])<BR>{<BR> exit(-1);<BR>}<BR><FONT
face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>========================================</SPAN></FONT></SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=921291219-04022009><FONT
face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009></SPAN></FONT></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=921291219-04022009><FONT
face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>Regards,</SPAN></FONT></SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=921291219-04022009><FONT
face=Arial color=#0000ff size=2><SPAN
class=921291219-04022009>Jayesh</SPAN></FONT></SPAN></FONT></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mpich-discuss-bounces@mcs.anl.gov
[mailto:mpich-discuss-bounces@mcs.anl.gov] <B>On Behalf Of </B>Jayesh
Krishna<BR><B>Sent:</B> Wednesday, February 04, 2009 1:04 PM<BR><B>To:</B>
'Yu-Cheng Chou'<BR><B>Cc:</B> mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> Re:
[mpich-discuss] mpiexec kills the remote login shell<BR></FONT><BR></DIV>
<P><FONT size=2> Hi,<BR> Can you also attach the corresponding smpd
debug output ?<BR><BR>Regards,<BR>Jayesh<BR><BR>-----Original
Message-----<BR>From: Yu-Cheng Chou [<A
href="mailto:cycchou@ucdavis.edu">mailto:cycchou@ucdavis.edu</A>]<BR>Sent:
Wednesday, February 04, 2009 1:02 PM<BR>To: Jayesh Krishna<BR>Cc:
mpich-discuss@mcs.anl.gov<BR>Subject: Re: [mpich-discuss] mpiexec kills the
remote login shell<BR><BR>Hi,<BR><BR>Firstly, the mpiexec verbose output I attached
previously was the wrong one.<BR>I've attached the correct one to this
email.<BR><BR>Secondly, I want to point out that whenever mpiexec is started
from Korebot to run a program, whether it is an MPI or a non-MPI program, and
whether or not the program can be found, the ssh connection to Korebot is
closed as soon as mpiexec finishes.<BR><BR>Thank you<BR><BR><BR>>
Hi,<BR>> The mpiexec output shows the following error when
running hellow,<BR>> ==================<BR>><BR>> Unable to exec
'hello' on korebot<BR>><BR>> Error 2 - No such file or
directory<BR>><BR>> ==================<BR>><BR>> Please
provide the debug output of smpd (smpd -d 2>&1 | tee<BR>> smpd.out)
along with mpiexec (mpiexec -verbose -n 2 ./hellow 2>&1 |<BR>> tee
mpiexec.out).<BR>><BR>> # Can you run simple C programs (without
using mpiexec) on Korebot ?<BR>> # Is the ssh connection aborted when
you run non-MPI programs<BR>> (mpiexec -n 2 hostname) ?<BR>>
# Can you send us your ".smpd" config file ?<BR>> # Did you
modify the MPICH2 code to run on Korebot (Please send us<BR>> your configure
command & any env settings set to configure/make MPICH2)?<BR>><BR>>
Regards,<BR>> Jayesh<BR>><BR>> ________________________________<BR>>
From: mpich-discuss-bounces@mcs.anl.gov<BR>> [<A
href="mailto:mpich-discuss-bounces@mcs.anl.gov">mailto:mpich-discuss-bounces@mcs.anl.gov</A>]
On Behalf Of Jayesh Krishna<BR>> Sent: Wednesday, February 04, 2009 8:41
AM<BR>> To: 'Yu-Cheng Chou'<BR>> Cc: mpich-discuss@mcs.anl.gov<BR>>
Subject: Re: [mpich-discuss] mpiexec kills the remote login
shell<BR>><BR>> Hi,<BR>> I will take a look at the
debug logs and get back to you. Meanwhile,<BR>> can you run simple C programs
without using mpiexec on Korebot ?<BR>> MPICH2 currently does not
support heterogeneous systems (So you<BR>> won't be able to run your MPI job
across ARM & other architectures).<BR>><BR>> Regards,<BR>>
Jayesh<BR>><BR>> -----Original Message-----<BR>> From: Yu-Cheng Chou
[<A href="mailto:cycchou@ucdavis.edu">mailto:cycchou@ucdavis.edu</A>]<BR>>
Sent: Tuesday, February 03, 2009 7:52 PM<BR>> To: Jayesh Krishna<BR>> Cc:
mpich-discuss@mcs.anl.gov<BR>> Subject: Re: [mpich-discuss] mpiexec kills the
remote login shell<BR>><BR>>> # Can you run non-MPI programs using
mpiexec (mpiexec -n 2 hostname) ?<BR>> Yes.<BR>><BR>>> # Can you
compile and run the hello world program (examples/hellow.c)<BR>>> provided
with MPICH2 (mpiexec -n 2 ./hellow)?<BR>> Yes.<BR>><BR>>> # How did
you start smpd (the command used to start smpd) ? How did<BR>>> you run
your MPI job (the command used to run your job)?<BR>> I have a ".smpd" file
containing one line of information, which is<BR>> "phrase=123".<BR>> Thus,
I started smpd using "smpd -s".<BR>> Then I used "mpiexec -n 1 hellow" to run
hellow on Korebot.<BR>><BR>>> # How did you find that mpiexec kills the
sshd process (We typically<BR>>> ssh to unix machines and run mpiexec
without any problems) ?<BR>> I logged in to Korebot from two terminals.<BR>>
From terminal #1, I checked all the processes running on Korebot.<BR>>
From terminal #2, I started smpd and ran hellow using the commands<BR>>
mentioned above.<BR>> After hellow finished, the connection to Korebot
via terminal #2<BR>> was closed.<BR>> From terminal #1, I could see that
the sshd process associated with terminal #2<BR>> was
gone.<BR>><BR>>> Can you run smpd/mpiexec in debug mode and
provide us with the<BR>>> outputs (smpd -d / mpiexec -n 2 -verbose
hostname) ?<BR>> The first attached text file is the output from running
hellow in<BR>> mpiexec's verbose mode.<BR>><BR>><BR>> There is
another issue.<BR>> This time, I used two machines. One is Korebot as
mentioned above, and<BR>> the other is a laptop running Ubuntu
Linux.<BR>> I started smpd with the same ".smpd" file and command as
mentioned<BR>> above on both Korebot and the laptop.<BR>> There is a
machine file called "hostfile" on Korebot. The file<BR>> contains the
following information about the name of the two machines.<BR>><BR>>
korebot<BR>> shrimp<BR>><BR>> Then from Korebot, I ran cpi using the
following command.<BR>><BR>> mpiexec -machinefile ./hostfile -verbose -n 2
cpi<BR>><BR>><BR>> But the computed value of pi is a huge number. I think it is
related to "double<BR>> type variables" being transferred between processes
running on an<BR>> ARM-based Linux machine and a general Linux
machine.<BR>><BR>> The second attached text file is the output from
running cpi in<BR>> mpiexec's verbose
mode.<BR>><BR>><BR>>><BR>>> I am cross-compiling mpich2-1.0.8
with smpd for Khepera III mobile robot.<BR>>><BR>>> This mobile
robot has a Korebot board which is an ARM-based computer<BR>>> with a
Linux operating system.<BR>>><BR>>> The cross-compilation was
fine.<BR>>><BR>>> Firstly, I logged in to Korebot through
ssh.<BR>>> Secondly, I started smpd.<BR>>> Thirdly, I ran mpiexec to
execute an MPI program (cpi) that comes<BR>>> with the
package.<BR>>><BR>>> The result was correct, but when mpiexec was
finished, the ssh<BR>>> connection to the Korebot was closed.<BR>>>
I found that mpiexec kills the sshd process through which I was<BR>>>
remotely connected to Korebot.<BR>>><BR>>> I've been looking for the
cause, but still have not found any clues.<BR>>><BR>>> Could you
give me any ideas to solve this problem?<BR>>><BR>>> Thank
you,<BR>>><BR>>>
Yu-Cheng<BR>>><BR>><BR></FONT></P></BODY></HTML>