[mpich-discuss] mpiexec kills the remote login shell

Jayesh Krishna jayesh at mcs.anl.gov
Wed Feb 4 08:59:15 CST 2009


I meant Korebot... :)

  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
Sent: Wednesday, February 04, 2009 8:56 AM
To: 'Yu-Cheng Chou'
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpiexec kills the remote login shell


Hi,
  The mpiexec output shows the following error when running hellow:
==================
Unable to exec 'hello' on korebot 

Error 2 - No such file or directory

==================
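
For reference, "Error 2" is the POSIX errno value ENOENT. Note that the job was launched as "mpiexec -n 1 hellow", with no "./" prefix. The following is only a minimal Python sketch, assuming the launcher resolves a bare program name through PATH the way a POSIX shell would (how smpd actually performs the lookup is not stated in this thread). resolve_program is a hypothetical helper, not part of MPICH2.

```python
import errno
import os
import shutil


def resolve_program(name):
    """Hypothetical helper: mimic shell-style lookup of a program name.

    A name containing a path separator is checked as given (relative to
    the current directory); a bare name is searched for on PATH only.
    Returns the resolved path, or None if nothing executable is found.
    """
    if os.sep in name:
        is_exec = os.path.isfile(name) and os.access(name, os.X_OK)
        return name if is_exec else None
    return shutil.which(name)


if __name__ == "__main__":
    # errno 2 is ENOENT, the code behind "No such file or directory".
    print(errno.ENOENT, os.strerror(errno.ENOENT))
    for candidate in ("hellow", "./hellow"):
        print(candidate, "->", resolve_program(candidate))
```

Run from the directory that holds the binary, a result of None for "hellow" alongside a hit for "./hellow" would suggest that the current directory is simply not on PATH.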

  Please provide the debug output of smpd (smpd -d 2>&1 | tee smpd.out)
along with mpiexec (mpiexec -verbose -n 2 ./hellow 2>&1 | tee
mpiexec.out).
 
#  Can you run simple C programs (without using mpiexec) on Korebot?
#  Is the ssh connection aborted when you run non-MPI programs (mpiexec -n
2 hostname)?
#  Can you send us your ".smpd" config file?
#  Did you modify the MPICH2 code to run on Korebot? Please send us your
configure command and any environment settings used to configure/make MPICH2.
 
Regards,
Jayesh
 
  _____  

From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
Sent: Wednesday, February 04, 2009 8:41 AM
To: 'Yu-Cheng Chou'
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpiexec kills the remote login shell



Hi,
  I will take a look at the debug logs and get back to you. Meanwhile, can
you run simple C programs without using mpiexec on Korebot?
  MPICH2 currently does not support heterogeneous systems, so you won't be
able to run your MPI job across ARM and other architectures.

Regards,
Jayesh

-----Original Message-----
From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
Sent: Tuesday, February 03, 2009 7:52 PM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpiexec kills the remote login shell

> # Can you run non-MPI programs using mpiexec (mpiexec -n 2 hostname) ?
Yes.

> # Can you compile and run the hello world program (examples/hellow.c)
> provided with MPICH2 (mpiexec -n 2 ./hellow)?
Yes.

> # How did you start smpd (the command used to start smpd) ? How did
> you run your MPI job (the command used to run your job)?
I have a ".smpd" file containing one line of information, which is
"phrase=123".
Thus, I started smpd using "smpd -s".
Then I used "mpiexec -n 1 hellow" to run hellow on Korebot.

> # How did you find that mpiexec kills the sshd process (We typically
> ssh to unix machines and run mpiexec without any problems) ?
I logged in to Korebot from two terminals.
From terminal #1, I checked all the processes running on Korebot.
From terminal #2, I started smpd and ran hellow using the commands
mentioned above.
After hellow finished, the connection to Korebot via terminal #2 was
closed.
From terminal #1, I could see that the sshd process associated with
terminal #2 was gone.

>  Can you run smpd/mpiexec in debug mode and provide us with the
> outputs (smpd -d / mpiexec -n 2 -verbose hostname) ?
The first attached text file is the output from running hellow in
mpiexec's verbose mode.


There is another issue.
This time, I used two machines. One is Korebot as mentioned above, and the
other is a laptop running Ubuntu Linux.
I started smpd with the same ".smpd" file and command as mentioned above
on both Korebot and the laptop.
There is a machine file called "hostfile" on Korebot. The file contains
the names of the two machines:

korebot
shrimp

Then from Korebot, I ran cpi using the following command.

mpiexec -machinefile ./hostfile -verbose -n 2 cpi


But the computed value of pi is a huge number. I think it is related to
"double" type variables being transferred between processes running on an
ARM-based Linux machine and an ordinary Linux machine.
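
One possible mechanism, offered only as an assumption since the thread does not say which toolchain or floating-point format was used on Korebot: older ARM Linux toolchains using the FPA floating-point format store the two 32-bit words of a double in the opposite order from x86. If raw bytes are exchanged between such machines without conversion, each double arrives with its halves swapped. A minimal Python sketch of that effect (swap_words is a hypothetical helper):

```python
import math
import struct


def swap_words(value):
    """Exchange the two 32-bit halves of an IEEE 754 double."""
    high, low = struct.unpack(">II", struct.pack(">d", value))
    return struct.unpack(">d", struct.pack(">II", low, high))[0]


if __name__ == "__main__":
    garbled = swap_words(math.pi)
    print("sent:    ", math.pi)
    print("received:", garbled)  # many orders of magnitude too large
    print("restored:", swap_words(garbled))
```

Swapping the halves of pi moves low-order mantissa bits into the exponent field, which yields an enormous value, in line with the symptom described above. Swapping a second time restores the original.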

The second attached text file is the output from running cpi in mpiexec's
verbose mode.


>
> I am cross-compiling mpich2-1.0.8 with smpd for the Khepera III mobile
> robot.
>
> This mobile robot has a Korebot board which is an ARM-based computer
> with a Linux operating system.
>
> The cross-compilation was fine.
>
> Firstly, I logged in to Korebot through ssh.
> Secondly, I started smpd.
> Thirdly, I ran mpiexec to execute an MPI program (cpi) that comes with
> the package.
>
> The result was correct, but when mpiexec was finished, the ssh
> connection to the Korebot was closed.
> I found that mpiexec kills the sshd process through which I was
> remotely connected to Korebot.
>
> I've been looking for the cause, but still have not found any clues.
>
> Could you give me any ideas to solve this problem?
>
> Thank you,
>
> Yu-Cheng
>

