[mpich-discuss] mpiexec kills the remote login shell
Jayesh Krishna
jayesh at mcs.anl.gov
Wed Feb 4 13:04:00 CST 2009
Hi,
Can you also attach the corresponding smpd debug output?
Regards,
Jayesh
-----Original Message-----
From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
Sent: Wednesday, February 04, 2009 1:02 PM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
Hi,
Firstly, the mpiexec verbose output I attached previously was the wrong one.
I've attached the correct one to this email.
Secondly, I want to point out that whenever mpiexec is started from
Korebot to run a program, whether it is an MPI or a non-MPI program and
whether or not the program can be found, the ssh connection to Korebot
is closed as soon as mpiexec finishes.
Thank you
> Hi,
> The mpiexec output shows the following error when running hellow:
> ==================
>
> Unable to exec 'hello' on korebot
>
> Error 2 - No such file or directory
>
> ==================
>
> Please provide the debug output of smpd (smpd -d 2>&1 | tee
> smpd.out) along with mpiexec (mpiexec -verbose -n 2 ./hellow 2>&1 |
> tee mpiexec.out).
>
> # Can you run simple C programs (without using mpiexec) on Korebot?
> # Is the ssh connection aborted when you run non-MPI programs
> (mpiexec -n 2 hostname)?
> # Can you send us your ".smpd" config file?
> # Did you modify the MPICH2 code to run on Korebot? (Please send us
> your configure command and any environment settings used to configure/make MPICH2.)
>
> Regards,
> Jayesh
>
> ________________________________
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
> Sent: Wednesday, February 04, 2009 8:41 AM
> To: 'Yu-Cheng Chou'
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>
> Hi,
> I will take a look at the debug logs and get back to you. Meanwhile,
> can you run simple C programs on Korebot without using mpiexec?
> MPICH2 currently does not support heterogeneous systems (so you
> won't be able to run your MPI job across ARM and other architectures).
>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
> Sent: Tuesday, February 03, 2009 7:52 PM
> To: Jayesh Krishna
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>
>> # Can you run non-MPI programs using mpiexec (mpiexec -n 2 hostname)?
> Yes.
>
>> # Can you compile and run the hello world program (examples/hellow.c)
>> provided with MPICH2 (mpiexec -n 2 ./hellow)?
> Yes.
>
>> # How did you start smpd (the command used to start smpd)? How did
>> you run your MPI job (the command used to run your job)?
> I have a ".smpd" file containing a single line, "phrase=123",
> so I started smpd using "smpd -s".
> Then I used "mpiexec -n 1 hellow" to run hellow on Korebot.
>
>> # How did you find out that mpiexec kills the sshd process? (We typically
>> ssh to Unix machines and run mpiexec without any problems.)
> I logged in to Korebot from two terminals.
> From terminal #1, I checked all the processes running on Korebot.
> From terminal #2, I started smpd and ran hellow using the commands
> mentioned above.
> After hellow finished, the connection to Korebot via terminal #2
> was closed.
> From terminal #1, I could see that the sshd process associated with
> terminal #2 was gone.
>
>> Can you run smpd/mpiexec in debug mode and provide us with the
>> outputs (smpd -d / mpiexec -n 2 -verbose hostname)?
> The first attached text file is the output from running hellow in
> mpiexec's verbose mode.
>
>
> There is another issue.
> This time, I used two machines: one is Korebot, as mentioned above, and
> the other is a laptop running Ubuntu Linux.
> I started smpd with the same ".smpd" file and command as above
> on both Korebot and the laptop.
> There is a machine file called "hostfile" on Korebot. It
> contains the names of the two machines:
>
> korebot
> shrimp
>
> Then from Korebot, I ran cpi using the following command.
>
> mpiexec -machinefile ./hostfile -verbose -n 2 cpi
>
>
> However, the computed value of pi is a huge number. I think this is related
> to double-precision values being transferred between processes running on an
> ARM-based Linux machine and an ordinary Linux machine.
>
> The second attached text file is the output from running cpi in
> mpiexec's verbose mode.
>
>
>>
>> I am cross-compiling mpich2-1.0.8 with smpd for the Khepera III mobile robot.
>>
>> This mobile robot has a Korebot board which is an ARM-based computer
>> with a Linux operating system.
>>
>> The cross-compilation was fine.
>>
>> Firstly, I logged in to Korebot through ssh.
>> Secondly, I started smpd.
>> Thirdly, I ran mpiexec to execute an MPI program (cpi) that comes
>> with the package.
>>
>> The result was correct, but when mpiexec finished, the ssh
>> connection to Korebot was closed.
>> I found that mpiexec kills the sshd process through which I was
>> remotely connected to Korebot.
>>
>> I've been looking for the cause, but still have not found any clues.
>>
>> Could you give me any ideas to solve this problem?
>>
>> Thank you,
>>
>> Yu-Cheng
>>
>