[mpich-discuss] mpiexec kills the remote login shell

Jayesh Krishna jayesh at mcs.anl.gov
Wed Feb 4 14:05:36 CST 2009


 Hi,
  Does smpd abort when you run your MPI job?

Regards,
Jayesh

-----Original Message-----
From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu] 
Sent: Wednesday, February 04, 2009 1:56 PM
To: Jayesh Krishna
Cc: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] mpiexec kills the remote login shell

Hi

I can cross-compile the program and then simply run the executable on
Korebot with no errors.


> Hi,
>  Can you try running (without mpiexec) a simple C program with 
> exit(-1) on Korebot?
>
> ========================================
> #include <stdlib.h>
> int main(int argc, char *argv[])
> {
>     exit(-1);
> }
> ========================================
>
> Regards,
> Jayesh
> ________________________________
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
> Sent: Wednesday, February 04, 2009 1:04 PM
> To: 'Yu-Cheng Chou'
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>
>  Hi,
>   Can you also attach the corresponding smpd debug output?
>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
> Sent: Wednesday, February 04, 2009 1:02 PM
> To: Jayesh Krishna
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>
> Hi,
>
> Firstly, the mpiexec verbose output I attached previously was the wrong one.
> I've attached the correct one to this email.
>
> Secondly, I want to point out that whenever mpiexec is started on 
> Korebot to run a program, whether it is an MPI or a non-MPI program, 
> and whether or not the program can be found, the ssh connection to 
> Korebot is closed as soon as mpiexec finishes.
>
> Thank you
>
>
>> Hi,
>>   The mpiexec output shows the following error when running hellow, 
>> ==================
>>
>> Unable to exec 'hello' on korebot
>>
>> Error 2 - No such file or directory
>>
>> ==================
>>
>>   Please provide the debug output of smpd (smpd -d 2>&1 | tee smpd.out)
>> along with that of mpiexec
>> (mpiexec -verbose -n 2 ./hellow 2>&1 | tee mpiexec.out).
>>
>> #  Can you run simple C programs (without using mpiexec) on Korebot?
>> #  Is the ssh connection aborted when you run non-MPI programs 
>> (mpiexec -n 2 hostname)?
>> #  Can you send us your ".smpd" config file?
>> #  Did you modify the MPICH2 code to run on Korebot? (Please send us 
>> your configure command and any environment settings used to 
>> configure/make MPICH2.)
>>
>> Regards,
>> Jayesh
>>
>> ________________________________
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh 
>> Krishna
>> Sent: Wednesday, February 04, 2009 8:41 AM
>> To: 'Yu-Cheng Chou'
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>
>>  Hi,
>>   I will take a look at the debug logs and get back to you. 
>> Meanwhile, can you run simple C programs without using mpiexec on 
>> Korebot?
>>   MPICH2 currently does not support heterogeneous systems (so you 
>> won't be able to run your MPI job across ARM and other architectures).
>>
>> Regards,
>> Jayesh
>>
>> -----Original Message-----
>> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
>> Sent: Tuesday, February 03, 2009 7:52 PM
>> To: Jayesh Krishna
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>
>>> # Can you run non-MPI programs using mpiexec (mpiexec -n 2 hostname) ?
>> Yes.
>>
>>> # Can you compile and run the hello world program 
>>> (examples/hellow.c) provided with MPICH2 (mpiexec -n 2 ./hellow)?
>> Yes.
>>
>>> # How did you start smpd (the command used to start smpd) ? How did 
>>> you run your MPI job (the command used to run your job)?
>> I have a ".smpd" file containing one line of information, which is 
>> "phrase=123".
>> Thus, I started smpd using "smpd -s".
>> Then I used "mpiexec -n 1 hellow" to run hellow on Korebot.
>>
>>> # How did you find that mpiexec kills the sshd process (We typically 
>>> ssh to unix machines and run mpiexec without any problems) ?
>> I logged in to Korebot from two terminals.
>> From terminal #1, I checked all the processes running on Korebot.
>> From terminal #2, I started smpd and ran hellow using the commands 
>> mentioned above.
>> After hellow finished, the connection to Korebot from terminal #2 
>> was closed.
>> From terminal #1, I could see that the sshd process associated with 
>> terminal #2 was gone.
>>
>>>  Can you run smpd/mpiexec in debug mode and provide us with the 
>>> outputs (smpd -d / mpiexec -n 2 -verbose hostname) ?
>> The first attached text file is the output from running hellow in 
>> mpiexec's verbose mode.
>>
>>
>> There is another issue.
>> This time, I used two machines. One is Korebot as mentioned above, 
>> and the other is a laptop running Ubuntu Linux.
>> I started smpd on both Korebot and the laptop, using the same ".smpd" 
>> file and command as mentioned above.
>> There is a machine file called "hostfile" on Korebot. The file 
>> contains the names of the two machines:
>>
>> korebot
>> shrimp
>>
>> Then from Korebot, I ran cpi using the following command.
>>
>> mpiexec -machinefile ./hostfile -verbose -n 2 cpi
>>
>>
>> But the computed value of pi is a huge number. I think this is related 
>> to variables of type double being transferred between processes running 
>> on an ARM-based Linux machine and a general-purpose Linux machine.
>>
>> The second attached text file is the output from running cpi in 
>> mpiexec's verbose mode.
>>
>>
>>>
>>> I am cross-compiling mpich2-1.0.8 with smpd for the Khepera III mobile 
>>> robot.
>>>
>>> This mobile robot has a Korebot board which is an ARM-based computer 
>>> with a Linux operating system.
>>>
>>> The cross-compilation was fine.
>>>
>>> Firstly, I logged in to Korebot through ssh.
>>> Secondly, I started smpd.
>>> Thirdly, I ran mpiexec to execute an MPI program (cpi) that comes 
>>> with the package.
>>>
>>> The result was correct, but when mpiexec was finished, the ssh 
>>> connection to the Korebot was closed.
>>> I found that mpiexec kills the sshd process through which I was 
>>> remotely connected to Korebot.
>>>
>>> I've been looking for the cause, but still have not found any clues.
>>>
>>> Could you give me any ideas to solve this problem?
>>>
>>> Thank you,
>>>
>>> Yu-Cheng
>>>
>>
>