[mpich-discuss] mpiexec kills the remote login shell

Yu-Cheng Chou cycchou at ucdavis.edu
Wed Feb 4 14:32:14 CST 2009


>  Hi,
>   Does smpd abort when you run your MPI job ?

No.

>
> Regards,
> Jayesh
>
> -----Original Message-----
> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
> Sent: Wednesday, February 04, 2009 1:56 PM
> To: Jayesh Krishna
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>
> Hi
>
> I can cross-compile the program and then simply run the executable on
> Korebot with no errors.
>
>
>> Hi,
>>  Can you try running (without mpiexec) a simple C program with
>> exit(-1) on Korebot ?
>>
>> ========================================
>> #include <stdlib.h>
>> int main(int argc, char *argv[])
>> {
>>     exit(-1);
>> }
>> ========================================
>>
>> Regards,
>> Jayesh
>> ________________________________
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh Krishna
>> Sent: Wednesday, February 04, 2009 1:04 PM
>> To: 'Yu-Cheng Chou'
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>
>>  Hi,
>>   Can you also attach the corresponding smpd debug output ?
>>
>> Regards,
>> Jayesh
>>
>> -----Original Message-----
>> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
>> Sent: Wednesday, February 04, 2009 1:02 PM
>> To: Jayesh Krishna
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>
>> Hi,
>>
>> Firstly, the mpiexec verbose output I attached previously was the wrong
>> one. I've attached the correct one to this email.
>>
>> Secondly, I want to point out that whenever mpiexec is started from
>> Korebot to run a program, whether it is an MPI or a non-MPI program,
>> and whether or not the program can be found, the ssh connection to
>> Korebot is gone as soon as mpiexec finishes.
>>
>> Thank you
>>
>>
>>> Hi,
>>>   The mpiexec output shows the following error when running hellow,
>>> ==================
>>>
>>> Unable to exec 'hello' on korebot
>>>
>>> Error 2 - No such file or directory
>>>
>>> ==================
>>>
>>>   Please provide the debug output of smpd (smpd -d 2>&1 | tee
>>> smpd.out) along with mpiexec (mpiexec -verbose -n 2 ./hellow 2>&1 |
>>> tee mpiexec.out).
>>>
>>> #  Can you run simple C programs (without using mpiexec) on Korebot ?
>>> #  Is the ssh connection aborted when you run non-MPI programs
>>> (mpiexec -n 2 hostname) ?
>>> #  Can you send us your ".smpd" config file ?
>>> #  Did you modify the MPICH2 code to run on Korebot (Please send us
>>> your configure command & any env settings set to configure/make MPICH2)?
>>>
>>> Regards,
>>> Jayesh
>>>
>>> ________________________________
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Jayesh
>>> Krishna
>>> Sent: Wednesday, February 04, 2009 8:41 AM
>>> To: 'Yu-Cheng Chou'
>>> Cc: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>>
>>>  Hi,
>>>   I will take a look at the debug logs and get back to you.
>>> Meanwhile, can you run simple C programs without using mpiexec on
>>> Korebot ?
>>>   MPICH2 currently does not support heterogeneous systems (So you
>>> won't be able to run your MPI job across ARM & other architectures).
>>>
>>> Regards,
>>> Jayesh
>>>
>>> -----Original Message-----
>>> From: Yu-Cheng Chou [mailto:cycchou at ucdavis.edu]
>>> Sent: Tuesday, February 03, 2009 7:52 PM
>>> To: Jayesh Krishna
>>> Cc: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] mpiexec kills the remote login shell
>>>
>>>> # Can you run non-MPI programs using mpiexec (mpiexec -n 2 hostname) ?
>>> Yes.
>>>
>>>> # Can you compile and run the hello world program
>>>> (examples/hellow.c) provided with MPICH2 (mpiexec -n 2 ./hellow)?
>>> Yes.
>>>
>>>> # How did you start smpd (the command used to start smpd) ? How did
>>>> you run your MPI job (the command used to run your job)?
>>> I have a ".smpd" file containing one line of information, which is
>>> "phrase=123".
>>> Thus, I started smpd using "smpd -s".
>>> Then I used "mpiexec -n 1 hellow" to run hellow on Korebot.
>>>
>>>> # How did you find that mpiexec kills the sshd process (We typically
>>>> ssh to unix machines and run mpiexec without any problems) ?
>>> I logged in to Korebot from two terminals.
>>> From terminal #1, I checked all the processes running on Korebot.
>>> From terminal #2, I started smpd and ran hellow using the commands
>>> mentioned above.
>>> After hellow finished, the connection to Korebot via terminal #2
>>> was closed.
>>> From terminal #1, I saw that the sshd process associated with
>>> terminal #2 was gone.
>>>
>>>>  Can you run smpd/mpiexec in debug mode and provide us with the
>>>> outputs (smpd -d / mpiexec -n 2 -verbose hostname) ?
>>> The first attached text file is the output from running hellow in
>>> mpiexec's verbose mode.
>>>
>>>
>>> There is another issue.
>>> This time, I used two machines. One is Korebot, as mentioned above,
>>> and the other is a laptop running Ubuntu Linux.
>>> I started smpd on both Korebot and the laptop, with the same ".smpd"
>>> file and command as mentioned above.
>>> There is a machine file called "hostfile" on Korebot. The file
>>> contains the names of the two machines:
>>>
>>> korebot
>>> shrimp
>>>
>>> Then from Korebot, I ran cpi using the following command.
>>>
>>> mpiexec -machinefile ./hostfile -verbose -n 2 cpi
>>>
>>>
>>> But the computed value of pi is a huge number. I think it is related
>>> to "double type variables" being transferred between processes running
>>> on an ARM-based Linux machine and a general Linux machine.
>>>
>>> The second attached text file is the output from running cpi in
>>> mpiexec's verbose mode.
>>>
>>>
>>>>
>>>> I am cross-compiling mpich2-1.0.8 with smpd for Khepera III mobile
>>>> robot.
>>>>
>>>> This mobile robot has a Korebot board which is an ARM-based computer
>>>> with a Linux operating system.
>>>>
>>>> The cross-compilation was fine.
>>>>
>>>> Firstly, I logged in to Korebot through ssh.
>>>> Secondly, I started smpd.
>>>> Thirdly, I ran mpiexec to execute an MPI program (cpi) that comes
>>>> with the package.
>>>>
>>>> The result was correct, but when mpiexec was finished, the ssh
>>>> connection to the Korebot was closed.
>>>> I found that mpiexec kills the sshd process through which I was
>>>> remotely connected to Korebot.
>>>>
>>>> I've been looking for the cause, but still have not found any clues.
>>>>
>>>> Could you give me any ideas to solve this problem?
>>>>
>>>> Thank you,
>>>>
>>>> Yu-Cheng
>>>>
>>>
>>
>

