[mpich-discuss] Windows/Linux MPICH2

Matthew Chambers matthew.chambers at vanderbilt.edu
Fri Mar 19 15:30:40 CDT 2010


Unfortunately, as far as I know, I can't launch my jobs from the Windows 
machine. To do so, I'd need to map a network drive using different user 
credentials, which the -map option doesn't seem to support. The network 
drive is hosted by the CentOS box, which authenticates against the 
university domain, and the university only creates accounts for real 
people, never shared accounts.

I don't have time to debug the 1.3a1 issue myself due to other pressing 
tasks, but I'll describe the error:

Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(187).............................: MPI_Recv(buf=0xfff5c700, count=1, MPI_INT, src=MPI_ANY_SOURCE, tag=255, MPI_COMM_WORLD, status=0x84b704c) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)

job aborted:
rank: node: exit code[: error message]
0: jeltz: -2: Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(187).............................: MPI_Recv(buf=0xfff5c700, count=1, MPI_INT, src=MPI_ANY_SOURCE, tag=255, MPI_COMM_WORLD, status=0x84b704c) failed
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(420):
MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)
1: deepthought: -1073741819: process 1 exited without calling finalize

jeltz is the CentOS node and deepthought is the Windows node. The crash 
happens in the middle of the job, after quite a bit of MPI work has 
already been done.
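
For anyone reading the exit code reported for deepthought: a negative 
number like this is usually a Windows NTSTATUS value printed as a signed 
32-bit integer. The sketch below reinterprets it as unsigned and looks it 
up in a small table. The helper name and the table are illustrative only 
and not part of MPICH2, and it assumes mpiexec is reporting the raw 
Windows process exit status.

```python
# Hypothetical helper (not part of MPICH2): reinterpret a signed 32-bit
# process exit code as the unsigned NTSTATUS value Windows uses.
# The table holds only two well-known codes; it is not exhaustive.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(exit_code: int) -> str:
    """Return the exit code as 32-bit hex plus a name, if one is known."""
    unsigned = exit_code & 0xFFFFFFFF  # two's-complement reinterpretation
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    # Exit code reported for rank 1 on deepthought in the log above.
    print(describe_exit_code(-1073741819))
```

If that reading is right, the Windows process died with an access 
violation, and the "Connection reset by peer" on jeltz would be a 
consequence of that crash rather than a separate network problem.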

Thanks for all your help!
-Matt


On 3/19/2010 3:20 PM, jayesh at mcs.anl.gov wrote:
>   Great!
>   You are right, the "-register_spn" option works only on Windows systems. You can try registering the username/password with mpiexec on the Windows machine (mpiexec -register) and launching your jobs from the Windows machine.
>   Meanwhile, if you are having problems using MPICH2 1.3a1, we would like to hear about it. Please send us more information regarding the crash.
>
> Regards,
> Jayesh
> ----- Original Message -----
> From: "Matthew Chambers"<matthew.chambers at vanderbilt.edu>
> To: mpich-discuss at mcs.anl.gov
> Sent: Friday, March 19, 2010 3:00:25 PM GMT -06:00 US/Canada Central
> Subject: Re: [mpich-discuss] Windows/Linux MPICH2
>
>
> Eureka! I had forgotten to recompile my program after recompiling MPICH2. I also discovered that my program crashes with 1.3a1 but works fine with 1.2.1p1. I am now able to run my application between all three types of nodes. However, I'm still annoyed by having to enter the password for the Windows node. Is there a way to make that passwordless when launching mpiexec from the CentOS box? The register_spn option seems to apply only to Windows.
>
> Thanks!
> -Matt
>
>
> On 3/19/2010 2:27 PM, Jayesh Krishna wrote:
>
> Hi,
>   Good to know cpi is working.
>
> # Did you recompile your target application after recompiling MPICH2 (with ch3:sock)?
> # Can you try running simple MPI programs (the smallest possible) to narrow down the problem?
>
> Regards,
> Jayesh