[mpich-discuss] Windows/Linux MPICH2

Jayesh Krishna jayesh at mcs.anl.gov
Fri Mar 19 15:37:27 CDT 2010


The error message shows that MPI process 1 exited abnormally. This could be a bug in MPICH2 or in the application. Do you have a sample test code that we can run here at the lab?

(PS: I would recommend trying to run your application with tools like valgrind or a debugger to find out the source of the problem)
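For example, on the CentOS node something along these lines (./yourapp is just a placeholder for your binary) runs the local MPI processes under valgrind:

    mpiexec -n 2 valgrind ./yourapp

Any invalid reads/writes reported just before the failure usually point at the faulting code.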
Regards,
Jayesh
----- Original Message -----
From: "Matthew Chambers" <matthew.chambers at vanderbilt.edu>
To: mpich-discuss at mcs.anl.gov
Sent: Friday, March 19, 2010 3:30:40 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Windows/Linux MPICH2


Unfortunately, I can't launch my jobs from the Windows machine, AFAIK. To do so, I'd need support for mapping network drives with different user credentials, which the -map option doesn't seem to provide (the network drive is hosted by the CentOS box and authenticates against the university domain, and the university only creates accounts for real people, never shared accounts). 
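For illustration only (Z:, \share, and myapp.exe are made-up names here, and this is just my reading of the -map syntax), the best I could do would be something like:

    mpiexec -map Z:\\jeltz\share -n 2 Z:\myapp.exe

and there's nowhere in that to supply the separate username/password the share requires.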

I don't have time to debug the 1.3a1 issue myself due to other pressing tasks, but I'll describe the error: 

Fatal error in MPI_Recv: Other MPI error, error stack: 
MPI_Recv(187).............................: MPI_Recv(buf=0xfff5c700, count=1, MPI_INT, src=MPI_ANY_SOURCE, tag=255, MPI_COMM_WORLD, status=0x84b704c) failed 
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait() 
MPIDI_CH3I_Progress_handle_sock_event(420): 
MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer) 

job aborted: 
rank: node: exit code[: error message] 
0: jeltz: -2: Fatal error in MPI_Recv: Other MPI error, error stack: 
MPI_Recv(187).............................: MPI_Recv(buf=0xfff5c700, count=1, MPI_INT, src=MPI_ANY_SOURCE, tag=255, MPI_COMM_WORLD, status=0x84b704c) failed 
MPIDI_CH3i_Progress_wait(215).............: an error occurred while handling an event returned by MPIDU_Sock_Wait() 
MPIDI_CH3I_Progress_handle_sock_event(420): 
MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer) 
1: deepthought: -1073741819: process 1 exited without calling finalize 

Jeltz is the CentOS node; deepthought is the Windows node. This happens in the middle of the job, after quite a bit of MPI work has already been done. (Incidentally, the exit code -1073741819 for process 1 is 0xC0000005 when read as an unsigned 32-bit value, i.e. the Windows access-violation status, so the Windows process appears to have crashed on a bad memory access rather than exiting cleanly, which would also explain the "connection reset by peer" seen on the CentOS side.)

Thanks for all your help! 
-Matt 


On 3/19/2010 3:20 PM, jayesh at mcs.anl.gov wrote: 

Great!
You are right, the "-register_spn" option works only for Windows systems. You can try registering the username/password with mpiexec on the Windows machine (mpiexec -register) and then launch your jobs from the Windows machine.
Meanwhile, if you are having problems using MPICH2 1.3a1, we would like to hear about it. Please send us more information regarding the crash.
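For example, on the Windows machine (deepthought):

    mpiexec -register
    mpiexec -validate

The -register command prompts for a username/password and stores it in encrypted form for use by smpd; if your version has it, -validate checks that the stored credentials actually work.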

Regards,
Jayesh
----- Original Message -----
From: "Matthew Chambers" <matthew.chambers at vanderbilt.edu> To: mpich-discuss at mcs.anl.gov Sent: Friday, March 19, 2010 3:00:25 PM GMT -06:00 US/Canada Central
Subject: Re: [mpich-discuss] Windows/Linux MPICH2


Eureka! I had forgotten to recompile my program after recompiling MPICH2. I also discovered that it crashes with 1.3a1 but works fine with 1.2.1p1. I am now able to run my application across all three types of nodes. However, I'm still annoyed at having to enter the password for the Windows node. Is there a way to make that passwordless when launching mpiexec from the CentOS box? The -register_spn stuff would seem to apply only to Windows. 

Thanks! 
-Matt 


On 3/19/2010 2:27 PM, Jayesh Krishna wrote: 

Hi,
 Good to know cpi is working.

# Did you recompile your target application after recompiling MPICH2 (with ch3:sock)?
# Can you try running simple MPI programs (smallest possible; e.g., something like the sketch below) to narrow down the problem?
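For reference, a minimal send/receive test along these lines (just a sketch, not your code) is usually enough to exercise the sock channel:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, val = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* rank 0 waits for one int from any other rank */
            MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, 255,
                     MPI_COMM_WORLD, &status);
            printf("rank 0 received %d from rank %d\n",
                   val, status.MPI_SOURCE);
        } else if (rank == 1) {
            /* rank 1 sends a single int to rank 0 */
            val = 42;
            MPI_Send(&val, 1, MPI_INT, 0, 255, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }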

Regards,
Jayesh 

