[mpich-discuss] mvapich2 on multiple nodes: 2 problems

abc def cannonjunk at hotmail.co.uk
Fri Apr 23 05:16:43 CDT 2010


Thank you for the suggestion about the channel - I checked the logs and the installation was indeed performed with shm. I shall reinstall with nemesis when the two computers become free again later today.

Meanwhile, there is the second of the two problems: mpich is not working at all on the third of the three computers. I downloaded the latest version of mpich and recompiled it using the following configuration:

configure --prefix=/usr/local/mpich --with-device=ch3:nemesis --enable-sharedlibs=gcc

(I don't know if the sharedlibs is necessary - I'm just following something I found on the internet)
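
For anyone wanting to repeat the check of which channel a build was configured with, here is a minimal sketch. It assumes the configure log (normally config.log in the build directory) records the original configure command line, as autoconf-generated scripts usually do. The function name configured_device is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of --with-device found in
# configure-log text read from stdin. It prints nothing if the option
# was never passed, in which case the build used that release's
# default channel.
configured_device() {
  sed -n 's/.*--with-device=\([^ ]*\).*/\1/p' | head -n 1
}

# Example use (the path is an assumption; adjust it to the real build tree):
#   configured_device < config.log
```

An empty result is not an error in itself; it only means the device was not named explicitly on the configure command line.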

I then run mpd and then do: mpiexec -n 1 /bin/hostname

but it just hangs. When I press Ctrl-C, it says "mpiexec_november (mpiexec 440): mpiexec: failed to obtain sock from manager", where november is the name of this third computer. Just running "/bin/hostname" works fine.

Again, despite searching on the internet I've been unable to fix this. It's strange because it had been working a month ago or so.

Any light you can shed on this problem would be very much appreciated. I thought it might be linked to the shm/nemesis problem, but re-installation doesn't seem to have helped.
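
One thing that may be worth ruling out on november is how the machine resolves its own hostname. The sketch below is only a guess at a possible cause, not a confirmed diagnosis. It assumes a Linux system with getent available, and the function names is_loopback and check_hostname are made up for this example.

```shell
#!/bin/sh
# Return success if the given address is a loopback address
# (IPv4 127.x.x.x or IPv6 ::1).
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *)         return 1 ;;
  esac
}

# Report how this machine's own hostname resolves.
# Assumes getent is installed (it is on most Linux systems).
check_hostname() {
  host=$(hostname)
  addr=$(getent hosts "$host" | awk '{print $1; exit}')
  if [ -z "$addr" ]; then
    echo "$host: does not resolve"
  elif is_loopback "$addr"; then
    echo "$host: resolves to loopback address $addr"
  else
    echo "$host: resolves to $addr"
  fi
}

# Example use:
#   check_hostname
```

MPD also ships its own diagnostic script, mpdcheck, and running mpdallexit before restarting mpd clears out any stale daemon left over from an earlier session. Both may be worth trying alongside the check above.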

Thank you.

James

> From: goodell at mcs.anl.gov
> To: mpich-discuss at mcs.anl.gov
> Date: Thu, 22 Apr 2010 10:11:58 -0500
> Subject: Re: [mpich-discuss] mvapich2 on multiple nodes: 2 problems
> 
> On Apr 22, 2010, at 2:59 AM, abc def wrote:
> 
> > DK, Thanks for the explanation about mvapich and mpich - I've  
> > checked and I'm definitely using mpich.
> 
> Then this is the right list for support.
> 
> > Bill, I have now ensured that the directories are all the same on  
> > both computers, and these directories contain the same files, but  
> > they're not linked by nfs - is this necessary? (I'm hoping not,  
> > because setting up nfs is somewhat beyond my skills!)
> 
> No, NFS isn't necessary.  It just makes it easier to avoid  
> accidentally running mismatched copies of the binaries.
> 
> > Just a reminder about this specific problem:
> > mpiexec -n 8 /home/me/software.ex
> >
> > produces the following error:
> > MPIR_Init_thread(310): Initialization failed
> > MPID_Init(113).......: channel initialization failed
> > MPIDI_CH3_Init(244)..: process not on the same host (quad != december)
> > Fatal error in MPI_Init: Other MPI error, error stack:
> >
> > And running:
> > mpirun_rsh -hostfile ./machinefile -n 8 /home/me/software.ex >  
> > job.out 2> job.err
> >
> > produces the same error.
> 
> This error happens because you are using the ch3:shm channel.  This
> channel is deprecated; please don't use it unless you know that you
> specifically need to.  The shm channel only communicates over shared
> memory and does not have any network capability.
> 
> You should probably use the default channel instead.  In the old  
> version of MPICH2 you are using (1.0.8p1 maybe?) that is ch3:sock.  In  
> a more recent stable version that will be ch3:nemesis.
> 
> -Dave