[mpich-discuss] mvapich2 on multiple nodes: 2 problems
abc def
cannonjunk at hotmail.co.uk
Fri Apr 23 05:16:43 CDT 2010
Thank you for the suggestion about the channel - I checked the logs and installation was indeed performed with shm. I shall reinstall with nemesis when the 2 computers become free again later today.
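For reference, one way to confirm which channel a build was configured with is to pull the --with-device value out of the recorded configure line. The sketch below assumes that line is available as text, for example from config.log in the build directory. The helper name device_from_configure is hypothetical and not part of MPICH.

```shell
#!/bin/sh
# Hypothetical helper: read a configure command line on stdin and print the
# value of its --with-device option. Prints nothing if the option is absent,
# which would mean the build used that version's default device.
device_from_configure() {
    # Put each option on its own line, then strip optional quotes and the
    # option name, keeping only the first match.
    tr ' ' '\n' | sed -n "s/^['\"]*--with-device=\([^'\"]*\).*/\1/p" | head -n 1
}

# Example using the configure line quoted later in this message.
echo "configure --prefix=/usr/local/mpich --with-device=ch3:nemesis --enable-sharedlibs=gcc" \
    | device_from_configure
```

An empty result only says that no device was requested explicitly; which default that implies depends on the MPICH2 version.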
Meanwhile, there is the second of the two problems: MPICH is not working at all on the third of the three computers. I downloaded the latest version of MPICH and recompiled it with the following configuration:
configure --prefix=/usr/local/mpich --with-device=ch3:nemesis --enable-sharedlibs=gcc
(I don't know whether the sharedlibs option is necessary; I'm just following something I found on the internet.)
I then start mpd and run: mpiexec -n 1 /bin/hostname
but it just hangs. When I press Ctrl-C, it says "mpiexec_november (mpiexec 440): mpiexec: failed to obtain sock from manager", where november is the name of this third computer. Running "/bin/hostname" on its own works fine.
Again, despite searching on the internet, I've been unable to fix this. It's strange because it was working a month or so ago.
Any light you can shed on this problem would be very much appreciated. I thought it might be linked to the shm/nemesis problem, but reinstalling doesn't seem to have helped.
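One thing that may be worth ruling out for a hang like this is hostname resolution, since the mpd process manager and mpiexec need to reach each other using the machine's own hostname. This is only a possible cause, not a diagnosis. The sketch below assumes a Linux system with getent available; the helper name is_loopback is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given address is a loopback
# address, which other machines cannot use to reach this host.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Resolve this machine's own hostname through the system resolver
# (/etc/hosts, DNS, ...). getent is assumed to be installed.
host=$(hostname)
addr=$(getent hosts "$host" | awk '{print $1; exit}')

if [ -z "$addr" ]; then
    echo "$host does not resolve to any address"
elif is_loopback "$addr"; then
    echo "$host resolves to loopback address $addr"
else
    echo "$host resolves to $addr"
fi
```

MPICH2 installations that use mpd also include an mpdcheck utility intended for this kind of configuration check, which may give more specific hints.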
Thank you.
James
> From: goodell at mcs.anl.gov
> To: mpich-discuss at mcs.anl.gov
> Date: Thu, 22 Apr 2010 10:11:58 -0500
> Subject: Re: [mpich-discuss] mvapich2 on multiple nodes: 2 problems
>
> On Apr 22, 2010, at 2:59 AM, abc def wrote:
>
> > DK, Thanks for the explanation about mvapich and mpich - I've
> > checked and I'm definitely using mpich.
>
> Then this is the right list for support.
>
> > Bill, I have now ensured that the directories are all the same on
> > both computers, and these directories contain the same files, but
> > they're not linked by nfs - is this necessary? (I'm hoping not,
> > because setting up nfs is somewhat beyond my skills!)
>
> No, NFS isn't necessary. It just makes it easier to avoid
> accidentally running mismatched copies of the binaries.
>
> > Just a reminder about this specific problem:
> > mpiexec -n 8 /home/me/software.ex
> >
> > produces the following error:
> > MPIR_Init_thread(310): Initialization failed
> > MPID_Init(113).......: channel initialization failed
> > MPIDI_CH3_Init(244)..: process not on the same host (quad != december)
> > Fatal error in MPI_Init: Other MPI error, error stack:
> >
> > And running:
> > mpirun_rsh -hostfile ./machinefile -n 8 /home/me/software.ex > job.out 2> job.err
> >
> > produces the same error.
>
> This error happens because you are using the ch3:shm channel. This
> channel is deprecated, please don't use it unless you know that you
> specifically need to. The shm channel only communicates over shared
> memory and does not have any network capability.
>
> You should probably use the default channel instead. In the old
> version of MPICH2 you are using (1.0.8p1 maybe?) that is ch3:sock. In
> a more recent stable version that will be ch3:nemesis.
>
> -Dave