<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Verdana
}
--></style>
</head>
<body class='hmmessage'>
Having now reinstalled mpich2 on the first 2 computers, I have tested the program using the 4 cores on each of the two computers together, with:<br><br>mpiexec -n 8 /home/me/software.ex > job.out 2> job.err<br>and<br>mpiexec -machinefile ./machinefile -n 8 /home/me/software.ex > job.out 2> job.err<br><br>but I get the same error again:<br>Fatal error in MPI_Init: Other MPI error, error stack:<br>MPIR_Init_thread(310): Initialization failed<br>MPID_Init(113).......: channel initialization failed<br>MPIDI_CH3_Init(244)..: process not on the same host (quad != december)<br>
<br>This is despite configuring the new build with<br><br>configure --prefix=/usr/local/mpich --with-device=ch3:nemesis --enable-sharedlibs=gcc<br><br>Running just "mpiexec -n 8 /bin/hostname" works fine, though.<br>Thanks.<br><br><hr id="stopSpelling">From: cannonjunk@hotmail.co.uk<br>To: mpich-discuss@mcs.anl.gov<br>Date: Fri, 23 Apr 2010 11:16:43 +0100<br>Subject: Re: [mpich-discuss] mvapich2 on multiple nodes: 2 problems<br><br>
Thank you for the suggestion about the channel - I checked the logs and the installation was indeed built with shm. I shall reinstall with nemesis when the 2 computers become free again later today.<br><br>Meanwhile, there is the second of the two problems: mpich is not working at all on the third of the 3 computers. I downloaded the latest version of mpich and recompiled it using the following configuration:<br><br>configure --prefix=/usr/local/mpich --with-device=ch3:nemesis --enable-sharedlibs=gcc<br><br>(I don't know if the sharedlibs option is necessary - I'm just following something I found on the internet)<br><br>I then start mpd and run: mpiexec -n 1 /bin/hostname<br><br>but it just hangs. When I press Ctrl-C, it says "mpiexec_november (mpiexec 440): mpiexec: failed to obtain sock from manager" where november is the name of this third computer. Just running "/bin/hostname" works fine.<br><br>Again, despite searching on the internet I've been unable to fix this. It's strange because it was working a month or so ago.<br><br>Any light you can shed on this problem would be very much appreciated. I thought it might be linked to the shm/nemesis problem, but re-installation doesn't seem to have helped.<br><br>Thank you.<br><br>James<br><br>> From: goodell@mcs.anl.gov<br>> To: mpich-discuss@mcs.anl.gov<br>> Date: Thu, 22 Apr 2010 10:11:58 -0500<br>> Subject: Re: [mpich-discuss] mvapich2 on multiple nodes: 2 problems<br>> <br>> On Apr 22, 2010, at 2:59 AM, abc def wrote:<br>> <br>> > DK, Thanks for the explanation about mvapich and mpich - I've <br>> > checked and I'm definitely using mpich.<br>> <br>> Then this is the right list for support.<br>> <br>> > Bill, I have now ensured that the directories are all the same on <br>> > both computers, and these directories contain the same files, but <br>> > they're not linked by nfs - is this necessary? (I'm hoping not, <br>> > because setting up nfs is somewhat beyond my skills!)<br>> <br>> No, NFS isn't necessary. 
It just makes it easier to avoid <br>> accidentally running mismatched copies of the binaries.<br>> <br>> > Just a reminder about this specific problem:<br>> > mpiexec -n 8 /home/me/software.ex<br>> ><br>> > produces the following error:<br>> > MPIR_Init_thread(310): Initialization failed<br>> > MPID_Init(113).......: channel initialization failed<br>> > MPIDI_CH3_Init(244)..: process not on the same host (quad != <br>> > december)Fatal error in MPI_Init: Other MPI error, error stack:<br>> ><br>> > And running:<br>> > mpirun_rsh -hostfile ./machinefile -n 8 /home/me/software.ex > <br>> > job.out 2> job.err<br>> ><br>> > produces the same error.<br>> <br>> This error happens because you are using the ch3:shm channel. This <br>> channel is deprecated, please don't use it unless you know that you <br>> specifically need to. The shm channel only communicates over shared <br>> memory and does not have any network capability.<br>> <br>> You should probably use the default channel instead. In the old <br>> version of MPICH2 you are using (1.0.8p1 maybe?) that is ch3:sock. In <br>> a more recent stable version that will be ch3:nemesis.<br>> <br>> -Dave<br>> <br>> _______________________________________________<br>> mpich-discuss mailing list<br>> mpich-discuss@mcs.anl.gov<br>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss<br></body>
</html>