[mpich-discuss] Cannot build mpich2-1.0.8p1 (nemesis) with PGI 8.0-4 on Linux x86_64
Dave Goodell
goodell at mcs.anl.gov
Wed Apr 1 16:21:06 CDT 2009
On Apr 1, 2009, at 3:59 PM, Gus Correa wrote:
> Dave Goodell wrote:
>> So I have some bad news, Gus. I managed to locate an installation
>> of PGI-7.1-6 and got MPICH2-1.0.8p1 to compile and link using it
>> with a few tweaks. Unfortunately the binary it generates always
>> segfaults deep down in MPI_Finalize because of some issue related
>> to the inline assembly register constraints. It is possible that
>> this issue could be worked around in our code, or maybe it is
>> resolved by the 8.X version, but I don't have immediate access to it.
>
> Again, I wonder if the PGI developers wouldn't be interested in
> this problem, and would perhaps offer some hints or fixes.
> (Maybe fixes need to be applied to their compiler too.)
> After all, a number of their customers are MPICH2 users also,
> and somehow this type of problem isn't happening with other compilers,
> which is not good news for them.

We should have an 8.X PGI compiler here soon-ish. At this point the
problem is developer bandwidth on our side. I doubt there are any
problems specific to PGI, just general portability stuff that takes
time to fix.

>> I'm afraid that I just have to say that 1.0.8* does not work with
>> ch3:nemesis and PGI. We'll make sure to test it and try to get it
>> working for the upcoming 1.1.0 release instead. As an alternative,
>> ch3:sock can be used in 1.0.8.
>
> We have dual-socket quad-core Opteron processor nodes (8 cores/node).
> I am afraid ch3:sock may not be the best choice for this type of
> "fat" node, where shared memory shortcuts (memcpy ?)
> may work better than sockets.
> Is this a wrong perception?

It's true that ch3:nemesis, should you be able to make it work,
provides much better performance for intranode communication than
ch3:sock does. However, there's simply a limit to the amount of work
we can afford to do on older versions of nemesis. In the 1.0.x series
nemesis is still an experimental channel to a certain degree and looks
fairly different from the nemesis in the 1.1.x branch, where it is the
default. Also, in both cases (ch3:sock/ch3:nemesis) you are using TCP
for the internode communication, so there is only a small chance of a
noticeable speedup from using nemesis in most applications.
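
For reference, selecting the ch3:sock fallback comes down to the
--with-device configure option. The following is only a minimal sketch:
it assumes the PGI drivers are on your PATH under their usual names
(pgcc, pgCC, pgf77, pgf90), and the install prefix is a made-up example,
so adjust both for your site.

```shell
#!/bin/sh
# Sketch: build the configure line for MPICH2 1.0.8p1 with the ch3:sock channel.
# PREFIX is a hypothetical install location; override it in the environment.
PREFIX="${PREFIX:-$HOME/mpich2-1.0.8p1-pgi-sock}"

# Channel to build. ch3:sock is the fallback discussed in this thread.
DEVICE="ch3:sock"

# Compiler driver names are assumptions about a typical PGI installation.
CONFIGURE_CMD="./configure --prefix=$PREFIX --with-device=$DEVICE CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90"

# Only print the command here; nothing is configured or built by this script.
echo "$CONFIGURE_CMD"
```

Run the printed command from the top of the MPICH2 source tree, then
"make" and "make install" as usual.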
-Dave