[mpich-discuss] Cannot build mpich2-1.0.8p1 (nemesis) with PGI 8.0-4 on Linux x86_64

Dave Goodell goodell at mcs.anl.gov
Wed Apr 1 16:21:06 CDT 2009


On Apr 1, 2009, at 3:59 PM, Gus Correa wrote:

> Dave Goodell wrote:
>> So I have some bad news, Gus.  I managed to locate an installation  
>> of PGI-7.1-6 and got MPICH2-1.0.8p1 to compile and link using it  
>> with a few tweaks.  Unfortunately the binary it generates always  
>> segfaults deep down in MPI_Finalize because of some issue related  
>> to the inline assembly register constraints.  It is possible that  
>> this issue could be worked around in our code, or maybe it is  
>> resolved by the 8.X version, but I don't have immediate access to it.
>
> Again, I wonder if the PGI developers wouldn't be interested in
> this problem, and would perhaps offer some hints or fixes.
> (Maybe fixes need to be applied to their compiler too.)
> After all, a number of their customers are MPICH2 users also,
> and somehow this type of problem isn't happening with other compilers,
> which is not good news for them.

We should have an 8.X PGI compiler here soon-ish.  At this point the  
problem is one of developer bandwidth on our side.  I doubt any of  
these problems are specific to PGI; it's just general portability  
work that takes time to fix.

>> I'm afraid that I just have to say that 1.0.8* does not work with  
>> ch3:nemesis and PGI.  We'll make sure to test it and try to get it  
>> working for the upcoming 1.1.0 release instead.  As an alternative,  
>> ch3:sock can be used in 1.0.8.
>
> We have dual-socket quad-core Opteron processor nodes (8 cores/node).
> I am afraid ch3:sock may not be the best choice for this type of
> "fat" node, where shared memory shortcuts (memcpy ?)
> may work better than sockets.
> Is this a wrong perception?

It's true that ch3:nemesis, should you be able to make it work,  
provides much better performance for intranode communication than  
ch3:sock does.  However, there's simply a limit to the amount of work  
we can afford to do on older versions of nemesis.  In the 1.0.x series  
nemesis is still an experimental channel to a certain degree and looks  
fairly different from the nemesis in the 1.1.x branch, where it is the  
default.  Also, in both cases (ch3:sock/ch3:nemesis) you are using TCP  
for the internode communication, so there is only a small chance of a  
noticeable speedup from using nemesis in most applications.
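
For reference, a ch3:sock build of 1.0.8p1 with PGI could be configured
roughly as sketched below.  This is only a sketch under assumptions: the
install prefix is a made-up placeholder, and the compiler driver names
(pgcc, pgCC, pgf77, pgf90) are assumed to match what your PGI
installation puts on the PATH.  The --with-device option is the
configure switch that selects the channel.

```shell
# Hypothetical install prefix (avoid spaces in the path); adjust for your site.
PREFIX="$HOME/mpich2-1.0.8p1-pgi-sock"

# Select the sock channel instead of nemesis.
DEVICE="ch3:sock"

# Assumed PGI driver names; check which ones your PGI release provides.
CONFIGURE_ARGS="--prefix=$PREFIX --with-device=$DEVICE CC=pgcc CXX=pgCC F77=pgf77 F90=pgf90"

# Show the command that will be used.
echo "./configure $CONFIGURE_ARGS"

# Only build when run from the top of the unpacked mpich2-1.0.8p1 source tree.
if [ -x ./configure ]; then
    ./configure $CONFIGURE_ARGS && make && make install
fi
```

Building into a separate prefix like this keeps the sock build apart from
any later nemesis build, so the two can be compared on the same nodes.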

-Dave


