[MPICH] MPICH2 and TotalView - problems in build and how to make it work...

Peter Thompson peter.thompson at etnus.com
Thu Dec 21 10:36:58 CST 2006


We've had a number of calls into the Etnus support group
about getting TotalView working with MPICH2.  I've got a few queries
of my own, but also a general explanation and an approach to a
resolution.  We also have a few patches to the build which I won't post
here, as I'm sure there's a more appropriate place, right?

In general, to get MPICH2 and TotalView cooperating, you need to run
configure with the --enable-debuginfo and --enable-totalview options.
The first allows message queue support and the latter allows general
debug support.  There is also a dependency on certain Python versions
and modules that I don't quite follow.  What I do know is that you need
at least Python 2.2, and sometimes even that is not enough.  The check
also looks for cPickle support.  If it doesn't find that, it keeps
looking, and that is why the Python check fails at times.  If someone
knows the particular details of the dependency here, I'd be glad to hear
it.  I suspect it is needed by mtv.so, which is the library that
TotalView relies upon for process acquisition.
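
For anyone who wants to see what their interpreter offers before running
configure, here is a rough sketch of the same kind of check.  This is not
the test that MPICH2's configure actually runs; the function names are
made up for illustration, and the minimum version and module list simply
come from the description above.  Note that cPickle only exists in the
Python 2 series, so on a Python 3 interpreter it will always be reported
as missing.

```python
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def check_python(min_version=(2, 2), modules=("cPickle", "distutils.core")):
    """Return (version_ok, missing) for the running interpreter.

    The defaults mirror the requirements described in this post; they are
    assumptions, not values read from the MPICH2 configure script.
    """
    version_ok = tuple(sys.version_info[:2]) >= tuple(min_version)
    return version_ok, missing_modules(modules)


if __name__ == "__main__":
    ok, missing = check_python()
    print("version ok: %s" % ok)
    print("missing modules: %s" % ", ".join(missing))
```

Run it with the same python that configure finds first on your PATH,
since that is the interpreter whose modules matter.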

It's in the building of mtv.so that we run into other problems.  From
mtv_setup.py we get

     from distutils.core import setup, Extension

If distutils.core is missing (at least one build failed to locate it
in the configure stage), we stop here.  If it is there, we pick up a
definition of OPT from one of the Python Makefiles.  On my system, with
Python 2.3, I see the following:

/usr/lib/python2.3/config/Makefile

where OPT is defined

# Compiler options
OPT=        -DNDEBUG -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 -D_GNU_SOURCE -fPIC
BASECFLAGS=     -fno-strict-aliasing
CFLAGS=        $(BASECFLAGS) $(OPT)
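
If you would rather not hunt for that Makefile, the interpreter can
report the values it recorded at build time.  The sketch below uses the
standard sysconfig module, which only exists in newer Pythons (2.7 and
3.2 onward); on something as old as 2.3 the equivalent call lives in
distutils.sysconfig.  The helper name is mine, and a variable may come
back as None on platforms that do not record it.

```python
import sysconfig


def python_build_flags(names=("OPT", "BASECFLAGS", "CFLAGS")):
    """Map each Makefile variable name to the value Python was built with."""
    return dict((name, sysconfig.get_config_var(name)) for name in names)


if __name__ == "__main__":
    for name, value in sorted(python_build_flags().items()):
        print("%s = %s" % (name, value))
```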

These are exactly the options that are passed on to the build of mtv.so.
And this is WRONG!  Depending on the underlying compiler, this might
work, and then again it might not.  More often it doesn't work because
some things have been optimized away with -O2.  So, what I've been
suggesting to people running into this is to save a copy of the make
output, find the command that builds mtv.so, and then, before doing
make install, rebuild mtv.so with basically the same options, but
making sure they contain -g and do not contain -O2.
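
To make the flag editing concrete, here is a small sketch of the
transformation I mean.  It is only an illustration of the manual
workaround, not part of the MPICH2 build; the function name is
hypothetical, and it assumes a gcc-style compiler where the optimization
flags all start with -O.

```python
import shlex


def debug_friendly_flags(opt):
    """Drop -O* optimization flags from `opt` and make sure -g is present."""
    flags = shlex.split(opt)
    kept = [flag for flag in flags if not flag.startswith("-O")]
    if "-g" not in kept:
        kept.append("-g")
    return " ".join(kept)


if __name__ == "__main__":
    # The OPT value from the Python 2.3 Makefile quoted above.
    opt = ("-DNDEBUG -O2 -g -pipe -m32 -march=i386 -mtune=pentium4 "
           "-D_GNU_SOURCE -fPIC")
    print(debug_friendly_flags(opt))
```

The result is what you would paste into the compile command for mtv.so
that you copied out of the make output.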

And that seems to work.  TotalView picks up the processes started with
MPI_Init (we're working on MPI_Comm_spawn support, but it's not there
just yet), and we even see message queues, or some of them.  What I don't
know, and would appreciate help on, is how to fix the build so we don't
pick up the -O2 in the first place.  I've tried pre-defining OPT, but
the above import wipes out my attempt to define it myself.  I don't
understand python well enough to understand the dependency or why the
distutils.core import is needed to build mtv.so.  So if someone has any
ideas, suggestions, etc., we're certainly willing to listen.  I've tested
with both 1.0.4p1 and 1.0.5, and both versions seem to run into this
problem.

The patches that we do have available (where to send?) are mainly
involved with the make process itself, rather than any MPICH2 code.
They just ensure that mtv.so builds, and builds with debug information,
but they don't get around this issue with -O2.

Regards,
PeterT



