[MPICH] MPICH2 performance tuning and characterising

stephen mulcahy smulcahy at aplpi.com
Tue Mar 20 05:21:06 CDT 2007


Hi Anthony,

Apologies for the delayed response - investigations were interrupted by 
the St. Patrick's Day holidays :)

See my response below,

Anthony Chan wrote:
>> I've built and installed mpich2 1.0.5p3 with those options enabled and
>>
>> find /usr/local -name '*mpe*'
> 
> Your find command does not show if jumpshot(i.e. slog2sdk) has been built.
> Jumpshot requires java from SUN or IBM to be able to run smoothly.
> Just to be sure, can you send us the configure and make outputs as seen
> on the screen so to make sure that you have all pieces built, i.e.
> 
> cd <mpich2-build-dir>
> 
> for csh like shell:
> <mpich2...>/configure .... |& tee c.txt
> make |& tee m.txt
> 
> for bourne like shell, remove "&" but add "2>&1" before "|".

Jumpshot wasn't installed because I don't have a JRE installed on this 
particular system (I can install one if necessary). See the attached c.txt 
and m.txt files.
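
For reference, the Bourne-shell form of the capture described above can be sketched as follows. The `noisy_step` function and the `capture_dir` variable are hypothetical stand-ins (the real command would be the `configure` or `make` invocation, writing to c.txt or m.txt in the build directory); they are there only so the sketch runs anywhere. The point it illustrates is that `2>&1` goes before the `|`, so stderr travels down the pipe together with stdout.

```shell
#!/bin/sh
# Hypothetical stand-in for configure/make: writes one line to stdout
# and one line to stderr.
noisy_step() {
    echo "checking for something... yes"
    echo "warning: something looks odd" >&2
}

# Bourne-shell equivalent of csh's "|& tee": merge stderr into stdout
# first, then pipe the combined stream to tee, which both displays it
# and saves it to the file.
capture_dir=$(mktemp -d)
noisy_step 2>&1 | tee "$capture_dir/c.txt"
```

With the real tools this would presumably read `<mpich2...>/configure .... 2>&1 | tee c.txt` followed by `make 2>&1 | tee m.txt`.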



> There are 2 ways to verify if MPE logging has been built correctly.
> 1) let's try it on some simpler program, e.g. cpilog.c, in
> /usr/local/share/example_logging.  Can you compile cpilog by "..../mpicc
> -mpe=mpilog" and run it and you should see the following message from
> rank 0 in your stdout:
> 
> Writing logfile....
> Enabling the Default clock synchronization...
> Finished writing logfile cpilog.clog2
> 
> both mpicc and mpif90 should be from the same mpich2 install directory.

smulcahy at titan:~$ ./cpilog
Process 0 running on titan
pi is approximately 3.1415926535897643, Error is 0.0000000000000289
wall clock time = 0.063363
Writing logfile....
Enabling the Default clock synchronization...
Finished writing logfile ./cpilog.clog2.

smulcahy at titan:~$ ./fpilog
  Process             0  of             1  is alive
  event IDs are           600          601 ,           602          603 ,
          5000         5001 ,           604          605
  The number of intervals =      1000000
   pi is approximately: 3.1415926535897640  Error is: 0.0000000000000289
   pi is approximately: 3.1415926535897640  Error is: 0.0000000000000289
   pi is approximately: 3.1415926535897640  Error is: 0.0000000000000289
   pi is approximately: 3.1415926535897640  Error is: 0.0000000000000289
   pi is approximately: 3.1415926535897640  Error is: 0.0000000000000289
Writing logfile....
Enabling the Default clock synchronization...
Finished writing logfile Unknown.clog2.

So logging does seem to be compiled in - but for some reason the MPI 
program I'm using does not seem to use it. I have verified that we're 
using the mpirun/mpiexec command from the latest mpich2 install so the 
logging should be enabled in that.
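
One way to double-check that the compiler wrappers and the launcher all come from the same install is sketched below. The helper names `tool_dir` and `same_install_dir` are made up for this illustration. It only compares where each command name resolves on the current PATH, and does not follow symlinks, so treat it as a rough sanity check rather than proof.

```shell
#!/bin/sh
# Print the directory that a command name resolves to on the current PATH.
tool_dir() {
    tool_path=$(command -v "$1") || return 1
    dirname "$tool_path"
}

# Succeed only if every named command resolves to the same directory.
same_install_dir() {
    first_dir=$(tool_dir "$1") || return 1
    shift
    for tool in "$@"; do
        [ "$(tool_dir "$tool")" = "$first_dir" ] || return 1
    done
    return 0
}

# Report whether the MPICH2 wrappers and launcher share one bin directory.
if same_install_dir mpicc mpif90 mpiexec; then
    echo "mpicc, mpif90 and mpiexec all come from $(tool_dir mpicc)"
else
    echo "mpicc, mpif90 and mpiexec do not all resolve to one directory (or are missing)"
fi
```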

> 
> 2) You can run "make installcheck" after "make install".  The output of
> installcheck will show if various pieces of mpe2 are built/installed
> correctly.

titan:/var/root/mpi/mpich2-1.0.5p3# make installcheck
for dir in src/util/thread src/env  src/binding/f77 src/binding/f90 
src/binding/cxx  src/pm/mpd src/mpe2 - ; do \
                 if [ "$dir" = "-" ] ; then break ; fi ; \
                 (cd $dir && make installcheck ; ) ; done
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/util/thread'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/util/thread'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/env'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/env'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f77'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f77'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f90'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f90'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/cxx'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/cxx'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/pm/mpd'
make[1]: *** No rule to make target `installcheck'.  Stop.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/pm/mpd'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
Running installation linktest for C logging program...

*** Link C program with the MPI tracing library 
.......................... Yes.

*** Link C program with the MPI logging library 
.......................... Yes.

*** Link C program with the MPI and manual logging libraries 
............. Yes.

Running installation linktest for Fortran logging program...

*** Link F77 program with the MPI and manual logging libraries 
........... Yes.

Running installation linktest for C collchk program...

*** Link C program with the MPI collective/datatype checking library 
..... No.
     The failed command is :
pgcc     wrong_int_byte.c   -o wrong_int_byte
/tmp/pgcc9YVd36VvLW-U8cgw.o: In function `main':
wrong_int_byte.c:(.text+0x1f): undefined reference to `MPI_Init'
wrong_int_byte.c:(.text+0x2d): undefined reference to `MPI_Comm_rank'
wrong_int_byte.c:(.text+0x3b): undefined reference to `MPI_Comm_size'
wrong_int_byte.c:(.text+0x60): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x87): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x8c): undefined reference to `MPI_Finalize'
make[3]: *** [wrong_int_byte] Error 2

Running installation linktest for Fortran collchk program...

*** Link F77 program with the MPI collective/datatype checking library 
... Yes.

make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
make installcheck-postamble
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3'
make[2]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
Running installation runtest for C logging program...

*** Test C program with the MPI tracing library 
.......................... Yes.

*** Test C program with the MPI logging library 
.......................... No.
    cpi_log.clog2 is not generated.

*** Test C program with the MPI and manual logging libraries 
............. No.
    cpilog.clog2 is not generated.

Running installation runtest for Fortran logging program...

*** Test F77 program with the MPI and manual logging libraries 
........... No.
    fpilog.clog2 is not generated.

Running installation runtest for C collchk program...

*** Test C program with the MPI collective/datatype checking library 
..... No.
     The failed command is :
pgcc     wrong_int_byte.c   -o wrong_int_byte
/tmp/pgcc2pYdI0-6MDF5Hl3f.o: In function `main':
wrong_int_byte.c:(.text+0x1f): undefined reference to `MPI_Init'
wrong_int_byte.c:(.text+0x2d): undefined reference to `MPI_Comm_rank'
wrong_int_byte.c:(.text+0x3b): undefined reference to `MPI_Comm_size'
wrong_int_byte.c:(.text+0x60): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x87): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x8c): undefined reference to `MPI_Finalize'
make[4]: *** [wrong_int_byte] Error 2

Running installation runtest for Fortran collchk program...

*** Test F77 program with the MPI collective/datatype checking library 
... Yes.

make[2]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3'

There are certainly errors here, which seem to be related to installcheck 
not finding the libraries installed in /usr/local/lib. But the fact that 
cpilog and fpilog compile and run OK suggests to me that this may be 
a red herring. What do you think?
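
To make the failures in the installcheck output easier to pick out, a small filter along these lines could be run over a saved copy of the log. The function name `failed_checks` is made up for this sketch. It assumes each test is announced on a line starting with `***` and that the result follows a run of dots, either on the same line or on the next one (as in the wrapped output above).

```shell
#!/bin/sh
# List the installcheck tests that reported "No.".
# Reads a saved log on stdin.
failed_checks() {
    awk '
        /^\*\*\*/ {
            name = $0
            sub(/^\*\*\* */, "", name)               # drop the leading "*** "
            sub(/ *\.\.+ *(Yes|No)\. *$/, "", name)  # drop a same-line result
            sub(/ +$/, "", name)                     # trim trailing spaces
        }
        /\.\.+ *No\. *$/ { print name }
    '
}

# Example on a few lines copied from the output above.
failed_checks <<'EOF'
*** Link C program with the MPI logging library 
.......................... Yes.
*** Link C program with the MPI collective/datatype checking library 
..... No.
*** Test C program with the MPI logging library 
.......................... No.
    cpi_log.clog2 is not generated.
EOF
```

On the full log this should list the C collchk link and run tests plus the three logging run tests. Note that the failed collchk command in the log is a bare `pgcc wrong_int_byte.c -o wrong_int_byte` with no MPI library on the link line, which would be consistent with the undefined `MPI_*` references.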

>> Is there some additional step required or do I also need to add logging
>> code to our app before I can see any log-files?
> 
> There is user-defined MPE logging you can add to your code to supplement
> MPI logging, check mpich.../src/mpe2/README.

I'll hold off on user-defined MPE logging until I get the basics running.

Thanks,

-stephen
-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
    GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: c.txt
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070320/b4711e46/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: m.txt
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070320/b4711e46/attachment-0001.txt>

