[MPICH] MPICH2 performance tuning and characterising
stephen mulcahy
smulcahy at aplpi.com
Tue Mar 20 05:21:06 CDT 2007
Hi Anthony,
Apologies for the delayed response - investigations were interrupted by
the St. Patrick's Day holidays :)
See my responses below.
Anthony Chan wrote:
>> I've built and installed mpich2 1.0.5p3 with those options enabled and
>>
>> find /usr/local -name *mpe*
>
> Your find command does not show if jumpshot(i.e. slog2sdk) has been built.
> Jumpshot requires java from SUN or IBM to be able to run smoothly.
> Just to be sure, can you send us the configure and make outputs as seen
> on the screen so to make sure that you have all pieces built, i.e.
>
> cd <mpich2-build-dir>
>
> for csh like shell:
> <mpich2...>/configure .... |& tee c.txt
> make |& tee m.txt
>
> for bourne like shell, remove "&" but add "2>&1" before "|".
Jumpshot wasn't installed because I don't have a JRE installed on this
particular system (I can install one if necessary). See the attached c.txt
and m.txt files.
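For a Bourne-like shell, the capture described above might look like the
following sketch. The fake_build function is a hypothetical stand-in for the
real configure or make invocation (whose options are not shown here); it only
demonstrates that "2>&1" placed before the pipe sends both stdout and stderr
into the file written by tee.

```shell
#!/bin/sh
# Hypothetical stand-in for "<mpich2...>/configure ...." or "make":
# it writes one line to stdout and one line to stderr.
fake_build() {
    echo "checking for C compiler... ok"
    echo "warning: something went to stderr" >&2
}

# "2>&1" before the pipe merges stderr into stdout, so tee sees both
# streams and c.txt ends up with the complete on-screen output.
fake_build 2>&1 | tee c.txt
```

With the real commands, fake_build would be replaced by the configure line
(logged to c.txt) and then by make (logged to m.txt).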
> There are 2 ways to verify if MPE logging has been built correctly.
> 1) let's try it on some simpler program, e.g. cpilog.c, in
> /usr/local/share/example_logging. Can you compile cpilog by "..../mpicc
> -mpe=mpilog" and run it and you should see the following message from
> rank 0 in your stdout:
>
> Writing logfile....
> Enabling the Default clock synchronization...
> Finished writing logfile cpilog.clog2
>
> both mpicc and mpif90 should be from the same mpich2 install directory.
smulcahy@titan:~$ ./cpilog
Process 0 running on titan
pi is approximately 3.1415926535897643, Error is 0.0000000000000289
wall clock time = 0.063363
Writing logfile....
Enabling the Default clock synchronization...
Finished writing logfile ./cpilog.clog2.
smulcahy@titan:~$ ./fpilog
Process 0 of 1 is alive
event IDs are 600 601 , 602 603 ,
5000 5001 , 604 605
The number of intervals = 1000000
pi is approximately: 3.1415926535897640 Error is: 0.0000000000000289
pi is approximately: 3.1415926535897640 Error is: 0.0000000000000289
pi is approximately: 3.1415926535897640 Error is: 0.0000000000000289
pi is approximately: 3.1415926535897640 Error is: 0.0000000000000289
pi is approximately: 3.1415926535897640 Error is: 0.0000000000000289
Writing logfile....
Enabling the Default clock synchronization...
Finished writing logfile Unknown.clog2.
So logging does seem to be compiled in - but for some reason the MPI
program I'm using does not seem to use it. I have verified that we're
using the mpirun/mpiexec command from the latest mpich2 install so the
logging should be enabled in that.
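One way to double-check the "same install directory" condition is a small
helper like the sketch below. The function name same_install_dir is made up
for illustration; it only compares where each tool is found on PATH and says
nothing about how the application itself was compiled or linked.

```shell
#!/bin/sh
# same_install_dir TOOL...: succeed only if every named tool is found on
# PATH and all of them live in the same directory.
same_install_dir() {
    first_dir=""
    for tool in "$@"; do
        tool_path=$(command -v "$tool") || {
            echo "$tool: not found on PATH" >&2
            return 1
        }
        tool_dir=$(dirname "$tool_path")
        if [ -z "$first_dir" ]; then
            first_dir=$tool_dir
        elif [ "$tool_dir" != "$first_dir" ]; then
            echo "$tool is in $tool_dir, expected $first_dir" >&2
            return 1
        fi
    done
    echo "all tools found in $first_dir"
}
```

On the system in question this could be called as
"same_install_dir mpicc mpif90 mpiexec".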
>
> 2) You can run "make installcheck" after "make install". The output of
> installcheck will show if various pieces of mpe2 are built/installed
> correctly.
titan:/var/root/mpi/mpich2-1.0.5p3# make installcheck
for dir in src/util/thread src/env src/binding/f77 src/binding/f90 src/binding/cxx src/pm/mpd src/mpe2 - ; do \
if [ "$dir" = "-" ] ; then break ; fi ; \
(cd $dir && make installcheck ; ) ; done
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/util/thread'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/util/thread'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/env'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/env'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f77'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f77'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f90'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/f90'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/cxx'
make[1]: Nothing to be done for `installcheck'.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/binding/cxx'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/pm/mpd'
make[1]: *** No rule to make target `installcheck'. Stop.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/pm/mpd'
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
Running installation linktest for C logging program...
*** Link C program with the MPI tracing library
.......................... Yes.
*** Link C program with the MPI logging library
.......................... Yes.
*** Link C program with the MPI and manual logging libraries
............. Yes.
Running installation linktest for Fortran logging program...
*** Link F77 program with the MPI and manual logging libraries
........... Yes.
Running installation linktest for C collchk program...
*** Link C program with the MPI collective/datatype checking library
..... No.
The failed command is :
pgcc wrong_int_byte.c -o wrong_int_byte
/tmp/pgcc9YVd36VvLW-U8cgw.o: In function `main':
wrong_int_byte.c:(.text+0x1f): undefined reference to `MPI_Init'
wrong_int_byte.c:(.text+0x2d): undefined reference to `MPI_Comm_rank'
wrong_int_byte.c:(.text+0x3b): undefined reference to `MPI_Comm_size'
wrong_int_byte.c:(.text+0x60): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x87): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x8c): undefined reference to `MPI_Finalize'
make[3]: *** [wrong_int_byte] Error 2
Running installation linktest for Fortran collchk program...
*** Link F77 program with the MPI collective/datatype checking library
... Yes.
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
make installcheck-postamble
make[1]: Entering directory `/var/root/mpi/mpich2-1.0.5p3'
make[2]: Entering directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
Running installation runtest for C logging program...
*** Test C program with the MPI tracing library
.......................... Yes.
*** Test C program with the MPI logging library
.......................... No.
cpi_log.clog2 is not generated.
*** Test C program with the MPI and manual logging libraries
............. No.
cpilog.clog2 is not generated.
Running installation runtest for Fortran logging program...
*** Test F77 program with the MPI and manual logging libraries
........... No.
fpilog.clog2 is not generated.
Running installation runtest for C collchk program...
*** Test C program with the MPI collective/datatype checking library
..... No.
The failed command is :
pgcc wrong_int_byte.c -o wrong_int_byte
/tmp/pgcc2pYdI0-6MDF5Hl3f.o: In function `main':
wrong_int_byte.c:(.text+0x1f): undefined reference to `MPI_Init'
wrong_int_byte.c:(.text+0x2d): undefined reference to `MPI_Comm_rank'
wrong_int_byte.c:(.text+0x3b): undefined reference to `MPI_Comm_size'
wrong_int_byte.c:(.text+0x60): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x87): undefined reference to `MPI_Bcast'
wrong_int_byte.c:(.text+0x8c): undefined reference to `MPI_Finalize'
make[4]: *** [wrong_int_byte] Error 2
Running installation runtest for Fortran collchk program...
*** Test F77 program with the MPI collective/datatype checking library
... Yes.
make[2]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3/src/mpe2'
make[1]: Leaving directory `/var/root/mpi/mpich2-1.0.5p3'
There are certainly errors here, which seem to be related to installcheck
not finding the libraries installed in /usr/local/lib. But the fact that
cpilog and fpilog compile and run OK suggests to me that this is a red
herring. What do you think?
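To make the pass/fail pattern in output like the above easier to scan, a
filter along these lines could be used. It is only a sketch: the name
summarize_checks is made up, and it assumes the layout as it appears in this
message, i.e. a "*** description" line followed by a line of dots ending in
"Yes." or "No.".

```shell
#!/bin/sh
# summarize_checks: read installcheck output on stdin and print one
# PASS or FAIL line per "*** ..." test description.
summarize_checks() {
    awk '
        # Remember the most recent test description (text after "*** ").
        /^\*\*\* / { desc = substr($0, 5); next }
        # A dotted result line decides the outcome of that test.
        /^\.+ (Yes|No)\.$/ {
            status = ($NF == "Yes.") ? "PASS" : "FAIL"
            print status ": " desc
        }
    '
}

# Two entries copied from the output above.
summarize_checks <<'EOF'
*** Link C program with the MPI logging library
.......................... Yes.
*** Link C program with the MPI collective/datatype checking library
..... No.
EOF
```

A saved log could be fed in the same way, for example
"summarize_checks < installcheck.txt", where installcheck.txt is a
hypothetical file holding the captured output.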
>> Is there some additional step required or do I also need to add logging
>> code to our app before I can see any log-files?
>
> There is user-defined MPE logging you can add to your code to supplement
> MPI logging, check mpich.../src/mpe2/README.
I'll hold off on user-defined MPE logging until I get the basics running.
Thanks,
-stephen
--
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland. http://www.aplpi.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: c.txt
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070320/b4711e46/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: m.txt
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20070320/b4711e46/attachment-0001.txt>