[petsc-users] How to understand these error messages

Satish Balay balay at mcs.anl.gov
Mon Jun 24 22:08:57 CDT 2013


On Tue, 25 Jun 2013, Fande Kong wrote:

> Hi Barry,
> 
> How to use valgrind to debug parallel program on the supercomputer with
> many cores? If we follow the instruction "mpiexec -n NPROC valgrind
> --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p
> PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS",  for 10000 cores, 10000 files
> will be printed. Maybe we need to put all information into a single file.
> How to do this?

For this many cores - the PIDs won't be unique across nodes, so
processes on different nodes that share a PID would write to the same
file name. The 10000 processes might map to, say, 1000 files - so I
suggest [assuming $HOSTNAME is set on each host]

--log-file=valgrind.log.%q{HOSTNAME}.%p

You don't want to mix the output from all the cores in a single file -
it would be unreadable.

But if your filesystem cannot handle this many files - you could try
consolidating output per node as:

--log-file=valgrind.log.%q{HOSTNAME}

[or perhaps create a subdir per node and stash the files in
these dirs]

on each host: mkdir -p ${HOSTNAME}

--log-file=%q{HOSTNAME}/valgrind.log.%p
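
The per-node subdir idea could be sketched as a small wrapper that each
rank runs, so the directory is created on the node before valgrind
starts. This is only a rough sketch under assumptions: the script name
vg_wrap.sh and the function names are made up for illustration, the
hostname is expanded by the shell instead of by %q{HOSTNAME}, and it
falls back to `uname -n` when $HOSTNAME is not set.

```shell
#!/bin/sh
# vg_wrap.sh (hypothetical name): run one MPI rank under valgrind and
# write its log to <hostname>/valgrind.log.<pid>.

# Create the per-node directory in the current working directory and
# print the log-file prefix. Uses $HOSTNAME when set, else `uname -n`.
vg_log_prefix() {
    dir=${HOSTNAME:-$(uname -n)}
    mkdir -p "$dir" || return 1
    printf '%s\n' "$dir/valgrind.log"
}

# Replace this shell with valgrind running the given command.
# valgrind itself expands %p to the process ID.
vg_run() {
    prefix=$(vg_log_prefix) || return 1
    exec valgrind --tool=memcheck -q --num-callers=20 \
        --log-file="$prefix.%p" "$@"
}

# Launch only when a command is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    vg_run "$@"
fi
```

It would be launched as something like

mpiexec -n NPROC ./vg_wrap.sh PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS

assuming the working directory is on a filesystem the compute nodes can
write to.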


Satish

