[petsc-users] How to understand these error messages
Barry Smith
bsmith at mcs.anl.gov
Tue Jun 25 15:15:18 CDT 2013
On Jun 25, 2013, at 5:09 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:
> Hi Barry,
>
> If I use Intel MPI, my code runs correctly and produces correct results. Yes, you are right: the IBM MPI has some bugs.
>
Thanks for letting us know.
Barry
> Thank you for your help.
>
> Regards,
>
> On Tue, Jun 25, 2013 at 11:08 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Tue, 25 Jun 2013, Fande Kong wrote:
>
> > Hi Barry,
> >
> > How can we use valgrind to debug a parallel program on a supercomputer with
> > many cores? If we follow the instruction "mpiexec -n NPROC valgrind
> > --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p
> > PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS", then for 10000 cores, 10000 files
> > will be written. Maybe we need to put all the information into a single file.
> > How can we do this?
>
> With this many cores, the PIDs across nodes won't be unique, so the
> 10000 processes might map onto, say, 1000 files. I suggest [assuming
> $HOSTNAME is set on each host]
>
> --log-file=valgrind.log.%q{HOSTNAME}.%p
>
> You don't want to be mixing output from all the cores - then it would
> be unreadable.
>
> But if your filesystem cannot handle this many files - you could try
> consolidating output per node as:
>
> --log-file=valgrind.log.%q{HOSTNAME}
>
> [or perhaps create a subdir per node or something - and stash files in
> these dirs]
>
> for each hostname: mkdir -p ${HOSTNAME}
>
> --log-file=%q{HOSTNAME}/valgrind.log.%p
>
>
> Satish
>
>
>
>
> --
> Fande Kong
> ShenZhen Institutes of Advanced Technology
> Chinese Academy of Sciences
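
The per-node subdirectory layout Satish suggests above could be sketched as a small wrapper script that every MPI rank runs in place of the program. This is only an illustration, not something from the thread: the script name vg_wrap.sh and the function name vg_log_arg are made up here, and the sketch assumes all ranks share a working directory. It expands the host name in the shell rather than through %q{HOSTNAME}, so it also works where HOSTNAME is not exported, falling back to the hostname command.

```shell
#!/bin/sh
# Hypothetical wrapper, saved as e.g. vg_wrap.sh (the name is an assumption).

# Create a directory named after this node and print the matching
# valgrind --log-file argument. %p is left for valgrind to expand to the PID.
vg_log_arg() {
    host="${HOSTNAME:-$(hostname)}"
    mkdir -p "$host" || return 1
    printf '%s\n' "--log-file=${host}/valgrind.log.%p"
}

# When given a command line, run it under valgrind with per-node logging.
if [ "$#" -gt 0 ]; then
    log_arg=$(vg_log_arg) || exit 1
    exec valgrind --tool=memcheck -q --num-callers=20 "$log_arg" "$@"
fi
```

It would then be launched as "mpiexec -n NPROC ./vg_wrap.sh PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS", leaving one directory per node with one log file per process inside it.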