[petsc-users] How to understand these error messages

Tue Jun 25 15:15:18 CDT 2013

On Jun 25, 2013, at 5:09 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:

> Hi Barry,
> 
> With Intel MPI, my code runs correctly and produces correct results. You were right: the IBM MPI has some bugs.
> 
    Thanks for letting us know. 

   Barry

> Thank you for your help.
> 
> Regards,
> 
> On Tue, Jun 25, 2013 at 11:08 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Tue, 25 Jun 2013, Fande Kong wrote:
> 
> > Hi Barry,
> >
> > How can I use valgrind to debug a parallel program on a supercomputer with
> > many cores? If we follow the instruction "mpiexec -n NPROC valgrind
> > --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p
> > PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS" on 10000 cores, 10000 files
> > will be written. Maybe we need to put all the information into a single
> > file. How can this be done?
> 
> With this many cores, the PIDs won't be unique across nodes, so the log
> files might collapse to, say, 1000 distinct names. I suggest [assuming
> $HOSTNAME is set on each host]:
> 
> --log-file=valgrind.log.%q{HOSTNAME}.%p
> 
> You don't want to mix the output from all the cores; it would be
> unreadable.
> 
> But if your filesystem cannot handle this many files, you could try
> consolidating the output per node:
> 
> --log-file=valgrind.log.%q{HOSTNAME}
> 
> [or perhaps create a subdirectory per node and stash the files in
> these directories]
> 
> for each hostname: mkdir -p ${HOSTNAME}
> 
> --log-file=%q{HOSTNAME}/valgrind.log.%p
> 
> 
> Satish
> 
> 
> 
> 
> -- 
> Fande Kong
> ShenZhen Institutes of Advanced Technology
> Chinese Academy of Sciences
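
The per-node directory layout suggested in the thread can be sketched as a small shell helper. This is only an illustration, not something posted by the participants: the function names vg_logdir and vg_run and the default directory valgrind-logs are hypothetical, and falling back to `uname -n` when $HOSTNAME is unset is an added assumption. The valgrind options are the ones quoted above.

```shell
# Hypothetical helpers; names and the default base directory are illustrative.

# Create the per-node log directory and print its path.
# $1 (optional) is the base directory; defaults to ./valgrind-logs.
vg_logdir() {
    vg_host=${HOSTNAME:-$(uname -n)}   # assumption: fall back to uname -n
    vg_dir="${1:-valgrind-logs}/$vg_host"
    mkdir -p "$vg_dir" && printf '%s\n' "$vg_dir"
}

# Run a program under valgrind, writing one log per process (%p = PID)
# into this node's directory, so PIDs only need to be unique per node.
vg_run() {
    vg_out=$(vg_logdir) || return 1
    valgrind --tool=memcheck -q --num-callers=20 \
        --log-file="$vg_out/valgrind.log.%p" "$@"
}
```

Each MPI rank would then call `vg_run PETSCPROGRAMNAME -malloc off PROGRAMOPTIONS` in place of the direct valgrind invocation. How the helper is made available to every rank depends on the site's launcher and is left open here.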