[petsc-users] valgrind with petscmpiexec

Satish Balay balay at mcs.anl.gov
Tue Dec 15 10:33:33 CST 2020


For one, I think using '--log-file=valgrind-%q{HOSTNAME}-%p.log' might help [it keeps the logs from each process separate].

And I think the TMPDIR recommendation is to use a different value on each of the nodes [sharing one is where the "pid" clash comes from]. Perhaps "TMPDIR=/tmp" might work, as this would be local disk on each node [vs /var/tmp/, which is probably a TMP shared across nodes].
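
A minimal sketch of the two suggestions above, under stated assumptions: the function names are hypothetical, /tmp is assumed (not verified) to be node-local on this cluster, and plain mpiexec stands in for petscmpiexec. Whether TMPDIR actually reaches the remote ranks depends on the launcher (for example "-x TMPDIR" with Open MPI or "-genv TMPDIR /tmp" with MPICH's Hydra), and %q{HOSTNAME} only expands if HOSTNAME is present in each rank's environment.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of PETSc or valgrind.

# Print the candidate directory if it exists and is writable; fail otherwise.
# The default /tmp is an assumption about what is node-local here.
pick_local_tmpdir() {
    candidate="${1:-/tmp}"
    if [ -d "$candidate" ] && [ -w "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        return 1
    fi
}

# Valgrind expands %q{HOSTNAME} to the HOSTNAME environment variable
# and %p to the process id, so each process writes its own log file.
valgrind_log_opt() {
    printf '%s\n' '--log-file=valgrind-%q{HOSTNAME}-%p.log'
}

# Launch the given program under valgrind with TMPDIR pointing at /tmp.
# First argument: number of MPI processes; the rest: program and its options.
run_with_local_tmp() {
    nprocs="$1"; shift
    TMPDIR="$(pick_local_tmpdir /tmp)" || return 1
    export TMPDIR
    mpiexec -n "$nprocs" valgrind "$(valgrind_log_opt)" "$@"
}
```

With these definitions, something like "run_with_local_tmp 576 ../../../moose-app-oprof -i input.i" would be the equivalent of the failing run; nothing is launched until that function is called.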

But then, does PBS or this MPI require a shared TMP?

Satish

On Tue, 15 Dec 2020, Yaqi Wang wrote:

> Fande,
> 
> Did you try setting TMPDIR for valgrind?
> 
> > On Dec 15, 2020, at 1:23 AM, Barry Smith <bsmith at petsc.dev> wrote:
> > 
> > 
> >   No idea. Perhaps petscmpiexec could be modified so that it only runs valgrind on the first 10 ranks? It is not clear how to do that. Or valgrind should get an MR that removes this small, arbitrary limitation on the number of processes. 576 is so 2000 :-)
> > 
> > 
> >   Barry
> > 
> > 
> >> On Dec 14, 2020, at 11:59 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> >> 
> >> Hi All,
> >> 
> >> I tried to use valgrind to check whether the simulation is valgrind clean, because I saw some random communication failures during the simulation.
> >> 
> >> I tried this command line:
> >> 
> >> petscmpiexec -valgrind -n 576  ../../../moose-app-oprof  -i input.i -log_view -snes_view
> >> 
> >> 
> >> But I got the following error messages:
> >> 
> >> valgrind: Unable to start up properly.  Giving up.
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8c3fabf2
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8cac2243
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_da8d30c0
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_877871f9
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_c098953e
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_aa649f9f
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_097498ec
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_bfc534b5
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_7604c74a
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_a1fd96bb
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> >> valgrind: Startup or configuration error:
> >> valgrind:    Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> >> valgrind: Unable to start up properly.  Giving up.
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_bc5492bb
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> >> valgrind: Startup or configuration error:
> >> valgrind:    Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> >> valgrind: Unable to start up properly.  Giving up.
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_b036bdf2
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_105acc43
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_9fb792c0
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_30602bf9
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_21eec73e
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_0b53e99f
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_73e31aec
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_486e8eb5
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_db8c194a
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_839780bb
> >> 
> >> 
> >> I did a bit of searching online and found something related: https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes
> >> 
> >> But I do not know the right way to fix the issue.
> >> 
> >> Thanks so much,
> >> 
> >> Fande,
> >> 
> > 
> 
