<div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks so much, Satish,<div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Dec 15, 2020 at 9:33 AM Satish Balay via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">For one - I think using '--log-file=valgrind-%q{HOSTNAME}-%p.log' might help [to keep the logs from each process separate]<br>
<br>
And I think the TMPDIR recommendation is to have a different value for each of the nodes [where the "pid" clash comes from] and perhaps "TMPDIR=/tmp" might work</blockquote><div><br></div><div>"TMPDIR=/tmp" worked out.<br></div><div><br></div><div><br></div><div>Fande</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> - as this would be local disk on each node [vs /var/tmp/ - which is probably a shared TMP across nodes]<br>
<br>
But then - does PBS or this MPI require a shared TMP?<br>
<br>
Satish<br>
<br>
On Tue, 15 Dec 2020, Yaqi Wang wrote:<br>
<br>
> Fande,<br>
> <br>
> Did you try setting TMPDIR for valgrind?<br>
> <br>
> > On Dec 15, 2020, at 1:23 AM, Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br>
> > <br>
> > <br>
> >   No idea. Perhaps petscmpiexec could be modified so it only ran valgrind on the first 10 ranks? Not clear how to do that. Or valgrind should get an MR that removes this small arbitrary limitation on the number of processes. 576 is so 2000 :-)<br>
> > <br>
> > <br>
> >   Barry<br>
> > <br>
> > <br>
> >> On Dec 14, 2020, at 11:59 PM, Fande Kong <<a href="mailto:fdkong.jd@gmail.com" target="_blank">fdkong.jd@gmail.com</a>> wrote:<br>
> >> <br>
> >> Hi All,<br>
> >> <br>
> >> I tried to use valgrind to check whether the simulation is valgrind clean, because I saw some random communication failures during the simulation.<br>
> >> <br>
> >> I tried this command line:<br>
> >> <br>
> >> petscmpiexec -valgrind -n 576  ../../../moose-app-oprof  -i input.i -log_view -snes_view<br>
> >> <br>
> >> <br>
> >> But I got the following error messages:<br>
> >> <br>
> >> valgrind: Unable to start up properly.  Giving up.<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8c3fabf2<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8cac2243<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_da8d30c0<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_877871f9<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_c098953e<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_aa649f9f<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_097498ec<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_bfc534b5<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_7604c74a<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_a1fd96bb<br>
> >> ==75586== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8<br>
> >> valgrind: Startup or configuration error:<br>
> >> valgrind:    Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8<br>
> >> valgrind: Unable to start up properly.  Giving up.<br>
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_bc5492bb<br>
> >> ==75596== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8<br>
> >> valgrind: Startup or configuration error:<br>
> >> valgrind:    Can't create client cmdline file in /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8<br>
> >> valgrind: Unable to start up properly.  Giving up.<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_b036bdf2<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_105acc43<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_9fb792c0<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_30602bf9<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_21eec73e<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_0b53e99f<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_73e31aec<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_486e8eb5<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_db8c194a<br>
> >> ==75597== VG_(mkstemp): failed to create temp file: /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_839780bb<br>
> >> <br>
> >> <br>
> >> I searched online a bit and found something related: <a href="https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes" rel="noreferrer" target="_blank">https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes</a><br>
> >> <br>
> >> But I do not know the right way to fix the issue.<br>
> >> <br>
> >> Thanks so much,<br>
> >> <br>
> >> Fande,<br>
> >> <br>
> > <br>
> <br>
<br>
</blockquote></div></div></div>
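<div>For reference, the two suggestions in this thread (a node-local TMPDIR and a per-host, per-process valgrind log file) could be combined roughly as in the sketch below. This is only a sketch under assumptions: it calls a plain <code>mpiexec</code> directly rather than <code>petscmpiexec -valgrind</code>, the variable names NP, APP, VG_LOG, CMD and RUN are hypothetical, and it assumes HOSTNAME is present in each rank's environment so that valgrind can expand <code>%q{HOSTNAME}</code>. By default it only prints the command instead of running it.</div>

```shell
#!/bin/sh
# Sketch only: variable names are hypothetical; plain `mpiexec` is an
# assumption (the thread itself used `petscmpiexec -valgrind`).

# Point valgrind's temporary files at node-local disk, as suggested in the thread.
export TMPDIR=/tmp

# Rank count and application arguments are copied from the original command line.
NP=576
APP="../../../moose-app-oprof -i input.i"

# %q{HOSTNAME} expands to the HOSTNAME environment variable and %p to the pid,
# so each process writes its own log. Single quotes keep the pattern literal.
VG_LOG='valgrind-%q{HOSTNAME}-%p.log'

CMD="mpiexec -n $NP valgrind -q --tool=memcheck --log-file=$VG_LOG $APP"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  eval "$CMD"
else
  printf '%s\n' "$CMD"
fi
```

<div>Printing the command first makes it easy to check the quoting of the log-file pattern before launching 576 ranks. Whether the MPI launcher forwards TMPDIR to the remote ranks depends on the launcher and the batch system, so that part may need a launcher-specific option.</div>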