[petsc-users] valgrind with petscmpiexec
Fande Kong
fdkong.jd at gmail.com
Tue Dec 15 10:54:36 CST 2020
Thanks so much, Satish,
On Tue, Dec 15, 2020 at 9:33 AM Satish Balay via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> For one - I think using '--log-file=valgrind-%q{HOSTNAME}-%p.log' might
> help [to keep the logs from each process separate]
>
> And I think the TMPDIR recommendation is to have a different value for
> each of the nodes [where the "pid" clash comes from] and perhaps
> "TMPDIR=/tmp" might work
Setting "TMPDIR=/tmp" worked.
Fande
> - as this would be local disk on each node [vs /var/tmp/ - which is
> probably a shared TMP across nodes]
>
> But then - PBS or this MPI requires a shared TMP?
>
> Satish
>
> On Tue, 15 Dec 2020, Yaqi Wang wrote:
>
> > Fande,
> >
> > Did you try setting TMPDIR for valgrind?
> >
> > > On Dec 15, 2020, at 1:23 AM, Barry Smith <bsmith at petsc.dev> wrote:
> > >
> > >
> > > No idea. Perhaps petscmpiexec could be modified so it only ran
> valgrind on the first 10 ranks? Not clear how to do that. Or valgrind
> should get a MR that removes this small arbitrary limitation on the number
> of processes. 576 is so 2000 :-)
> > >
> > >
> > > Barry
> > >
> > >
> > >> On Dec 14, 2020, at 11:59 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I tried to use valgrind to check whether the simulation is valgrind-clean
> because I saw some random communication failures during the simulation.
> > >>
> > >> I tried this command line:
> > >>
> > >> petscmpiexec -valgrind -n 576 ../../../moose-app-oprof -i input.i
> -log_view -snes_view
> > >>
> > >>
> > >> But I got the following error messages:
> > >>
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8c3fabf2
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_8cac2243
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_da8d30c0
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_877871f9
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_c098953e
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_aa649f9f
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_097498ec
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_bfc534b5
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_7604c74a
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_a1fd96bb
> > >> ==75586== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> > >> valgrind: Startup or configuration error:
> > >> valgrind: Can't create client cmdline file in
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75586_cmdline_4c8857d8
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75596== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_bc5492bb
> > >> ==75596== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> > >> valgrind: Startup or configuration error:
> > >> valgrind: Can't create client cmdline file in
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75596_cmdline_ec59a3d8
> > >> valgrind: Unable to start up properly. Giving up.
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_b036bdf2
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_105acc43
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_9fb792c0
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_30602bf9
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_21eec73e
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_0b53e99f
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_73e31aec
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_486e8eb5
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_db8c194a
> > >> ==75597== VG_(mkstemp): failed to create temp file:
> /var/tmp/pbs.3110013.sawtoothpbs/valgrind_proc_75597_cmdline_839780bb
> > >>
> > >>
> > >> I did a bit of searching online and found something related:
> https://stackoverflow.com/questions/13707211/what-causes-mkstemp-to-fail-when-running-many-simultaneous-valgrind-processes
> > >>
> > >> But I do not know the right way to fix the issue.
> > >>
> > >> Thanks so much,
> > >>
> > >> Fande,
> > >>
> > >
> >
>
>
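For anyone finding this thread later, here is a minimal sketch of the two suggestions above combined: a node-local TMPDIR and a per-process valgrind log file. It assumes a plain "mpiexec" launcher rather than petscmpiexec, and it assumes /tmp is local to each node, as it turned out to be here. The helper name build_valgrind_cmd is made up for illustration. It only prints the launch line, so it can be inspected before use.

```shell
#!/bin/sh
# Print (do not run) an MPI launch line that wraps each rank in valgrind.
# Usage: build_valgrind_cmd NPROCS APP [APP_ARGS...]
# NOTE: build_valgrind_cmd is a hypothetical helper, and "mpiexec" is an
# assumed launcher; adjust both to your site.
build_valgrind_cmd() {
    nprocs=$1
    shift
    # %q{HOSTNAME} and %p are expanded by valgrind itself, not by the shell.
    # HOSTNAME must be exported in each rank's environment for %q to work.
    log_pattern='valgrind-%q{HOSTNAME}-%p.log'
    # "env TMPDIR=/tmp" is executed for every rank, so valgrind's temp files
    # go to /tmp (assumed node-local) instead of the scheduler's job directory.
    printf '%s\n' "mpiexec -n $nprocs env TMPDIR=/tmp valgrind --log-file=$log_pattern $*"
}

# Dry run with the command line from this thread.
build_valgrind_cmd 576 ../../../moose-app-oprof -i input.i -log_view -snes_view
```

Setting TMPDIR through "env" on the launch line avoids depending on whether the launcher forwards the submitting shell's environment to remote nodes, which varies between MPI implementations.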