[petsc-dev] [petsc-maint #69143] petsc-dev performance issues
Dmitry Karpeev
karpeev at mcs.anl.gov
Tue Apr 5 08:09:35 CDT 2011
On Tue, Apr 5, 2011 at 7:31 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> On Apr 5, 2011, at 1:31 AM, Satish Balay wrote:
>
> > On Mon, 4 Apr 2011, Dmitry Karpeev wrote:
> >
> >> What patch level of petsc-3.1 is being used? The dlopen() code that
> >> generates the error was added to petsc-dev after the initial release
> >> of 3.1.
> >>
> >> Typically (on many systems) dlopen(NULL,dlflags) returns the handle of
> >> the main executable, as if it were a dynamically-loaded library, and
> >> dlsym() is then run on that handle. This doesn't work on all systems,
> >> though (e.g., OS X defines a specific handle for the main executable;
> >> enabling that means augmenting configure). In particular, on this
> >> system, dlopen() with a NULL first argument appears to throw a
> >> confusing error.
> >>
> >> The original release of petsc-3.1 would use dlsym(0) in this case,
> >> skipping dlopen(NULL,dlflags), so that is, perhaps, what works on XT5.
> >> To revert to that behavior it should be sufficient to apply the
> >> following patch to petsc-dev/src/sys/dll/dlimpl.c (or simply to remove
> >> lines 269 through 308 inclusive).
> >
> > For one - I don't think any of the default PETSc code should be doing
> > dlopen(executable) if --with-dynamic-loading=0 is set. [Otherwise why
> > have this option? Suppress dlopen(libpetsc.so) but enable
> > dlopen(executable)?]
> >
> > So I've changed the fix below and disabled it with:
> >
> > #if defined(PETSC_HAVE_DLOPEN) && defined(PETSC_USE_DYNAMIC_LIBRARIES)
> >
> > wrt dlsym(0) - we ignore the dlerror() from it - so I guess it's ok..
> >
> > This works fine on the cray - so pushing this change.
> >
> > [I guess we still need to fix this for --with-dynamic-loading=1]
>
> I thought about this also, but it is the wrong fix. HAVE_DYNAMIC means
> that dynamic libraries work and the user can pass strings as functions;
> USE_DYNAMIC means make PETSc libraries dynamic and always use strings for
> functions. The problem is only that we do not have the configure tests in
> place for the various versions of dlopen() on the executable. The correct fix
> is to make configure tests for dlopen() and change the PETSc code to do the
> various dlopen(0,...) variants based on the configure test. Now we can punt
> on doing it correctly, but we'd be punting only out of laziness, not because
> it is the right model.
>
> BTW: someone should submit a bug report to Cray (and IBM if it doesn't
> work there also).
>
Okay, but what's going on with petsc-3.1 on that machine? It doesn't fail.
Is that because dlsym(0,string) works?
Or was dynamic loading disabled in it?
Dmitry.
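
For reference, here is a minimal C sketch of the two lookup paths contrasted in this thread: dlopen(NULL, flags) followed by dlsym() on the returned handle, versus dlsym() called directly with a null handle. The function names are hypothetical and are not PETSc API, and what a null handle means to dlsym() is system-specific, so treat this as an illustration under those assumptions rather than the code in dlimpl.c.

```c
/* Sketch only: hypothetical helper names, not part of PETSc. */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Path 1: open the main executable as if it were a dynamic library,
   then look the symbol up through the returned handle. */
void *lookup_via_dlopen_null(const char *name)
{
  void *handle, *sym;

  assert(name != NULL);
  dlerror();                         /* clear any stale error */
  handle = dlopen(NULL, RTLD_LAZY);  /* NULL path requests the main program */
  if (!handle) {
    const char *e = dlerror();
    fprintf(stderr, "dlopen(NULL) failed: %s\n", e ? e : "(no message)");
    return NULL;
  }
  sym = dlsym(handle, name);         /* NULL if the symbol is not found */
  dlclose(handle);                   /* drops the reference; main program stays loaded */
  return sym;
}

/* Path 2: skip dlopen() and pass a null handle straight to dlsym().
   Any dlerror() is deliberately ignored here, as noted in the thread.
   Assumption: the platform accepts a null handle (system-specific). */
void *lookup_via_null_handle(const char *name)
{
  assert(name != NULL);
  dlerror();                         /* clear any stale error */
  return dlsym((void *)0, name);
}
```

On a dynamically linked Linux executable both helpers would be expected to resolve a libc symbol such as "strlen"; on a statically linked one, or on a system where dlopen(NULL, ...) is not supported, the first helper returns NULL and prints the dlerror() text. Older toolchains may need -ldl at link time.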
>
> Barry
>
>
> >
> > Satish
> >
> >>
> >> Dmitry.
> >>
> >> @@ -266,46 +266,7 @@
> >> }
> >> else {
> >> dlhandle = (dlhandle_t) 0;
> >> -
> >> -#if defined(PETSC_HAVE_DLOPEN)
> >> - /* Attempt to retrieve the main executable's dlhandle. */
> >> - { int dlflags1 = 0, dlflags2 = 0;
> >> -#if defined(PETSC_HAVE_RTLD_LAZY)
> >> - dlflags1 = RTLD_LAZY;
> >> -#endif
> >> - if(!dlflags1) {
> >> -#if defined(PETSC_HAVE_RTLD_NOW)
> >> - dlflags1 = RTLD_NOW;
> >> -#endif
> >> - }
> >> -#if defined(PETSC_HAVE_RTLD_LOCAL)
> >> - dlflags2 = RTLD_LOCAL;
> >> -#endif
> >> - if(!dlflags2) {
> >> -#if defined(PETSC_HAVE_RTLD_GLOBAL)
> >> - dlflags2 = RTLD_GLOBAL;
> >> -#endif
> >> - }
> >> -#if defined(PETSC_HAVE_DLERROR)
> >> -#if defined(PETSC_HAVE_VALGRIND)
> >> - if (!(RUNNING_ON_VALGRIND)) {
> >> -#endif
> >> - dlerror(); /* clear any previous error; valgrind does not like this */
> >> -#if defined(PETSC_HAVE_VALGRIND)
> >> - }
> >> -#endif
> >> -#endif
> >> - /* Attempt to open the main executable as a dynamic library. */
> >> - dlhandle = dlopen(0, dlflags1|dlflags2);
> >> - }
> >> -#if defined(PETSC_HAVE_DLERROR)
> >> - { const char *e = (const char*) dlerror();
> >> - if(e){
> >> - SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_ARG_WRONG, "Error opening main executable as a dynamic library:\n Error message from dlopen(): '%s'\n", e);
> >> - }
> >> - }
> >> -#endif
> >> -#endif /* PETSC_HAVE_DLOPEN */
> >> +
> >> }
> >> #if defined(PETSC_HAVE_DLERROR)
> >> dlerror(); /* clear any previous error */
> >>
> >>
> >>
> >> On Mon, Apr 4, 2011 at 7:26 PM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:
> >>
> >>> Now it is really hard for me to understand what the problem is since
> >>> both 3.1 and dev check for this function in the same way. Moreover,
> >>> it does not depend on shared libraries.
> >>>
> >>> Satish, have you seen this error before on the XT5?
> >>>
> >>> Matt
> >>>
> >>> On Mon, Apr 4, 2011 at 6:45 PM, Satish Balay <petsc-maint at mcs.anl.gov> wrote:
> >>>
> >>>> On Mon, 4 Apr 2011, Sebastian Steiger wrote:
> >>>>
> >>>>> On 04/04/2011 05:42 PM, Satish Balay wrote:
> >>>>>> Could you run both the binaries in the same node-allocation - with a
> >>>>>> single batch file and send the '-log_summary' for them?
> >>>>>> run petsc-dev
> >>>>>> run petsc-31
> >>>>>> run petsc-dev
> >>>>>> run petsc-31
> >>>>> Do you mean exactly the same physical nodes? I don't know how to do
> >>>>> that. My batch files for petsc-dev and petsc-3.1-p4 are identical
> >>>>> except for the static executable.
> >>>>
> >>>>
> >>>> Haven't used batch stuff on ornl machine - but if the usual batch
> >>>> script file is something like: [for eg: pbs]
> >>>>
> >>>>>>>
> >>>> #!/bin/sh
> >>>> #PBS -N hello
> >>>> #PBS -l nodes=1:ppn=8
> >>>> #PBS -l walltime=0:00:15
> >>>> #PBS -j oe
> >>>>
> >>>> cd $PBS_O_WORKDIR
> >>>> mpiexec ./ex1
> >>>> <<<<<
> >>>>
> >>>> you could change it to:
> >>>>
> >>>>>>>>>
> >>>> #!/bin/sh
> >>>> #PBS -N hello
> >>>> #PBS -l nodes=1:ppn=8
> >>>> #PBS -l walltime=0:00:30
> >>>> #PBS -j oe
> >>>>
> >>>> cd $PBS_O_WORKDIR
> >>>> mpiexec ./ex1
> >>>> mpiexec ./ex2
> >>>> mpiexec ./ex1
> >>>> mpiexec ./ex2
> >>>> <<<<<<<
> >>>>
> >>>> and run it with a single allocation..
> >>>>
> >>>> Satish
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> What most experimenters take for granted before they begin their
> >>> experiments is infinitely more interesting than any results to which
> >>> their experiments lead.
> >>> -- Norbert Wiener
> >>>
> >>>
> >>
> >>
> >
> >
>
>