[petsc-dev] error with karlrupp/fix-cuda-streams

Mark Adams mfadams at lbl.gov
Wed Sep 25 14:24:59 CDT 2019


I did test this and sent the log (error).

On Wed, Sep 25, 2019 at 2:58 PM Balay, Satish <balay at mcs.anl.gov> wrote:

> I made changes and asked to retest with the latest changes.
>
> Satish
>
> On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
>
> > Oh, and I tested the branch and it didn't work. The file was attached.
> >
> > On Wed, Sep 25, 2019 at 2:38 PM Mark Adams <mfadams at lbl.gov> wrote:
> >
> > >
> > >
> > > On Wed, Sep 25, 2019 at 2:23 PM Balay, Satish <balay at mcs.anl.gov> wrote:
> > >
> > >> On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> > >>
> > >> > On Wed, Sep 25, 2019 at 12:44 PM Balay, Satish <balay at mcs.anl.gov> wrote:
> > >> >
> > >> > > Can you retry with updated balay/fix-mpiexec-shell-escape branch?
> > >> > >
> > >> > >
> > >> > > The current mpiexec interface/code in petsc is messy.
> > >> > >
> > >> > > It's primarily needed for the test suite. But then - you can't easily
> > >> > > run the test suite on machines like summit.
> > >> > >
> > >> > > Also - it assumes the mpiexec provided supports '-n 1'. However, if one
> > >> > > provides a non-standard mpiexec such as --with-mpiexec="jsrun -g 1" -
> > >> > > what is the appropriate thing here?
> > >> > >
> > >> >
> > >> > jsrun does take -n. It just has other args. I am trying to check if it
> > >> > requires other args. I thought it did but let me check.
> > >>
> > >>
> > >>
> > >> https://www.olcf.ornl.gov/for-users/system-user-guides/summitdev-quickstart-guide/
> > >>
> > >> -n      --nrs   Number of resource sets
> > >>
> > >>
> > > -n is still supported. There are two versions of everything: one-letter
> > > ones and more explanatory ones.
> > >
> > > In fact they have a nice little tool to viz layouts and they give you the
> > > command line with this short form, e.g.,
> > >
> > > https://jsrunvisualizer.olcf.ornl.gov/?s1f0o01n6c4g1r14d1b21l0=
> > >
> > >
> > >
> > >> Beta2 Change (October 17):
> > >> -n was replaced by -nnodes
> > >>
> > >> So it's not the same functionality as 'mpiexec -n'
> > >>
> > >
> > > I am still waiting for an interactive shell to test just -n. That really
> > > should run.
> > >
> > >
> > >>
> > >> Either way - please try the above branch
> > >
> > >
> > >> Satish
> > >>
> > >> >
> > >> >
> > >> > >
> > >> > > And then configure needs to run some binaries for some checks - here
> > >> > > perhaps '-n 1' doesn't matter. [MPICH defaults to 1, OpenMPI defaults
> > >> > > to ncore]. So perhaps mpiexec is required for this purpose on summit?
> > >> > >
> > >> > > And then there is this code to escape spaces in path - for
> > >> > > windows. [but we have to make sure this is not in the code-path for a
> > >> > > user-specified --with-mpiexec="jsrun -g 1"]
> > >> > >
> > >> > > Satish
> > >> > >
> > >> > > On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> > >> > >
> > >> > > > No luck,
> > >> > > >
> > >> > > > On Wed, Sep 25, 2019 at 10:01 AM Balay, Satish <balay at mcs.anl.gov> wrote:
> > >> > > >
> > >> > > > > Mark,
> > >> > > > >
> > >> > > > > Can you try the fix in branch balay/fix-mpiexec-shell-escape and see
> > >> > > > > if it works?
> > >> > > > >
> > >> > > > > Satish
> > >> > > > >
> > >> > > > > On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote:
> > >> > > > >
> > >> > > > > > Mark,
> > >> > > > > >
> > >> > > > > > Can you send configure.log from the mark/fix-cuda-with-gamg-pintocpu
> > >> > > > > > branch?
> > >> > > > > >
> > >> > > > > > Satish
> > >> > > > > >
> > >> > > > > > On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> > >> > > > > >
> > >> > > > > > > I double checked that a clean build of your (master) branch has this
> > >> > > > > > > error, but my branch (mark/fix-cuda-with-gamg-pintocpu), which may
> > >> > > > > > > include stuff from Barry that is not yet in master, works.
> > >> > > > > > >
> > >> > > > > > > On Wed, Sep 25, 2019 at 5:26 AM Karl Rupp via petsc-dev <
> > >> > > > > > > petsc-dev at mcs.anl.gov> wrote:
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote:
> > >> > > > > > > > > I am using karlrupp/fix-cuda-streams, merged with master, and I
> > >> > > > > > > > > get this error:
> > >> > > > > > > > >
> > >> > > > > > > > > Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1 --oversubscribe -n 1 printenv']":
> > >> > > > > > > > > Error, invalid argument:  1
> > >> > > > > > > > >
> > >> > > > > > > > > My branch mark/fix-cuda-with-gamg-pintocpu seems to work, although I
> > >> > > > > > > > > did edit the jsrun command, but Karl's branch still fails. (SUMMIT
> > >> > > > > > > > > was down today, so there could have been updates.)
> > >> > > > > > > > >
> > >> > > > > > > > > Any suggestions?
> > >> > > > > > > >
> > >> > > > > > > > Looks very much like a systems issue to me.
> > >> > > > > > > >
> > >> > > > > > > > Best regards,
> > >> > > > > > > > Karli
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> >
>
>
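
For reference, a minimal sketch of why the command in the error quoted above
probably fails. It assumes the command string is handed to a POSIX shell, and
that the doubled backslashes in the log are Python repr escaping of a single
backslash. Python's shlex.split is used here only to approximate shell word
splitting; this is not PETSc's configure code, and the variable names are made
up for illustration.

```python
import shlex

# Command string as it appears in the log, with one backslash before each
# escaped space (assumption: the log printed a Python repr, doubling them).
escaped_cmd = r"jsrun -g\ 1 -c\ 1 -a\ 1 --oversubscribe -n 1 printenv"

# The same command with no escaping applied to the user-supplied mpiexec.
plain_cmd = "jsrun -g 1 -c 1 -a 1 --oversubscribe -n 1 printenv"

# shlex.split follows POSIX quoting rules: a backslash-escaped space does
# not separate words, so each flag and its value fuse into one argument.
escaped_argv = shlex.split(escaped_cmd)
plain_argv = shlex.split(plain_cmd)

print(escaped_argv)
print(plain_argv)
```

With the escaped form, the launcher would receive "-g 1" as a single argument
instead of "-g" followed by "1", which is consistent with the "invalid
argument:  1" message. Escaping spaces is reasonable for a path to a single
executable, but not for a user-specified command that already carries its own
arguments.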