[petsc-dev] error with karlrupp/fix-cuda-streams

Mills, Richard Tran rtmills at anl.gov
Wed Sep 25 13:53:52 CDT 2019


On 9/25/19 11:38 AM, Mark Adams via petsc-dev wrote:
[...]
> jsrun does take -n. It just has other args. I am trying to check if it
> requires other args. I thought it did but let me check.

https://www.olcf.ornl.gov/for-users/system-user-guides/summitdev-quickstart-guide/

-n      --nrs   Number of resource sets


-n is still supported. There are two versions of everything. One letter ones and more explanatory ones.
Yes, it's supported, but it's a little different than what "-n" usually does in mpiexec, where it means the number of processes. For 'jsrun', it means the number of resource sets, which is multiplied by the "tasks per resource set" specified by "-a" to get the MPI process count. I think if we can specify that "-a 1" is part of our "mpiexec", then we should be OK with using -n as PETSc normally does.
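
To make the arithmetic concrete, here is a minimal sketch of how a desired MPI process count could be mapped onto jsrun's resource-set options. The helper name jsrun_command and its defaults are hypothetical, invented for illustration; they are not part of PETSc's configure. It only assumes the meanings discussed above: -n is the number of resource sets, -a the tasks per resource set, and -c / -g the CPUs and GPUs per resource set.

```python
def jsrun_command(mpi_procs, tasks_per_rs=1, cpus_per_rs=1, gpus_per_rs=1):
    """Build a jsrun argument list that launches mpi_procs MPI processes.

    Hypothetical helper for illustration only.
    MPI process count = (resource sets, -n) * (tasks per resource set, -a).
    """
    if mpi_procs % tasks_per_rs != 0:
        raise ValueError("mpi_procs must be a multiple of tasks_per_rs")
    nrs = mpi_procs // tasks_per_rs  # number of resource sets
    return [
        "jsrun",
        "-n", str(nrs),
        "-a", str(tasks_per_rs),
        "-c", str(cpus_per_rs),
        "-g", str(gpus_per_rs),
    ]


# With one task per resource set, -n equals the MPI process count,
# which matches how PETSc normally uses "mpiexec -n".
print(" ".join(jsrun_command(4)))
# With two tasks per resource set, 6 processes need only 3 resource sets.
print(" ".join(jsrun_command(6, tasks_per_rs=2)))
```

With tasks_per_rs fixed at 1, the value passed to -n is the process count, which is the case where the usual PETSc "-n <np>" convention carries over unchanged.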

--Richard

In fact they have a nice little tool to visualize layouts, and it gives you the command line in this short form, e.g.,

https://jsrunvisualizer.olcf.ornl.gov/?s1f0o01n6c4g1r14d1b21l0=


Beta2 Change (October 17):
-n was replaced by -nnodes

So it's not the same functionality as 'mpiexec -n'

I am still waiting for an interactive shell to test just -n. That really should run.


Either way - please try the above branch

Satish

>
>
> >
> > And then configure needs to run some binaries for some checks - here
> > perhaps '-n 1' doesn't matter. [MPICH defaults to 1, OpenMPI defaults
> > to ncore]. So perhaps mpiexec is required for this purpose on summit?
> >
> > And then there is this code to escape spaces in path - for
> > windows. [but we have to make sure this is not in code-path for user
> > specified --with-mpiexec="jsrun -g 1"
> >
> > Satish
> >
> > On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> >
> > > No luck,
> > >
> > > On Wed, Sep 25, 2019 at 10:01 AM Balay, Satish <balay at mcs.anl.gov>
> > wrote:
> > >
> > > > Mark,
> > > >
> > > > Can you try the fix in branch balay/fix-mpiexec-shell-escape and see
> > if it
> > > > works?
> > > >
> > > > Satish
> > > >
> > > > On Wed, 25 Sep 2019, Balay, Satish via petsc-dev wrote:
> > > >
> > > > > Mark,
> > > > >
> > > > > Can you send configure.log from mark/fix-cuda-with-gamg-pintocpu
> > branch?
> > > > >
> > > > > Satish
> > > > >
> > > > > On Wed, 25 Sep 2019, Mark Adams via petsc-dev wrote:
> > > > >
> > > > > > I double checked that a clean build of your (master) branch has
> > this
> > > > error
> > > > > > > but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include
> > > > stuff
> > > > > > from Barry that is not yet in master, works.
> > > > > >
> > > > > > On Wed, Sep 25, 2019 at 5:26 AM Karl Rupp via petsc-dev <
> > > > > > > petsc-dev at mcs.anl.gov> wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote:
> > > > > > > > I am using karlrupp/fix-cuda-streams, merged with master, and I
> > > > get this
> > > > > > > > error:
> > > > > > > >
> > > > > > > > Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1
> > --oversubscribe -n
> > > > 1
> > > > > > > > printenv']":
> > > > > > > > Error, invalid argument:  1
> > > > > > > >
> > > > > > > > My branch mark/fix-cuda-with-gamg-pintocpu seems to work but I
> > did
> > > > edit
> > > > > > > > the jsrun command but Karl's branch still fails. (SUMMIT was
> > down
> > > > today
> > > > > > > > so there could have been updates).
> > > > > > > >
> > > > > > > > Any suggestions?
> > > > > > >
> > > > > > > Looks very much like a systems issue to me.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Karli
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>
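
For anyone following the thread, a small sketch of why the escaped spaces in the failing command quoted above matter. It takes the command string from the error message (with the repr-level double backslashes undone) and tokenizes it with Python's shlex, which follows POSIX shell quoting rules. This only illustrates shell tokenization; it is an assumption that jsrun then rejects the fused "-g 1" token, although that would be consistent with the "invalid argument:  1" message.

```python
import shlex

# Command string from the configure error, after undoing repr escaping.
failing = r"jsrun -g\ 1 -c\ 1 -a\ 1 --oversubscribe -n 1 printenv"
# The same command without the escaped spaces.
intended = "jsrun -g 1 -c 1 -a 1 --oversubscribe -n 1 printenv"

# A backslash-escaped space is not a word separator, so each option
# and its value are fused into a single argument such as "-g 1".
failing_argv = shlex.split(failing)
# Without escaping, options and values arrive as separate arguments.
intended_argv = shlex.split(intended)

print(failing_argv)
print(intended_argv)
```

This is why escaping spaces is reasonable for a path to an executable but not for a user-specified launcher string such as --with-mpiexec="jsrun -g 1", where the spaces separate arguments.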

