[petsc-users] Configure of 3.16.1 failing on mpich/yaksa

Daniel Stone daniel.stone at opengosim.com
Mon Nov 8 14:01:44 CST 2021


Aha - they already know. A commit from April in their main repo fixes
exactly this problem. It's present in the 4.0.a.2 alpha, but
it looks like we will have to wait for a new release version that includes the fix.

https://github.com/pmodels/yaksa/commit/eed193d9775dd0f33cbd8caa0dd946647b751b18#diff-f5310b2c9b83ad225b424c6ab70b970c2c57a4db39daf7b4f8c017df92646c84
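
For anyone hitting this before a fixed release, here is a minimal sketch of
the portability issue. The "${@:2}" slice on line 35 of cudalt.sh is a bash
extension, which is why dash reports "Bad substitution". A POSIX way to get
"all arguments after the first" is shift followed by "$*". The function name
rest_args and the sample arguments are hypothetical, for illustration only,
and this is not necessarily how the upstream commit fixes it.

```shell
#!/bin/sh
# Hypothetical helper (not part of yaksa): print every argument after the first.
# bash would allow "${@:2}" here; plain POSIX sh (e.g. dash) does not.
rest_args() {
    # Drop the first argument, if there is one (a bare shift with no
    # arguments is an error in dash).
    [ "$#" -gt 0 ] && shift
    # "$*" joins the remaining arguments with single spaces.
    printf '%s\n' "$*"
}

rest_args libtool nvcc -c foo.cu   # prints: nvcc -c foo.cu
```

Like the original CMD="${@:2} ..." assignment, this flattens the arguments
into one string, so arguments containing spaces are not preserved as
separate words.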

On Mon, Nov 8, 2021 at 7:36 PM Satish Balay <balay at mcs.anl.gov> wrote:

> This looks like a bug report to mpich.
>
> You should be able to reproduce this without PETSc - by directly building
> MPICH.
>
> And then report to MPICH developers.
>
> Wrt petsc - you could try --download-openmpi [instead of --download-mpich]
> and see if that works better.
>
> Yeah cuda obtained from nvidia and cuda repackaged by ubuntu have subtle
> differences that can cause cuda failures.
>
> As you say - mpich might have a configure option to disable this. If you
> are able to find this option - you can use it via petsc configure with:
>
> --download-mpich-configure-arguments=string
>
> Satish
>
> On Mon, 8 Nov 2021, Daniel Stone wrote:
>
> > Hello all,
> >
> > I've been having some configure failures trying to configure petsc, on
> > Ubuntu 20, when
> > downloading mpich.
> >
> >
> > This seems to be related to the use of
> > "#!/bin/sh"
> > found in the script
> > mpich-3.4.2/modules/yaksa/src/backend/cuda/cudalt.sh
> >
> > /bin/sh in Ubuntu20 is dash, not bash, and line 35 of the script is:
> > CMD="${@:2} -Xcompiler -fPIC -o $PIC_FILEPATH"
> > which is apparently not valid dash syntax. I see "bad substitution" errors
> > when trying to run this script in isolation, which can be fixed by replacing
> > the top line with
> >
> > "#!/bin/bash"
> >
> > The petsc config log points to this line in this script:
> >
> >
> > ------------------------------------------------------------------------------------
> >
> > make[2]: Entering directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_resized__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_resized__Bool.lo
> > make[2]: Leaving directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> > make[1]: Leaving directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2'
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo] Error 2
> > make[2]: *** Waiting for unfinished jobs....
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo] Error 2
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo] Error 2
> >
> >
> > ----------------------------------------------------------------------------
> >
> > What is interesting is the choice made by the config script here to make
> > yaksa "cuda-aware", which I do not
> > understand how to control. By this I mean - the use of NVCC, the use of
> > files with "cudai" in the name,
> > and the running of the cudalt.sh script.
> >
> > This is especially odd given that on another machine, also with Ubuntu20,
> > none of this occurs, despite
> > using the exact same configure instructions:
> >
> > ---------------------------------------------------
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_blkhindx_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_resized__Bool.lo
> >
> >
> > -------------------------------------------------------
> >
> > On both machines petsc locates a working nvcc without trouble. One
> > difference is that on the good machine, cuda 11 is installed, built from
> > source, while on the bad machine cuda 10 is installed via apt-get. On the
> > good machine, running the cudalt.sh script in isolation results in the
> > same bad substitution error.
> >
> > Can someone help me understand where the difference of behaviour might
> > come from, w.r.t. cuda? Adding the --with-cuda=0 flag on the bad machine
> > made no difference. Is there any way of communicating to mpich/yaksuri
> > that I don't want whatever features involve the cudalt.sh script being
> > run?
> >
> > The configure command I use is:
> >
> > ./configure --download-mpich=yes --download-hdf5=yes
> > --download-fblaslapack=yes --download-metis=yes  --download-cmake=yes
> >  --download-ptscotch=yes --download-hypre=yes --with-debugging=0
> > COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> > -download-hdf5-fortran-bindings=yes --download-sowing
> >
> >
> >
> > Thanks,
> >
> > Daniel
> >
>
>