[petsc-users] Configure of 3.16.1 failing on mpich/yaksa

Daniel Stone daniel.stone at opengosim.com
Tue Nov 9 05:55:34 CST 2021


Just for interest, I found out that the decision to use cuda comes from
Yaksa's own configure, which checks for cuda_runtime.h and related header
files - so it's not a decision made by the petsc or even the mpich
scripts. On the machine where cuda 11 was installed manually, it can't
find the header files, understandably, as it hasn't been told where to
look (--with-cuda-dir=... etc. not set). On the machine where cuda 10 was
installed via apt-get, it can find them - presumably because by default it
searches /usr/include/ or some similar standard place where apt-get puts
the headers.
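
Incidentally, the "Bad substitution" failure itself is easy to reproduce
and to avoid: the bash-only slice ${@:2} in cudalt.sh has a POSIX-portable
equivalent using shift. A minimal sketch (the argument list below is
invented for illustration; cudalt.sh's real arguments come from the yaksa
makefiles):

```shell
#!/bin/sh
# Simulate the positional parameters a wrapper like cudalt.sh receives:
# $1 is the output file, the rest is the compile command (invented values).
set -- out.lo nvcc -c src/file.cu

# bash-only: CMD="${@:2} ..." -- dash rejects this with "Bad substitution".
# POSIX-portable equivalent: save $1, shift it away, then use "$*".
pic_filepath="$1"
shift
CMD="$* -Xcompiler -fPIC -o $pic_filepath"
echo "$CMD"    # nvcc -c src/file.cu -Xcompiler -fPIC -o out.lo
```

Run it with sh (dash) or bash; both print the same command line.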

Thanks for the advice on the various workarounds - I had forgotten that
one can do things like

--download-mpich=https://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz

which is probably the cleanest option.
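
For example (a sketch - the extra flag here is illustrative, not my full
configure line): as far as I can tell, MPICH 3.3.x predates the yaksa
module entirely, so pinning it sidesteps the cudalt.sh script altogether.

```shell
# Sketch: have petsc build a pinned MPICH 3.3.2 (no yaksa module, so no
# CUDA probing) instead of the default --download-mpich tarball.
./configure \
  --download-mpich=https://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz \
  --with-debugging=0
```

If a configure flag to disable yaksa's cuda support turns up, it could
instead be forwarded to the default mpich via
--download-mpich-configure-arguments (the flag name itself is something
I'd still need to find).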


On Mon, Nov 8, 2021 at 7:36 PM Satish Balay <balay at mcs.anl.gov> wrote:

> This looks like a bug report to mpich.
>
> You should be able to reproduce this without PETSc - by directly building
> MPICH.
>
> And then report to MPICH developers.
>
> Wrt petsc - you could try --download-openmpi [instead of --download-mpich]
> and see if that works better.
>
> Yeah, cuda obtained from NVIDIA and cuda repackaged by Ubuntu have subtle
> differences that can cause cuda failures.
>
> As you say - mpich might have a configure option to disable this. If you
> are able to find this option - you can use it via petsc configure with:
>
> --download-mpich-configure-arguments=string
>
> Satish
>
> On Mon, 8 Nov 2021, Daniel Stone wrote:
>
> > Hello all,
> >
> > I've been having configure failures on Ubuntu 20 when trying to
> > configure petsc with a downloaded mpich.
> >
> >
> > This seems to be related to the use of
> > "#!/bin/sh"
> > at the top of the script
> > mpich-3.4.2/modules/yaksa/src/backend/cuda/cudalt.sh
> >
> > /bin/sh on Ubuntu 20 is dash, not bash, and line 35 of the script is:
> > CMD="${@:2} -Xcompiler -fPIC -o $PIC_FILEPATH"
> > which is not valid dash syntax ("${@:2}" is a bash-only slice). I see
> > "Bad substitution" errors when running this script in isolation, which
> > can be fixed by replacing the top line with
> >
> > "#!/bin/bash"
> >
> > The petsc config log points to this line in this script:
> >
> >
> ------------------------------------------------------------------------------------
> >
> > make[2]: Entering directory
> > '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_resized__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_resized__Bool.lo
> > make[2]: Leaving directory
> > '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> > make[1]: Leaving directory
> > '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2'
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo] Error 2
> > make[2]: *** Waiting for unfinished jobs....
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo] Error 2
> > make[2]: *** [Makefile:8697:
> > src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo] Error 2
> >
> >
> ----------------------------------------------------------------------------
> >
> > What is interesting is the configure script's choice here to make yaksa
> > "cuda-aware", which I do not understand how to control. By this I mean
> > the use of NVCC, the files with "cudai" in their names, and the running
> > of the cudalt.sh script.
> >
> > This is especially odd given that on another machine, also running
> > Ubuntu 20, none of this occurs, despite using the exact same configure
> > instructions:
> >
> > ---------------------------------------------------
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_blkhindx_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_resized__Bool.lo
> >
> >
> > -------------------------------------------------------
> >
> > On both machines petsc locates a working nvcc without trouble. One
> > difference is that on the good machine cuda 11 is installed, built from
> > source, while on the bad machine cuda 10 was installed via apt-get. On
> > the good machine, running the cudalt.sh script in isolation results in
> > the same bad substitution error.
> >
> > Can someone help me understand where this difference in behaviour with
> > respect to cuda might come from? Adding the --with-cuda=0 flag on the
> > bad machine made no difference. Is there any way of telling mpich/yaksa
> > that I don't want whatever features involve running the cudalt.sh
> > script?
> >
> > The configure command I use is:
> >
> > ./configure --download-mpich=yes --download-hdf5=yes
> > --download-fblaslapack=yes --download-metis=yes  --download-cmake=yes
> >  --download-ptscotch=yes --download-hypre=yes --with-debugging=0
> > COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> > -download-hdf5-fortran-bindings=yes --download-sowing
> >
> >
> >
> > Thanks,
> >
> > Daniel
> >
>
>