[petsc-users] Configure of 3.16.1 failing on mpich/yaksa

Satish Balay balay at mcs.anl.gov
Mon Nov 8 13:41:05 CST 2021


Alternatively, you can try an older MPICH release; perhaps this issue is related to a newly introduced feature.

--download-mpich=https://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz
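For example, substituting this URL into the configure command from the original report gives the sketch below. All other options are copied unchanged from that report. As far as I know, the yaksa datatype engine (which contains cudalt.sh) first shipped with the MPICH 3.4 series, so a 3.3.2 build should not run cudalt.sh at all.

```shell
# Same options as the original report, with the MPICH tarball pinned to 3.3.2.
./configure --download-mpich=https://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz \
  --download-hdf5=yes --download-fblaslapack=yes --download-metis=yes \
  --download-cmake=yes --download-ptscotch=yes --download-hypre=yes \
  --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 \
  -download-hdf5-fortran-bindings=yes --download-sowing
```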

Satish

On Mon, 8 Nov 2021, Satish Balay via petsc-users wrote:

> This looks like a bug to report to MPICH.
> 
> You should be able to reproduce this without PETSc, by building MPICH directly.
> 
> Then report it to the MPICH developers.
> 
> With respect to PETSc, you could try --download-openmpi [instead of --download-mpich] and see if that works better.
> 
> Yes, CUDA obtained from NVIDIA and CUDA repackaged by Ubuntu have subtle differences that can cause CUDA failures.
> 
> As you say, MPICH might have a configure option to disable this. If you can find this option, you can pass it through PETSc configure with:
> 
> --download-mpich-configure-arguments=string
> 
> Satish
> 
> On Mon, 8 Nov 2021, Daniel Stone wrote:
> 
> > Hello all,
> > 
> > I've been having configure failures when configuring PETSc on Ubuntu 20
> > with MPICH downloaded by configure.
> > 
> > 
> > This seems to be related to the use of "#!/bin/sh" in the script
> > mpich-3.4.2/modules/yaksa/src/backend/cuda/cudalt.sh
> > 
> > /bin/sh in Ubuntu 20 is dash, not bash, and line 35 of the script is:
> > CMD="${@:2} -Xcompiler -fPIC -o $PIC_FILEPATH"
> > which is not valid dash syntax (the ${@:2} substring expansion is a bash
> > extension). I see "Bad substitution" errors when running this script in
> > isolation, which can be fixed by replacing the top line with
> > 
> > "#!/bin/bash"
> > 
> > The PETSc configure log points to this line of the script:
> > 
> > ------------------------------------------------------------------------------------
> > 
> > make[2]: Entering directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hvector__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_resized__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_blkhindx__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hindexed__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_contig__Bool.lo
> >   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_resized__Bool.lo
> > make[2]: Leaving directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> > make[1]: Leaving directory '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2'
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> > make[2]: *** [Makefile:8697: src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo] Error 2
> > make[2]: *** Waiting for unfinished jobs....
> > make[2]: *** [Makefile:8697: src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo] Error 2
> > make[2]: *** [Makefile:8697: src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo] Error 2
> > 
> > ----------------------------------------------------------------------------
> > 
> > What is interesting is that the configure script chooses to make yaksa
> > "CUDA-aware", and I do not understand how to control this. By this I mean
> > the use of NVCC, the compilation of files with "cudai" in their names, and
> > the running of the cudalt.sh script.
> > 
> > This is especially odd given that on another machine, also running Ubuntu 20,
> > none of this occurs, despite using exactly the same configure command:
> > 
> > ---------------------------------------------------
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_blkhindx_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hvector__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_blkhindx__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hindexed__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_contig__Bool.lo
> >   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_resized__Bool.lo
> > 
> > 
> > -------------------------------------------------------
> > 
> > On both machines PETSc locates a working nvcc without trouble. One
> > difference is that the good machine has CUDA 11, built from source, while
> > the bad machine has CUDA 10, installed via apt-get. On the good machine,
> > running the cudalt.sh script in isolation produces the same "Bad
> > substitution" error.
> > 
> > Can someone help me understand where this difference in behaviour with
> > respect to CUDA might come from? Adding the --with-cuda=0 flag on the bad
> > machine made no difference. Is there any way to tell MPICH/yaksa that I
> > don't want the features that involve running the cudalt.sh script?
> > 
> > The configure command I use is:
> > 
> > ./configure --download-mpich=yes --download-hdf5=yes \
> >   --download-fblaslapack=yes --download-metis=yes --download-cmake=yes \
> >   --download-ptscotch=yes --download-hypre=yes --with-debugging=0 \
> >   COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 \
> >   -download-hdf5-fortran-bindings=yes --download-sowing
> > 
> > 
> > 
> > Thanks,
> > 
> > Daniel
> > 
> 
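The "Bad substitution" error on line 35 of cudalt.sh comes from ${@:2} ("all positional parameters from the second onward"), which bash supports but dash does not. Besides switching the shebang to #!/bin/bash, a POSIX-sh equivalent is to drop the first argument with shift and use "$*". This is a minimal sketch, not a tested patch: it assumes PIC_FILEPATH is set earlier in the script and that line 35 only needs every argument after the first. The helper name build_pic_cmd is hypothetical.

```shell
#!/bin/sh
# Dash-compatible form of the bash-only line 35 of cudalt.sh:
#   CMD="${@:2} -Xcompiler -fPIC -o $PIC_FILEPATH"
# build_pic_cmd is a hypothetical helper used only for illustration.
build_pic_cmd() {
    # Drop $1 if present; "$@" then holds the original $2, $3, ...
    [ "$#" -gt 0 ] && shift
    # Inside double quotes, "$*" joins the remaining arguments with the first
    # character of IFS (a space by default), like "${@:2}" inside a string.
    CMD="$* -Xcompiler -fPIC -o $PIC_FILEPATH"
}

# Usage with made-up arguments:
PIC_FILEPATH=foo.lo
build_pic_cmd first nvcc -c foo.cu
printf '%s\n' "$CMD"   # prints: nvcc -c foo.cu -Xcompiler -fPIC -o foo.lo
```

To confirm the root cause on a given machine, dash -c 'set -- a b c; echo "${@:2}"' should fail with "Bad substitution", while the same command run under bash prints "b c".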

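Forwarding an MPICH configure option through PETSc, as Satish suggests, would take roughly the form below. The MPICH option that disables the CUDA backend of yaksa is not identified in this thread, so <MPICH-OPTION> is a placeholder, not a real flag; the remaining options from the original command would be appended unchanged.

```shell
# <MPICH-OPTION> is a placeholder for whatever MPICH configure flag turns off
# the CUDA backend; append the other options from the original command.
./configure --download-mpich=yes \
  --download-mpich-configure-arguments="<MPICH-OPTION>"
```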

