[petsc-users] Configure of 3.16.1 failing on mpich/yaksa

Satish Balay balay at mcs.anl.gov
Mon Nov 8 13:35:53 CST 2021


This looks like a bug that should be reported to MPICH.

You should be able to reproduce this without PETSc, by building MPICH directly.

Then report it to the MPICH developers.

As for PETSc, you could try --download-openmpi [instead of --download-mpich] and see if that works better.

Yes, CUDA obtained from NVIDIA and CUDA repackaged by Ubuntu have subtle differences that can cause CUDA failures.
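
One way to compare the two machines is to check where the CUDA headers and nvcc live on each. The sketch below assumes, without confirmation from this thread, that yaksa's configure enables its CUDA backend when it finds CUDA on the default search paths. The header name cuda_runtime.h and the two directories are example guesses, and find_header is a made-up helper, not part of PETSc or MPICH.

```shell
#!/bin/sh
# Hypothetical helper: print each directory (taken from the remaining
# arguments) that contains the named header file.
find_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Assumed header name and candidate directories; adjust for the local install.
find_header cuda_runtime.h /usr/include /usr/local/cuda/include

# Show which nvcc (if any) is first on PATH.
command -v nvcc || echo "nvcc not on PATH"
```

Running this on both machines and comparing the output may show whether only one of them exposes CUDA in a system default location.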

As you say, MPICH might have a configure option to disable this. If you can find that option, you can pass it through PETSc configure with:

--download-mpich-configure-arguments=string
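
As a sketch of how that would look: the snippet below assumes MPICH's configure accepts --without-cuda, which is not confirmed in this thread and should be checked with ./configure --help in the MPICH source tree. It only prints the resulting PETSc configure line so it can be reviewed first; the variable names are made up for the example.

```shell
#!/bin/sh
# Assumption: MPICH's configure understands --without-cuda. Confirm with
# `./configure --help` in externalpackages/mpich-3.4.2 before relying on it.
MPICH_EXTRA_ARGS="--without-cuda"

# Build the PETSc configure line and print it for review instead of running it.
PETSC_CONFIGURE="./configure --download-mpich=yes --download-mpich-configure-arguments=${MPICH_EXTRA_ARGS}"
echo "$PETSC_CONFIGURE"
```

The other --download-* options from the original configure command would be appended to the same line.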

Satish

On Mon, 8 Nov 2021, Daniel Stone wrote:

> Hello all,
> 
> PETSc configure has been failing for me on Ubuntu 20 when it downloads
> and builds mpich.
> 
> 
> This seems to be related to the use of
> "#!/bin/sh"
> found in the script
> mpich-3.4.2/modules/yaksa/src/backend/cuda/cudalt.sh
> 
> /bin/sh on Ubuntu 20 is dash, not bash, and line 35 of the script is:
> CMD="${@:2} -Xcompiler -fPIC -o $PIC_FILEPATH"
> The ${@:2} expansion is a bash extension that dash does not support. I see
> "Bad substitution" errors when running this script in isolation, which can
> be fixed by replacing the top line with
> 
> "#!/bin/bash"
> 
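
For reference, a minimal sketch of a POSIX sh way to get the effect of ${@:2} (all arguments after the first), which dash also accepts. This is not the MPICH fix, only an illustration. The function rest_of_args, the sample arguments, and the out.o value for PIC_FILEPATH are made up for the example.

```shell
#!/bin/sh
# Portable stand-in for bash's "${@:2}": drop the first positional
# parameter, then join the rest with spaces.
rest_of_args() {
    [ "$#" -gt 0 ] && shift   # discard the first argument, if there is one
    printf '%s' "$*"          # remaining arguments, space-separated
}

PIC_FILEPATH="out.o"          # hypothetical output path for the example

# Same shape as line 35 of cudalt.sh, without the bash-only expansion.
CMD="$(rest_of_args libtool nvcc -c foo.cu) -Xcompiler -fPIC -o $PIC_FILEPATH"
echo "$CMD"
```

Run under either sh or bash, this builds the command string from the arguments after the first without triggering a "Bad substitution" error.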
> The petsc config log points to this line in this script:
> 
> ------------------------------------------------------------------------------------
> 
> make[2]: Entering directory
> '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hindexed__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_contig__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hvector__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_hvector_resized__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_blkhindx__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_hindexed__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_contig__Bool.lo
>   NVCC     src/backend/cuda/pup/yaksuri_cudai_pup_blkhindx_resized__Bool.lo
> make[2]: Leaving directory
> '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2/modules/yaksa'
> make[1]: Leaving directory
> '/home/david/petsc/petsc_opt/externalpackages/mpich-3.4.2'
> /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> /usr/bin/ar: `u' modifier ignored since `D' is the default (see `U')
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> ./src/backend/cuda/cudalt.sh: 35: Bad substitution
> make[2]: *** [Makefile:8697:
> src/backend/cuda/pup/yaksuri_cudai_pup_hvector__Bool.lo] Error 2
> make[2]: *** Waiting for unfinished jobs....
> make[2]: *** [Makefile:8697:
> src/backend/cuda/pup/yaksuri_cudai_pup_hvector_hvector__Bool.lo] Error 2
> make[2]: *** [Makefile:8697:
> src/backend/cuda/pup/yaksuri_cudai_pup_hvector_blkhindx__Bool.lo] Error 2
> 
> ----------------------------------------------------------------------------
> 
> What is interesting is that the configure script chooses to make yaksa
> "cuda-aware" here, and I do not understand how to control that choice. By
> "cuda-aware" I mean the use of NVCC, the files with "cudai" in their names,
> and the running of the cudalt.sh script.
> 
> This is especially odd given that on another machine, also with Ubuntu 20,
> none of this occurs, despite using the exact same configure instructions:
> 
> ---------------------------------------------------
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_blkhindx_resized__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hvector__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_blkhindx__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_hindexed__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_contig__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_hindexed_resized__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hvector__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_blkhindx__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_hindexed__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_contig__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_contig_resized__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hvector__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_blkhindx__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_hindexed__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_contig__Bool.lo
>   CC       src/backend/seq/pup/yaksuri_seqi_pup_resized_resized__Bool.lo
> 
> 
> -------------------------------------------------------
> 
> On both machines PETSc locates a working nvcc without trouble. One
> difference is that the good machine has CUDA 11, built from source, while
> the bad machine has CUDA 10, installed via apt-get. On the good machine,
> running the cudalt.sh script in isolation produces the same
> "Bad substitution" error.
> 
> Can someone help me understand where this difference in behaviour with
> respect to CUDA might come from? Adding the --with-cuda=0 flag on the bad
> machine made no difference. Is there any way of telling mpich/yaksa that
> I don't want the features that involve running the cudalt.sh script?
> 
> The configure command I use is:
> 
> ./configure --download-mpich=yes --download-hdf5=yes
> --download-fblaslapack=yes --download-metis=yes  --download-cmake=yes
>  --download-ptscotch=yes --download-hypre=yes --with-debugging=0
> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3
> -download-hdf5-fortran-bindings=yes --download-sowing
> 
> 
> 
> Thanks,
> 
> Daniel
> 


