[petsc-dev] help on Summit

Mark Adams mfadams at lbl.gov
Mon Aug 30 16:22:40 CDT 2021


OK, Stefano got this to build, but I now get the error below.
We have seen this before and thought that you needed to run with GPU-aware
MPI turned off. The test failed in make without
PETSC_OPTIONS='-use_gpu_aware_mpi 0'.
After adding this I get a runtime error instead,
so my PETSC_OPTIONS did seem to take effect.

Treb, maybe this will work for you. This might be from our testing makefile.

17:14 stefanozampini/hypre-gpu> /gpfs/alpine/csc314/scratch/adams/petsc2$ export PETSC_OPTIONS='-use_gpu_aware_mpi 0'
17:15 stefanozampini/hypre-gpu> /gpfs/alpine/csc314/scratch/adams/petsc2$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc2 PETSC_ARCH=arch-summit-hypre-cuda-dbg2 -f gmakefile.test test search='ksp_ksp_tutorials-ex4_hypre_device'
Using MAKEFLAGS: -- search=ksp_ksp_tutorials-ex4_hypre_device
PETSC_ARCH=arch-summit-hypre-cuda-dbg2
PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc2
        TEST
arch-summit-hypre-cuda-dbg2/tests/counts/ksp_ksp_tutorials-ex4_hypre_device.counts
 ok ksp_ksp_tutorials-ex4_hypre_device+nsize-1
 ok diff-ksp_ksp_tutorials-ex4_hypre_device+nsize-1
not ok ksp_ksp_tutorials-ex4_hypre_device+nsize-2 # Error code: 1
# [1] (1117677) Warning: Could not find key lid0:0:2 in cache
<=========================
# [1] (1117677) Warning: Could not find key qpn0:0:0:2 in cache
<=========================
# Unable to connect queue-pairs
# [g13n10:1117677] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
Unable to create 1 PAMI communication context(s) rc=1
# --------------------------------------------------------------------------
# No components were able to be opened in the pml framework.
#
# This typically means that either no components of this type were
# installed, or none of the installed components can be loaded.
# Sometimes this means that shared libraries required by these
# components are unable to be found/loaded.
#
#  Host:      g13n10
#  Framework: pml
# --------------------------------------------------------------------------
# [g13n10:1117677] PML pami cannot be selected
 ok ksp_ksp_tutorials-ex4_hypre_device # SKIP Command failed so no diff


# FAILED ksp_ksp_tutorials-ex4_hypre_device+nsize-2
#
# To rerun failed tests:
#     /usr/bin/gmake -f gmakefile test test-fail=1

On Mon, Aug 30, 2021 at 3:35 PM Mark Adams <mfadams at lbl.gov> wrote:

> I see you have an MR. Should I try these changes in my repo?
>
> On Mon, Aug 30, 2021 at 3:32 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I am in a branch of Stefano's and a user wants this for a milestone asap.
>> Maybe you can send me the fix and I can add it manually.
>> My branch is in a funny "main<>" state and I'm not sure how to pull,
>> etc., without Stefano.
>> Thanks,
>> Mark
>>
>>
>> On Mon, Aug 30, 2021 at 3:28 PM Jacob Faibussowitsch <jacob.fai at gmail.com>
>> wrote:
>>
>>>
>>> So gcc didn’t ignore the hand-coded definitions in
>>> src/sys/objects/device/interface/cupminterface.cxx
>>>
>>> See https://gitlab.com/petsc/petsc/-/merge_requests/4271, where I swap
>>> constexpr for const, and see if it works.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> On Aug 30, 2021, at 14:14, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> That did not seem to work.
>>>
>>> 15:09 main<> /gpfs/alpine/csc314/scratch/adams/petsc2$ mpicc --version
>>> gcc (GCC) 9.1.0
>>> Copyright (C) 2019 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions.  There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>
>>> I have read "GCC 8.x (and later) fully supports all of C++17."
>>>
>>>
>>>
>>> On Mon, Aug 30, 2021 at 3:07 PM Jacob Faibussowitsch <
>>> jacob.fai at gmail.com> wrote:
>>>
>>>> Yeah, I suppose so. All the values we alias are integral types, so static
>>>> const should give the same compile-time assurance as constexpr.
>>>>
>>>> Best regards,
>>>>
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>>
>>>> On Aug 30, 2021, at 13:44, Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>> Can you use the less fancy 'static const int'?
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Mon, Aug 30, 2021 at 1:02 PM Jacob Faibussowitsch <
>>>> jacob.fai at gmail.com> wrote:
>>>>
>>>>>
>>>>> TL;DR: you need to have the host and device compilers either both using
>>>>> c++17 or neither using c++17.
>>>>>
>>>>> Long version:
>>>>> C++17, among other things, changed how static constexpr member variables
>>>>> of classes work. Previously, if I had a class with a static constexpr
>>>>> member variable, I had to not only declare and initialize it within the
>>>>> class, but also define it in a source file; otherwise the variable would
>>>>> not actually have any physical memory address:
>>>>>
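>>>>> // foo.hpp
>>>>> class foo
>>>>> {
>>>>>   static constexpr int bar = 5; // declaration with initializer
>>>>> };
>>>>>
>>>>> // foo.cpp
>>>>> #include "foo.hpp"
>>>>> constexpr int foo::bar; // out-of-class definition, no initializer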
>>>>>
>>>>> In c++17, however, this changed because you can have static “inline”
>>>>> variables. All this does is force the compiler to define the variable for
>>>>> you instead. The issue, of course, is that static constexpr implicitly
>>>>> makes that variable inline in c++17. So to sum it up:
>>>>>
>>>>> 1. The c++17 compiler (nvcc) sees the static constexpr variable, goes
>>>>> “hmm ok I will define this in some undefined location”.
>>>>> 2. The c++11/14 compiler comes along, sees your hand-coded definition
>>>>> of the variable and goes “ah but I think I’ve seen this before, I’ll ignore
>>>>> it”. This silent rejection is due to the hand-coded definition idiom being
>>>>> deprecated from c++17 onwards. Stupid, I know.
>>>>> 3. The linker (driven by the c++11/14 compiler since PETSc links using
>>>>> the host compiler) comes along and now suddenly cannot find the literal
>>>>> definition, because it doesn’t know what the c++17 compiler did. Disaster!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jacob Faibussowitsch
>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>
>>>>> On Aug 30, 2021, at 10:12, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>> No luck with C++14
>>>>>
>>>>>        CUDAC
>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>    CUDAC.dep
>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>      CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
>>>>> multiple definition of
>>>>> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
>>>>> first defined here
>>>>> /usr/bin/ld: link errors found, deleting executable
>>>>> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
>>>>> collect2: error: ld returned 1 exit status
>>>>> gmake[3]: *** [gmakefile:113:
>>>>> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
>>>>> gmake[2]: ***
>>>>> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
>>>>> Error 2
>>>>> **************************ERROR*************************************
>>>>>   Error during compile, check
>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>>>>>   Send it and arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log
>>>>> to petsc-maint at mcs.anl.gov
>>>>> ********************************************************************
>>>>> gmake[1]: *** [makefile:40: all] Error 1
>>>>>
>>>>> On Mon, Aug 30, 2021 at 10:50 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> Stefano suggested C++14 in configure. I was using C++11.
>>>>>>
>>>>>> On Mon, Aug 30, 2021 at 10:46 AM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>>  Petsc::CUPMInterface
>>>>>>> @Jacob Faibussowitsch <jacob.fai at gmail.com>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 30, 2021 at 9:35 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> I was running fine this AM, but I am bouncing between modules to help
>>>>>>>> two apps (ECP milestone season) at the same time, and something broke.
>>>>>>>> I did update main, and I get the same error in main and in a hypre
>>>>>>>> branch of Stefano's.
>>>>>>>> I started with a clean build and checked my modules...
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>        CC arch-summit-hypre-cuda-dbg/obj/tao/interface/taosolver.o
>>>>>>>>           CC arch-summit-hypre-cuda-dbg/obj/ts/interface/ts.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>>>>>>>>        CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>>>>    CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>>>>      CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
>>>>>>>> multiple definition of
>>>>>>>> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
>>>>>>>> first defined here
>>>>>>>> /usr/bin/ld: link errors found, deleting executable
>>>>>>>> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
>>>>>>>> collect2: error: ld returned 1 exit status
>>>>>>>> gmake[3]: *** [gmakefile:113:
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
>>>>>>>> gmake[2]: ***
>>>>>>>> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
>>>>>>>> Error 2
>>>>>>>> **************************ERROR*************************************
>>>>>>>>   Error during compile, check
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>>>>>>>>   Send it and
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log to
>>>>>>>> petsc-maint at mcs.anl.gov
>>>>>>>> ********************************************************************
>>>>>>>> gmake[1]: *** [makefile:40: all] Error 1
>>>>>>>> make: *** [GNUmakefile:9: all] Error 2
>>>>>>>>
>>>>>>>
>>>>>
>>>> <make.log><configure.log>
>>>
>>>
>>>