[petsc-dev] help on Summit
Mark Adams
mfadams at lbl.gov
Mon Aug 30 16:22:40 CDT 2021
OK, Stefano got this to build. I now get this error.
We had this before and thought that you needed to run with GPU-aware MPI
turned off. The test failed in make without PETSC_OPTIONS='-use_gpu_aware_mpi 0'.
After adding this I get a runtime error,
so my PETSC_OPTIONS did seem to take effect.
Treb, maybe this will work for you. This might be from our testing makefile.
17:14 stefanozampini/hypre-gpu> /gpfs/alpine/csc314/scratch/adams/petsc2$ export PETSC_OPTIONS='-use_gpu_aware_mpi 0'
17:15 stefanozampini/hypre-gpu> /gpfs/alpine/csc314/scratch/adams/petsc2$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc2 PETSC_ARCH=arch-summit-hypre-cuda-dbg2 -f gmakefile.test test search='ksp_ksp_tutorials-ex4_hypre_device'
Using MAKEFLAGS: -- search=ksp_ksp_tutorials-ex4_hypre_device
PETSC_ARCH=arch-summit-hypre-cuda-dbg2
PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc2
TEST
arch-summit-hypre-cuda-dbg2/tests/counts/ksp_ksp_tutorials-ex4_hypre_device.counts
ok ksp_ksp_tutorials-ex4_hypre_device+nsize-1
ok diff-ksp_ksp_tutorials-ex4_hypre_device+nsize-1
not ok ksp_ksp_tutorials-ex4_hypre_device+nsize-2 # Error code: 1
# [1] (1117677) Warning: Could not find key lid0:0:2 in cache <=========================
# [1] (1117677) Warning: Could not find key qpn0:0:0:2 in cache <=========================
# Unable to connect queue-pairs
# [g13n10:1117677] Error: common_pami.c:1094 - ompi_common_pami_init() 1:
Unable to create 1 PAMI communication context(s) rc=1
# --------------------------------------------------------------------------
# No components were able to be opened in the pml framework.
#
# This typically means that either no components of this type were
# installed, or none of the installed components can be loaded.
# Sometimes this means that shared libraries required by these
# components are unable to be found/loaded.
#
# Host: g13n10
# Framework: pml
# --------------------------------------------------------------------------
# [g13n10:1117677] PML pami cannot be selected
ok ksp_ksp_tutorials-ex4_hypre_device # SKIP Command failed so no diff
# FAILED ksp_ksp_tutorials-ex4_hypre_device+nsize-2
#
# To rerun failed tests:
# /usr/bin/gmake -f gmakefile test test-fail=1
On Mon, Aug 30, 2021 at 3:35 PM Mark Adams <mfadams at lbl.gov> wrote:
> I see you have an MR. Should I try these changes in my repo?
>
> On Mon, Aug 30, 2021 at 3:32 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> I am in a branch of Stefano's and a user wants this for a milestone asap.
>> Maybe you can send me the fix and I can add it manually.
>> My branch is in a funny "main<>" state and I'm not sure how to pull,
>> etc., without Stefano.
>> Thanks,
>> Mark
>>
>>
>> On Mon, Aug 30, 2021 at 3:28 PM Jacob Faibussowitsch <jacob.fai at gmail.com>
>> wrote:
>>
>>> So gcc didn’t ignore the hand-coded definitions in
>>> src/sys/objects/device/interface/cupminterface.cxx
>>>
>>> See https://gitlab.com/petsc/petsc/-/merge_requests/4271, where I swap
>>> constexpr for const to see if that works.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> On Aug 30, 2021, at 14:14, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> That did not seem to work.
>>>
>>> 15:09 main<> /gpfs/alpine/csc314/scratch/adams/petsc2$ mpicc --version
>>> gcc (GCC) 9.1.0
>>> Copyright (C) 2019 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions. There is
>>> NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>>> PURPOSE.
>>>
>>> I have read "GCC 8.x (and later) fully supports all of C++17."
>>>
>>>
>>>
>>> On Mon, Aug 30, 2021 at 3:07 PM Jacob Faibussowitsch <
>>> jacob.fai at gmail.com> wrote:
>>>
>>>> Yeah, I suppose so. All the values we alias are integral types, so static
>>>> const should give the same compile-time assurance as constexpr.
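
For illustration, a minimal sketch of the static const suggestion above. The names Limits, kMaxStreams, kMaxEvents and stream_slots are hypothetical and are not taken from PETSc. It assumes only that the members are integral and initialized in-class with a constant expression, in which case both spellings can be used in constant expressions under C++11 and later:

```cpp
#include <array>

// Hypothetical class; the member names are placeholders, not PETSc symbols.
struct Limits {
  static const int     kMaxStreams = 4; // integral static const, in-class initializer
  static constexpr int kMaxEvents  = 8; // constexpr spelling of the same idea
};

// Both members are usable wherever a constant expression is required.
static_assert(Limits::kMaxStreams == 4, "static const int is a constant expression");
static_assert(Limits::kMaxEvents == 8, "constexpr int is a constant expression");

// Uses the static const member as a template argument (a value use only,
// so no out-of-class definition is needed for this function).
int stream_slots() {
  std::array<int, Limits::kMaxStreams> slots{};
  return static_cast<int>(slots.size());
}
```

Note that this only covers uses of the value. If a member is odr-used (for example bound to a const reference), a static const member still needs an out-of-class definition in exactly one source file.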
>>>>
>>>> Best regards,
>>>>
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>>
>>>> On Aug 30, 2021, at 13:44, Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>> Can you use less fancy 'static const int'?
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Mon, Aug 30, 2021 at 1:02 PM Jacob Faibussowitsch <
>>>> jacob.fai at gmail.com> wrote:
>>>>
>>>>> TL;DR: you need to have host and device compiler either both using
>>>>> c++17 or neither using c++17.
>>>>>
>>>>> Long version:
>>>>> C++17, among other things, changed how static constexpr member variables
>>>>> of classes work. Previously, if I had a class with a static constexpr
>>>>> member variable, I would have to not only declare it in-line within the
>>>>> class, but also define it out-of-class in a source file; otherwise the
>>>>> variable would not actually have any physical memory address:
>>>>>
>>>>> // foo.hpp
>>>>> class foo
>>>>> {
>>>>>   static constexpr int bar = 5; // declaration with in-class initializer
>>>>> };
>>>>>
>>>>> // foo.cpp
>>>>> constexpr int foo::bar; // out-of-class definition, no initializer
>>>>>
>>>>> In c++17, however, this changed because you can have static “inline”
>>>>> variables. All this does is force the compiler to define the variable for
>>>>> you instead. The issue of course is that static constexpr implicitly makes
>>>>> that variable inline in c++17. So to sum it up:
>>>>>
>>>>> 1. The c++17 compiler (nvcc) sees the static constexpr variable, goes
>>>>> “hmm ok I will define this in some undefined location”.
>>>>> 2. The c++11/14 compiler comes along, sees your hand-coded definition
>>>>> of the variable and goes “ah but I think I’ve seen this before, I’ll ignore
>>>>> it”. This silent rejection is due to the hand-coded definition idiom being
>>>>> deprecated from c++17 onwards. Stupid, I know.
>>>>> 3. The linker (driven by the c++11/14 compiler since PETSc links using
>>>>> the host compiler) comes along and now suddenly cannot find the literal
>>>>> definition, because it doesn’t know what the c++17 compiler did. Disaster!
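
A minimal single-file sketch of the language rule described in the steps above, assuming a hypothetical class Foo and helper smaller_of_bar_and (neither is PETSc code). It does not reproduce the mixed nvcc/gcc build; it only shows that the out-of-class definition is required before C++17 once the member is odr-used, and is redundant (deprecated) from C++17 onwards, which is why it is guarded by __cplusplus here:

```cpp
#include <algorithm>

// Hypothetical class standing in for the pattern discussed above.
struct Foo {
  // Before C++17 this is only a declaration (with an initializer).
  // From C++17 onwards it is implicitly inline and therefore a definition.
  static constexpr int bar = 5;
};

#if __cplusplus < 201703L
// Pre-C++17: exactly one out-of-class definition is needed once bar is
// odr-used. In C++17 this redeclaration is redundant and deprecated.
constexpr int Foo::bar;
#endif

int smaller_of_bar_and(int x) {
  // std::min takes its arguments by const reference, which odr-uses
  // Foo::bar and so requires it to have a definition somewhere.
  return std::min(Foo::bar, x);
}
```

Building every translation unit with the same -std level keeps all compilers in agreement about which of the two branches applies, which is the point of the TL;DR above.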
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jacob Faibussowitsch
>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>
>>>>> On Aug 30, 2021, at 10:12, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>> No luck with C++14
>>>>>
>>>>> CUDAC
>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>> CUDAC.dep
>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>> CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
>>>>> multiple definition of
>>>>> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
>>>>> first defined here
>>>>> /usr/bin/ld: link errors found, deleting executable
>>>>> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
>>>>> collect2: error: ld returned 1 exit status
>>>>> gmake[3]: *** [gmakefile:113:
>>>>> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
>>>>> gmake[2]: ***
>>>>> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
>>>>> Error 2
>>>>> **************************ERROR*************************************
>>>>> Error during compile, check
>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>>>>> Send it and arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log
>>>>> to petsc-maint at mcs.anl.gov
>>>>> ********************************************************************
>>>>> gmake[1]: *** [makefile:40: all] Error 1
>>>>>
>>>>> On Mon, Aug 30, 2021 at 10:50 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>> Stefano suggested C++14 in configure. I was using C++11.
>>>>>>
>>>>>> On Mon, Aug 30, 2021 at 10:46 AM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>> Petsc::CUPMInterface
>>>>>>> @Jacob Faibussowitsch <jacob.fai at gmail.com>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 30, 2021 at 9:35 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> I was running fine this AM and am bouncing between modules to help
>>>>>>>> two apps (ECP milestone season) at the same time and something broke. I did
>>>>>>>> update main and I get the same error in main and a hypre branch of
>>>>>>>> Stefano's.
>>>>>>>> I started with a clean build and checked my modules...
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> CC arch-summit-hypre-cuda-dbg/obj/tao/interface/taosolver.o
>>>>>>>> CC arch-summit-hypre-cuda-dbg/obj/ts/interface/ts.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/dense/seq/cuda/densecuda.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparseband.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/ts/utils/dmplexlandau/cuda/landaucu.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/vec/impls/seq/seqcuda/veccuda2.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>>>>>>>> CUDAC
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>>>> CUDAC.dep
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/vec/is/sf/impls/basic/cuda/sfcuda.o
>>>>>>>> CLINKER arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/impls/cupm/cuda/cupmcontext.o:(.rodata._ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE[_ZN5Petsc13CUPMInterfaceILNS_14CUPMDeviceKindE0EE21cupmStreamNonBlockingE]+0x0):
>>>>>>>> multiple definition of
>>>>>>>> `Petsc::CUPMInterface<(Petsc::CUPMDeviceKind)0>::cupmStreamNonBlocking'
>>>>>>>> arch-summit-hypre-cuda-dbg/obj/sys/objects/device/interface/cupminterface.o:(.rodata+0x44):
>>>>>>>> first defined here
>>>>>>>> /usr/bin/ld: link errors found, deleting executable
>>>>>>>> `arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3'
>>>>>>>> collect2: error: ld returned 1 exit status
>>>>>>>> gmake[3]: *** [gmakefile:113:
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/libpetsc.so.3.015.3] Error 1
>>>>>>>> gmake[2]: ***
>>>>>>>> [/gpfs/alpine/csc314/scratch/adams/petsc2/lib/petsc/conf/rules:50: libs]
>>>>>>>> Error 2
>>>>>>>> **************************ERROR*************************************
>>>>>>>> Error during compile, check
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/make.log
>>>>>>>> Send it and
>>>>>>>> arch-summit-hypre-cuda-dbg/lib/petsc/conf/configure.log to
>>>>>>>> petsc-maint at mcs.anl.gov
>>>>>>>> ********************************************************************
>>>>>>>> gmake[1]: *** [makefile:40: all] Error 1
>>>>>>>> make: *** [GNUmakefile:9: all] Error 2
>>>>>>>>
>>>>>>>
>>>>>
>>>> <make.log><configure.log>
>>>
>>>
>>>