[petsc-dev] error with --download-amgx

Smith, Barry F. bsmith at mcs.anl.gov
Tue Dec 3 22:10:33 CST 2019


  OK, since we are using a fork of the repository, we can remove it if it is causing grief; otherwise it is best to just leave it, since it was put in by the AMGX developers.
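
  For reference, a minimal sketch of the kind of filtering asked about in the quoted discussion below (stripping warnings-as-errors options before flags are handed to cmake). This is not the code in package.py; the function name strip_werror is hypothetical, and the flag list in the usage is a shortened excerpt of the nvcc line quoted below. Note that the nvcc long form (--Werror cross-execution-space-call) takes its argument as a separate token, so both tokens have to go.

```python
def strip_werror(flags):
    """Return a copy of flags with warnings-as-errors options removed.

    Hypothetical helper, not PETSc's package.py. Spellings handled:
      -Werror             no argument (GCC/Clang style)
      -Werror=<name>      single token
      --Werror=<kinds>    single token
      --Werror <kinds>    argument in the next token (as in the nvcc line
                          quoted in this thread)
    The short nvcc form "-Werror <kinds>" is not handled, because it cannot
    be told apart from the argument-less GCC form without knowing the tool.
    """
    cleaned = []
    skip_next = False
    for flag in flags:
        if skip_next:
            # This token is the argument of a preceding "--Werror".
            skip_next = False
            continue
        if flag == "--Werror":
            skip_next = True
            continue
        if flag == "-Werror" or flag.startswith(("-Werror=", "--Werror=")):
            continue
        cleaned.append(flag)
    return cleaned


if __name__ == "__main__":
    nvcc_flags = ["-O3", "-DNDEBUG", "-std=c++11",
                  "--Werror", "cross-execution-space-call", "-DNVCC"]
    print(" ".join(strip_werror(nvcc_flags)))
```

  Since the flag here is set inside the AMGX cmake files rather than passed in by PETSc configure, a filter like this on the configure side would not by itself remove it.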

   

> On Dec 3, 2019, at 10:00 PM, Balay, Satish <balay at mcs.anl.gov> wrote:
> 
> It's from the attached configure.log [from a prior e-mail on this
> thread] - and this flag is not passed in from petsc configure to amgx
> cmake - so it must be somehow set internally in this package.
> 
> Satish
> 
> On Wed, 4 Dec 2019, Smith, Barry F. wrote:
> 
>> 
>>> Also - it's best to avoid -Werror in externalpackage builds.
>> 
>>  Hmm, my branch uses the CMake package to do the install  (it has no custom code) so is this a bug in package.py that it doesn't strip out the -Werror when passing stuff to cmake?
>> 
>>  There is not enough information in your email for me to determine where the message you printed below came from. Did it come from my branch?
>> 
>>   Barry
>> 
>> 
>>> On Dec 3, 2019, at 9:29 PM, Balay, Satish <balay at mcs.anl.gov> wrote:
>>> 
>>>>>>> 
>>> autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/cmake-3.15.2-xit2o3iepxvqbyku77lwcugufilztu7t/bin/cmake -E remove /autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/petsc-build/core/CMakeFiles/amgx_core.dir/src/classical/interpolators/./amgx_core_generated_common.cu.o
>>> /sw/summit/cuda/10.1.168/bin/nvcc -M -D__CUDACC__ /autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/plugin_config.cu -o /autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/petsc-build/base/CMakeFiles/amgx_base.dir/__/amgx_base_generated_plugin_config.cu.o.NVCC-depend -m64 -Xcompiler ,\"-fstack-protector\",\"-g\",\"-O2\",\"-fPIC\",\"-Wno-terminate\",\"-static-libgcc\",\"-fopenmp\",\"-DRAPIDJSON_DEFINED\",\"-DAMGX_WITH_MPI\",\"-fstack-protector\",\"-g\",\"-O2\",\"-fPIC\" -ccbin mpicxx -gencode=arch=compute_35,code=\"sm_35,compute_35\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" -gencode=arch=compute_60,code=\"sm_60,compute_60\" -gencode=arch=compute_70,code=\"sm_70,compute_70\" -O3 -DNDEBUG -std=c++11 --Werror cross-execution-space-call -DNVCC -I/autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/../../thrust -I/autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git..amgx/base/include -I/sw/summit/cuda/10.1.168/include -I/autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/external/rapidjson/include
>>> -- Removing /autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/petsc-build/base/CMakeFiles/amgx_base.dir/src/./amgx_base_generated_device_properties.cu.o
>>> <<<<<<
>>> 
>>> Also - it's best to avoid -Werror in externalpackage builds.
>>> 
>>> Satish
>>> 
>>> On Tue, 3 Dec 2019, Mark Adams wrote:
>>> 
>>>> Barry,
>>>> 
>>>> First, there is a fix that we need in the AMGx repo (appended).
>>>> 
>>>> I've never had problems like this not being able to build. I will try to
>>>> build my AMGx manually to check.
>>>> 
>>>> 21:31 master *= ~/AMGX$ git diff
>>>> diff --git a/base/include/amgx_c_wrappers.inl
>>>> b/base/include/amgx_c_wrappers.inl
>>>> index 42496f0..ed9f0e1 100644
>>>> --- a/base/include/amgx_c_wrappers.inl
>>>> +++ b/base/include/amgx_c_wrappers.inl
>>>> @@ -643,7 +643,7 @@ inline AMGX_Mode get_mode_from(const Envelope &envl)
>>>>    {
>>>>        //throws...
>>>>        //
>>>> -        FatalError("Mode not found.\n", AMGX_ERR_BAD_MODE);
>>>> +        // FatalError("Mode not found.\n", AMGX_ERR_BAD_MODE);
>>>>    }
>>>> 
>>>>    AMGX_Mode mode = static_cast<AMGX_Mode>(itFound->second);
>>>> @@ -1125,4 +1125,4 @@ inline bool remove_managed_matrix(AMGX_matrix_handle
>>>> envl)
>>>> } //namespace unnamed
>>>> 
>>>> On Tue, Dec 3, 2019 at 8:50 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>> 
>>>>> 
>>>>> The first error is
>>>>> 
>>>>> nvcc error   : 'cicc' died due to signal 9 (Kill signal)
>>>>> nvcc error   : 'cicc' died due to signal 9 (Kill signal)
>>>>> 
>>>>> later
>>>>> 
>>>>> 
>>>>> /autofs/nccs-svm1_home1/adams/petsc/arch-summit-opt64-gnu-cuda/externalpackages/git.amgx/base/src/
>>>>> amgx_c_common.cu(77): catastrophic error: error while writing generated
>>>>> C++ file: Cannot allocate memory
>>>>> 
>>>>> 1 catastrophic error detected in the compilation of
>>>>> "/tmp/tmpxft_000038d3_00000000-4_amgx_c_common.cpp4.ii".
>>>>> 
>>>>> Whatever machine you are compiling on seems to be overloaded. What does
>>>>> top show? Can you log into "different" compiler servers on Summit?
>>>>> 
>>>>> I ran the install on the utk system with no problems like this.
>>>>> 
>>>>> PETSc configure is deciding
>>>>> 
>>>>> TEST configureMakeNP from
>>>>> config.packages.make(/autofs/nccs-svm1_home1/adams/petsc/config/BuildSystem/config/packages/make.py:168)
>>>>> TESTING: configureMakeNP from
>>>>> config.packages.make(config/BuildSystem/config/packages/make.py:168)
>>>>> check no of cores on the build machine [perhaps to do make '-j ncores']
>>>>>         module multiprocessing found 128 cores: using make_np = 59
>>>>>           Defined make macro "MAKE_NP" to "59"
>>>>>           Defined make macro "MAKE_TEST_NP" to "49"
>>>>>           Defined make macro "MAKE_LOAD" to "166.4"
>>>>>           Defined make macro "NPMAX" to "128"
>>>>> 
>>>>> then somehow this gets passed down to the AMGX cmake. Perhaps you could
>>>>> try less greedy numbers.
>>>>> 
>>>>> Executing: /usr/bin/gmake -j59 -l166.4
>>>>> 
>>>>> Perhaps try
>>>>> 
>>>>> --with-make-np=20
>>>>> 
>>>>> --with-make-load=20
>>>>> 
>>>>> Barry
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Dec 3, 2019, at 4:17 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>> 
>>>>>> That might have been from an old .nfs file. Here is a new one. I'm sure
>>>>> I have the disk space.
>>>>>> 
>>>>>> On Tue, Dec 3, 2019 at 4:51 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>> Hmm, I cleaned up and now seem to have a different error:
>>>>>> 
>>>>>> 
>>>>>> On Tue, Dec 3, 2019 at 3:45 PM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>> Could you have run out of disk space?
>>>>>> 
>>>>>> /usr/bin/ranlib: libamgx_base.a: Input/output error
>>>>>> gmake[2]: *** [base/libamgx_base.a] Error 1
>>>>>> gmake[2]: *** Deleting file `base/libamgx_base.a'
>>>>>> gmake[1]: *** [base/CMakeFiles/amgx_base.dir/all] Error 2
>>>>>> gmake: *** [all] Error 2
>>>>>> 
>>>>>>  Matt
>>>>>> 
>>>>>> On Tue, Dec 3, 2019 at 1:44 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>> 
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <configure.log>
>>>>> 
>>>>> 
>>>> 
>> 
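
On the memory errors quoted above ('cicc' killed by signal 9, "Cannot allocate memory") and the suggestion to try less greedy numbers: a rough sketch of choosing a job count bounded by memory as well as by cores. This is not how make.py computes MAKE_NP; the function safe_make_np, the 50% core fraction, and the memory figures in the usage are all assumptions for illustration. Only the 128-core count and the --with-make-np / --with-make-load options come from the thread.

```python
def safe_make_np(cores, free_mem_gb, mem_per_job_gb, max_fraction=0.5):
    """Pick a parallel job count limited by both cores and free memory.

    Hypothetical heuristic, not PETSc's configureMakeNP logic.
    cores           -- number of cores reported on the build machine
    free_mem_gb     -- memory currently available, in GB (assumed known)
    mem_per_job_gb  -- guessed peak memory of one compile job, in GB
    max_fraction    -- share of the cores the build is allowed to use
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if mem_per_job_gb <= 0:
        raise ValueError("mem_per_job_gb must be positive")
    by_cores = max(1, int(cores * max_fraction))
    by_memory = max(1, int(free_mem_gb // mem_per_job_gb))
    return min(by_cores, by_memory)


if __name__ == "__main__":
    # 128 cores is from the configure log; 60 GB free and 3 GB per
    # nvcc job are made-up numbers for the example.
    np_jobs = safe_make_np(cores=128, free_mem_gb=60, mem_per_job_gb=3)
    print(f"--with-make-np={np_jobs} --with-make-load={np_jobs}")
```

On a shared login node the free-memory figure changes from minute to minute, so a conservative fixed value such as the --with-make-np=20 suggested above may be just as practical.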


