[petsc-dev] VecScatter scaling problem on KNL

Mark Adams mfadams at lbl.gov
Wed Mar 8 18:36:25 CST 2017


On Wed, Mar 8, 2017 at 7:32 PM, Tuomas Koskela <tkoskela at lbl.gov> wrote:
> This is with PETSc 3.6.3. Are there new features in 3.7 that could help? The
> API in 3.7 had some changes that would require updating the code.

Well, the new PETSc that I built is recent, so you will need to make
those changes.

>
> -Tuomas
>
>
>
> On 3/8/17 16:29, Barry Smith wrote:
>>
>>    Mark,
>>
>>     Are you getting this with PETSc 3.7.5 ?  Is the code valgrinded?
>>
>>
>>> On Mar 8, 2017, at 6:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> On Wed, Mar 8, 2017 at 4:57 PM, Richard Mills <richardtmills at gmail.com>
>>> wrote:
>>>>
>>>> Hi Mark,
>>>>
>>>> Is your application threaded?  I seem to recall having seen these
>>>> "Logging event had unbalanced begin/end pairs" errors with threaded
>>>> codes that call PETSc.
>>>
>>> It is OMP threaded, but it should certainly not call PETSc inside of a
>>> thread loop... but this does look like something that threading could
>>> cause.
>>>
>>>
>>>> --Richard
>>>>
>>>> On Wed, Mar 8, 2017 at 1:33 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>> Our code is having scaling problems on KNL (Cori), when we get up to
>>>>> about 1K sockets.
>>>>>
>>>>> We have isolated the problem to a certain VecScatter. This code stores
>>>>> the data redundantly. Scattering into the solver is just a local copy,
>>>>> but scattering out requires that each process send all of its data to
>>>>> every other process. It is this second one that is not scaling well.
>>>>>
>>>>> I wish I had more data, but this is urgent and jobs are in the queue,
>>>>> so this is all I have. Any recommendations for parameters that we might
>>>>> test while we get more data?
>>>>>
>>>>> Also, we got this error with -log_view.
>>>>>
>>>>> I've updated their PETSc with maint and we are waiting for it to run
>>>>> again. Apparently this was not on the first time step, so the code
>>>>> seems to have run for a while with what looks to me like a logic bug.
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>> [4098]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [4098]PETSC ERROR: Object is in wrong state
>>>>> [4098]PETSC ERROR: Logging event had unbalanced begin/end pairs
>>>>> [4098]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [4098]PETSC ERROR: Petsc Release Version 3.6.3, unknown
>>>>> [4098]PETSC ERROR: /global/cscratch1/sd/worleyph/XGC1_KNL/xgc2 on a
>>>>> v3.6.3-arch-knl-opt64-intel named nid05668 by worleyph Mon Mar  6
>>>>> 11:33:19 2017
>>>>> [4098]PETSC ERROR: Configure options COPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" CXXOPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" FOPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" --download-metis=1 --download-parmetis=1
>>>>> --with-blas-lapack-dir=/global/common/cori/software/intel/compilers_and_libraries_2017.0.098/linux/mkl
>>>>> --with-cc=cc --with-cxx=cc --with-debugging=0 --with-fc=ftn
>>>>> --with-mpiexec=srun --with-batch=0 --with-memalign=64 --with-64-bit-indices
>>>>> --known-mpi-shared-libraries=1 PETSC_ARCH=v3.6.3-arch-knl-opt64-intel
>>>>> --with-openmp=1 PETSC_DIR=/global/homes/t/tkoskela/git/petsc
>>>>> [4098]PETSC ERROR: #1 PetscLogEventEndDefault() line 696 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/sys/logging/utils/eventlog.c
>>>>> [4098]PETSC ERROR: #2 VecSet() line 577 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/rvector.c
>>>>> [4098]PETSC ERROR: #3 VecCreate_Seq() line 44 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec3.c
>>>>> [4098]PETSC ERROR: #4 VecSetType() line 53 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vecreg.c
>>>>> [4098]PETSC ERROR: #5 VecDuplicate_Seq() line 786 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec2.c
>>>>> [4098]PETSC ERROR: #6 VecDuplicate() line 399 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>> [4098]PETSC ERROR: #7 VecDuplicateVecs_Default() line 840 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>> [4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>
>>>>
>
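
The scatter-out pattern described in the quoted message (each process sends
all of its data to every other process) can be sized with a rough
back-of-the-envelope model. The sketch below is not taken from the
application or from PETSc: it assumes the scatter is carried out with
pairwise point-to-point messages, and the function name `allgather_cost`,
the 8-byte entry size, and the 1000 entries per process are hypothetical
values chosen only for illustration.

```python
def allgather_cost(num_procs, local_entries, bytes_per_entry=8):
    """Estimate the traffic when every process sends all of its local
    data to every other process using point-to-point messages.

    Assumption: one message per (sender, receiver) pair. A real
    implementation may use collectives and behave differently.
    """
    messages_per_proc = num_procs - 1                # one to each other process
    messages_total = num_procs * messages_per_proc   # grows as O(P^2)
    recv_bytes_per_proc = messages_per_proc * local_entries * bytes_per_entry
    return {
        "messages_per_proc": messages_per_proc,
        "messages_total": messages_total,
        "recv_bytes_per_proc": recv_bytes_per_proc,
    }


if __name__ == "__main__":
    # Hypothetical sizes: 1000 entries per process at several process counts.
    for p in (64, 256, 1024):
        cost = allgather_cost(p, local_entries=1000)
        print(p, cost["messages_total"], cost["recv_bytes_per_proc"])
```

Under this assumption the total message count grows quadratically with the
number of processes, while the local copy used to scatter into the solver
involves no messages at all. That is consistent with only the scatter-out
step showing a scaling problem, though it does not by itself identify the
cause.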


