[petsc-dev] VecScatter scaling problem on KNL
Barry Smith
bsmith at mcs.anl.gov
Wed Mar 8 18:43:47 CST 2017
> On Mar 8, 2017, at 6:32 PM, Tuomas Koskela <tkoskela at lbl.gov> wrote:
>
> This is with PETSc 3.6.3. Are there new features in 3.7 that could help?
In theory there are no changes in 3.7 that would affect the performance of the VecScatter or this "weird bug" with the logging.
Does the logging problem happen only on the KNL? Does it happen for small problems on 1 or 2 cores or only on big runs?
What is the rest of the stack above the line below?
>>>>> [4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
If you build without all the optimization flags do you still get this logging error?
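For reference, here is a toy model of how a threaded caller could, in principle, produce an "unbalanced begin/end pairs" report. It is only a sketch under assumptions: the ToyEvent struct and the toy_* functions are hypothetical names and are not PETSc's logging code. It assumes the event keeps a plain nesting counter that is updated without atomics, and it replays one unlucky interleaving of two threads by hand so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdio.h>

/* Toy event record (hypothetical, not a PETSc type): begins minus ends. */
typedef struct {
  int depth;
} ToyEvent;

/* "Begin" split into its read and write halves so that a race between
   two threads can be replayed step by step in a single thread. */
static int toy_begin_read(const ToyEvent *ev)
{
  assert(ev != NULL);
  return ev->depth;
}

static void toy_begin_write(ToyEvent *ev, int seen)
{
  assert(ev != NULL);
  ev->depth = seen + 1;
}

/* "End": returns 0 when balanced, -1 when more ends than begins were seen. */
static int toy_end(ToyEvent *ev)
{
  assert(ev != NULL);
  ev->depth--;
  if (ev->depth < 0) {
    fprintf(stderr, "toy log: unbalanced begin/end pairs\n");
    return -1;
  }
  return 0;
}

/* Serial use: begin, end, begin, end. Returns the number of errors. */
int toy_serial_run(void)
{
  ToyEvent ev = {0};
  int errors = 0;
  for (int i = 0; i < 2; i++) {
    toy_begin_write(&ev, toy_begin_read(&ev));
    if (toy_end(&ev) != 0) errors++;
  }
  return errors;
}

/* Racy use: threads A and B both read depth 0 before either writes, so
   one increment is lost; the second end then drives the depth negative. */
int toy_racy_run(void)
{
  ToyEvent ev = {0};
  int errors = 0;
  int seen_a = toy_begin_read(&ev); /* A reads 0 */
  int seen_b = toy_begin_read(&ev); /* B reads 0 */
  toy_begin_write(&ev, seen_a);     /* A writes 1 */
  toy_begin_write(&ev, seen_b);     /* B writes 1, A's update is lost */
  if (toy_end(&ev) != 0) errors++;  /* A ends: depth 0, fine */
  if (toy_end(&ev) != 0) errors++;  /* B ends: depth -1, reported */
  return errors;
}
```

In this model toy_serial_run() reports no errors and toy_racy_run() reports one, which is why it matters whether any PETSc call can be reached from inside an OpenMP parallel region.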
Barry
> The API in 3.7 had some changes that would require updating the code.
>
> -Tuomas
>
>
> On 3/8/17 16:29, Barry Smith wrote:
>> Mark,
>>
>> Are you getting this with PETSc 3.7.5? Has the code been run under valgrind?
>>
>>
>>> On Mar 8, 2017, at 6:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> On Wed, Mar 8, 2017 at 4:57 PM, Richard Mills <richardtmills at gmail.com> wrote:
>>>> Hi Mark,
>>>>
>>>> Is your application threaded? I seem to recall having seen these "Logging
>>>> event had unbalanced begin/end pairs" with threaded codes that call PETSc.
>>> It is OMP threaded, but it should certainly not call PETSc inside of a
>>> thread loop... but this does look like something that threading could
>>> cause.
>>>
>>>
>>>> --Richard
>>>>
>>>> On Wed, Mar 8, 2017 at 1:33 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>> Our code is having scaling problems on KNL (Cori), when we get up to
>>>>> about 1K sockets.
>>>>>
>>>>> We have isolated the problem to a certain VecScatter. This code stores
>>>>> the data redundantly. Scattering into the solver is just a local copy,
>>>>> but scattering out requires that each process send all of its data to
>>>>> every other process. It is this second one that is not scaling well.
>>>>>
>>>>> I wish I had more data, but this is urgent: jobs are in the queue and
>>>>> this is all I have. Are there any parameters you would recommend that
>>>>> we test while we get more data?
>>>>>
>>>>> Also, we got this error with -log_view.
>>>>>
>>>>> I've updated their PETSc to maint and we are waiting for it to run
>>>>> again. Apparently the error did not occur on the first time step, so the
>>>>> code seems to have run for a while with what looks to me like a logic bug.
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>> [4098]PETSC ERROR: --------------------- Error Message
>>>>> --------------------------------------------------------------
>>>>> [4098]PETSC ERROR: Object is in wrong state
>>>>> [4098]PETSC ERROR: Logging event had unbalanced begin/end pairs
>>>>> [4098]PETSC ERROR: See
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>>>> shooting.
>>>>> [4098]PETSC ERROR: Petsc Release Version 3.6.3, unknown
>>>>> [4098]PETSC ERROR: /global/cscratch1/sd/worleyph/XGC1_KNL/xgc2 on a
>>>>> v3.6.3-arch-knl-opt64-intel named nid05668 by worleyph Mon Mar 6
>>>>> 11:33:19 2017
>>>>> [4098]PETSC ERROR: Configure options COPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" CXXOPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" FOPTFLAGS="-g -O3 -fp-model fast
>>>>> -xMIC-AVX512 -DX2_HAVE_INTEL" --download-metis=1 --download-parmetis=1
>>>>> --with-blas-lapack-dir=/global/common/cori/software/intel/compilers_and_libraries_2017.0.098/linux/mkl
>>>>> --with-cc=cc --with-cxx=cc --with-debugging=0 --with-fc=ftn
>>>>> --with-mpiexec=srun --with-batch=0 --with-memalign=64
>>>>> --with-64-bit-indices --known-mpi-shared-libraries=1
>>>>> PETSC_ARCH=v3.6.3-arch-knl-opt64-intel --with-openmp=1
>>>>> PETSC_DIR=/global/homes/t/tkoskela/git/petsc
>>>>> [4098]PETSC ERROR: #1 PetscLogEventEndDefault() line 696 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/sys/logging/utils/eventlog.c
>>>>> [4098]PETSC ERROR: #2 VecSet() line 577 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/rvector.c
>>>>> [4098]PETSC ERROR: #3 VecCreate_Seq() line 44 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec3.c
>>>>> [4098]PETSC ERROR: #4 VecSetType() line 53 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vecreg.c
>>>>> [4098]PETSC ERROR: #5 VecDuplicate_Seq() line 786 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec2.c
>>>>> [4098]PETSC ERROR: #6 VecDuplicate() line 399 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>> [4098]PETSC ERROR: #7 VecDuplicateVecs_Default() line 840 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>> [4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in
>>>>> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
>>>>
>