[petsc-dev] VecScatter scaling problem on KNL

Matthew Knepley knepley at gmail.com
Wed Mar 8 15:46:35 CST 2017

On Wed, Mar 8, 2017 at 3:33 PM, Mark Adams <mfadams at lbl.gov> wrote:

> Our code is having scaling problems on KNL (Cori), when we get up to
> about 1K sockets.
> We have isolated the problem to a certain VecScatter. This code stores
> the data redundantly. Scattering into the solver is just a local copy,
> but scattering out requires that each process send all of its data to
> every other process. It is this second one that is not scaling well.

This is an all2all, so it should be hard to beat
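
As a rough illustration of why this pattern gets expensive as the job grows, here is a back-of-the-envelope sketch in Python. It assumes a naive model in which "each process sends all of its data to every other process" is done with one point-to-point message per pair. The function name, the 1000 entries per process, and the 8 bytes per entry are hypothetical values chosen for illustration, not numbers from this thread. A tuned MPI collective would normally use far fewer messages than this model counts.

```python
def allgather_traffic(num_procs, local_entries, bytes_per_entry=8):
    """Naive traffic model: every process sends its whole local block
    to every other process as a separate point-to-point message.

    Returns (message_count, total_bytes, bytes_received_per_process).
    """
    # Each of the P processes sends to the other P - 1 processes.
    messages = num_procs * (num_procs - 1)
    # Every message carries the sender's full local block.
    total_bytes = messages * local_entries * bytes_per_entry
    # Each process ends up receiving one block from every other process.
    recv_per_proc = (num_procs - 1) * local_entries * bytes_per_entry
    return messages, total_bytes, recv_per_proc


if __name__ == "__main__":
    # Hypothetical process counts and local size, for illustration only.
    for p in (64, 256, 1024, 4096):
        msgs, total, per_proc = allgather_traffic(p, local_entries=1000)
        print(f"P={p:5d}  messages={msgs:10d}  "
              f"total={total / 1e9:8.2f} GB  recv/proc={per_proc / 1e6:7.2f} MB")
```

In this model the message count grows like P^2, while the data each process receives grows like P, which is the kind of growth that may only become visible at large process counts.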


The complete list of tuning options is here:


> I wish I had more data, but this is urgent, jobs are in the queue, but
> this is all I have. Any recommendation for parameters that we might
> test while we get more data?
> Also, we got this error with -log_view.

This looks like a bug in the code, but it's hard to know exactly what.
However, it should give the same error even for a very small problem.
Can you run something small and reproduce it?



> I've updated their PETSc with maint and we are waiting for it to run
> again. Apparently this was not on the first time step, so the code
> seems to have run for a while with what looks to me like a logic bug.
> Thanks,
> Mark
> [4098]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [4098]PETSC ERROR: Object is in wrong state
> [4098]PETSC ERROR: Logging event had unbalanced begin/end pairs
> [4098]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
> shooting.
> [4098]PETSC ERROR: Petsc Release Version 3.6.3, unknown
> [4098]PETSC ERROR: /global/cscratch1/sd/worleyph/XGC1_KNL/xgc2 on a
> v3.6.3-arch-knl-opt64-intel named nid05668 by worleyph Mon Mar  6
> 11:33:19 2017
> [4098]PETSC ERROR: Configure options COPTFLAGS="-g -O3 -fp-model fast
> -xMIC-AVX512 -DX2_HAVE_INTEL" CXXOPTFLAGS="-g -O3 -fp-model fast
> -xMIC-AVX512\
>  -DX2_HAVE_INTEL" FOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512
> -DX2_HAVE_INTEL" --download-metis=1 --download-parmetis=1
> --with-blas-lapack-dir=/g\
> lobal/common/cori/software/intel/compilers_and_libraries_
> 2017.0.098/linux/mkl
> --with-cc=cc --with-cxx=cc --with-debugging=0 --with-fc=ftn --with-mp\
> iexec=srun --with-batch=0 --with-memalign=64 --with-64-bit-indices
> --known-mpi-shared-libraries=1 PETSC_ARCH=v3.6.3-arch-knl-opt64-intel
> --with-ope\
> nmp=1 PETSC_DIR=/global/homes/t/tkoskela/git/petsc
> [4098]PETSC ERROR: #1 PetscLogEventEndDefault() line 696 in
> /global/u2/t/tkoskela/git/petsc/src/sys/logging/utils/eventlog.c
> [4098]PETSC ERROR: #2 VecSet() line 577 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/rvector.c
> [4098]PETSC ERROR: #3 VecCreate_Seq() line 44 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec3.c
> [4098]PETSC ERROR: #4 VecSetType() line 53 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vecreg.c
> [4098]PETSC ERROR: #5 VecDuplicate_Seq() line 786 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec2.c
> [4098]PETSC ERROR: #6 VecDuplicate() line 399 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
> [4098]PETSC ERROR: #7 VecDuplicateVecs_Default() line 840 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
> [4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c

What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener