[petsc-dev] VecScatter scaling problem on KNL

Mark Adams mfadams at lbl.gov
Wed Mar 8 15:33:21 CST 2017

Our code is having scaling problems on KNL (Cori) once we get up to
about 1K sockets.

We have isolated the problem to a certain VecScatter. This code stores
the data redundantly. Scattering into the solver is just a local copy,
but scattering out requires that each process send all of its data to
every other process. It is this second scatter that is not scaling well.
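
To put rough numbers on why the scatter out is the expensive one, here
is a small back-of-the-envelope sketch in plain C. It is not the
application code and does not call PETSc. It assumes the scatter is
carried out as individual point-to-point sends and that every rank owns
the same number of entries; neither assumption has been confirmed for
this run. The function names and the parameters nranks and n_local are
hypothetical.

```c
#include <assert.h>

/* Number of point-to-point messages if every rank sends its block to
 * every other rank individually (an assumption about the implementation). */
long long pairwise_message_count(long long nranks)
{
    assert(nranks > 0);
    return nranks * (nranks - 1);
}

/* Entries one rank must receive to hold the full redundant copy,
 * assuming every rank owns n_local entries (hypothetical equal split). */
long long entries_received_per_rank(long long nranks, long long n_local)
{
    assert(nranks > 0 && n_local >= 0);
    return (nranks - 1) * n_local;
}
```

Under these assumptions the message count grows quadratically with the
number of ranks: pairwise_message_count(4) is 12, while
pairwise_message_count(1024) is 1047552. The local copy going into the
solver has no such term.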

I wish I had more data, but this is urgent and jobs are in the queue,
so this is all I have for now. Are there any parameters you would
recommend that we test while we gather more data?

Also, we got this error with -log_view.

I've updated their PETSc with maint and we are waiting for it to run
again. Apparently this was not on the first time step, so the code
seems to have run for a while with what looks to me like a logic bug.


[4098]PETSC ERROR: --------------------- Error Message
[4098]PETSC ERROR: Object is in wrong state
[4098]PETSC ERROR: Logging event had unbalanced begin/end pairs
[4098]PETSC ERROR: See
http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
[4098]PETSC ERROR: Petsc Release Version 3.6.3, unknown
[4098]PETSC ERROR: /global/cscratch1/sd/worleyph/XGC1_KNL/xgc2 on a
v3.6.3-arch-knl-opt64-intel named nid05668 by worleyph Mon Mar  6
11:33:19 2017
[4098]PETSC ERROR: Configure options COPTFLAGS="-g -O3 -fp-model fast
-xMIC-AVX512 -DX2_HAVE_INTEL" CXXOPTFLAGS="-g -O3 -fp-model fast
 -DX2_HAVE_INTEL" FOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512
-DX2_HAVE_INTEL" --download-metis=1 --download-parmetis=1
--with-cc=cc --with-cxx=cc --with-debugging=0 --with-fc=ftn
--with-mpiexec=srun --with-batch=0 --with-memalign=64 --with-64-bit-indices
--known-mpi-shared-libraries=1 PETSC_ARCH=v3.6.3-arch-knl-opt64-intel
nmp=1 PETSC_DIR=/global/homes/t/tkoskela/git/petsc
[4098]PETSC ERROR: #1 PetscLogEventEndDefault() line 696 in
[4098]PETSC ERROR: #2 VecSet() line 577 in
[4098]PETSC ERROR: #3 VecCreate_Seq() line 44 in
[4098]PETSC ERROR: #4 VecSetType() line 53 in
[4098]PETSC ERROR: #5 VecDuplicate_Seq() line 786 in
[4098]PETSC ERROR: #6 VecDuplicate() line 399 in
[4098]PETSC ERROR: #7 VecDuplicateVecs_Default() line 840 in
[4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in

