[petsc-dev] VecScatter scaling problem on KNL

Barry Smith bsmith at mcs.anl.gov
Thu Mar 9 16:10:39 CST 2017


> On Mar 8, 2017, at 3:46 PM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Wed, Mar 8, 2017 at 3:33 PM, Mark Adams <mfadams at lbl.gov> wrote:
> Our code is having scaling problems on KNL (Cori), when we get up to
> about 1K sockets.
> 
> We have isolated the problem to a certain VecScatter. This code stores
> the data redundantly. Scattering into the solver is just a local copy,
> but scattering out requires that each process send all of its data to
> every other process. It is this second one that is not scaling well.
> 
> This is an all2all, so it should be hard to beat
> 
> -vecscatter_alltoall

  Mark,

     Matt is right. You should definitely try this before writing additional code, but you need to set the option in the code so that it affects just this one scatter, not all the scatters. In the place where you create this "all to all" vector scatter, do the following:

    PetscOptionsSetValue(NULL,"-vecscatter_alltoall","true");
    VecScatterCreate...
    PetscOptionsClearValue(NULL,"-vecscatter_alltoall");

   You may need to change it slightly for different PETSc versions or for Fortran.
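
   To get a rough feel for why this particular scatter is the one that stops scaling, here is a small counting sketch in plain C. It is only a model, not PETSc code: it assumes the scatter is carried out as individual point-to-point messages, with every process sending its whole local block to every other process, as described above. The function names and the process counts used with it are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Number of point-to-point messages when each of nprocs processes
 * sends its local block to every other process (assumed model). */
uint64_t pairwise_message_count(uint64_t nprocs)
{
    return nprocs == 0 ? 0 : nprocs * (nprocs - 1);
}

/* Total vector entries moved when every process sends all nlocal of
 * its entries to each of the other processes (assumed model). */
uint64_t pairwise_entries_moved(uint64_t nprocs, uint64_t nlocal)
{
    return pairwise_message_count(nprocs) * nlocal;
}
```

   Under this model the message count grows quadratically: a hypothetical run with 1024 processes would need 1024 * 1023 = 1047552 separate messages per scatter. That growth is the reason it is worth checking whether a single collective does better for this one scatter.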

    Please let us know how it goes,

    Barry


  
> 
> The complete list of tuning options is here:
> 
>   http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html
>  
> I wish I had more data, but this is urgent: jobs are in the queue, and
> this is all I have. Any recommendation for parameters that we might
> test while we get more data?
> 
> Also, we got this error with -log_view.
> 
> This looks like a bug in the code, but it's hard to know exactly what. However, it should give
> the same error even for a very small problem. Can you run something small and get it?
> 
>   Thanks,
> 
>      Matt
>  
> I've updated their PETSc with maint and we are waiting for it to run
> again. Apparently this was not on the first time step, so the code
> seems to have run for a while with what looks to me like a logic bug.
> 
> Thanks,
> Mark
> 
> 
> [4098]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [4098]PETSC ERROR: Object is in wrong state
> [4098]PETSC ERROR: Logging event had unbalanced begin/end pairs
> [4098]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
> shooting.
> [4098]PETSC ERROR: Petsc Release Version 3.6.3, unknown
> [4098]PETSC ERROR: /global/cscratch1/sd/worleyph/XGC1_KNL/xgc2 on a
> v3.6.3-arch-knl-opt64-intel named nid05668 by worleyph Mon Mar  6
> 11:33:19 2017
> [4098]PETSC ERROR: Configure options COPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512 -DX2_HAVE_INTEL" CXXOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512 -DX2_HAVE_INTEL" FOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512 -DX2_HAVE_INTEL" --download-metis=1 --download-parmetis=1 --with-blas-lapack-dir=/global/common/cori/software/intel/compilers_and_libraries_2017.0.098/linux/mkl --with-cc=cc --with-cxx=cc --with-debugging=0 --with-fc=ftn --with-mpiexec=srun --with-batch=0 --with-memalign=64 --with-64-bit-indices --known-mpi-shared-libraries=1 PETSC_ARCH=v3.6.3-arch-knl-opt64-intel --with-openmp=1 PETSC_DIR=/global/homes/t/tkoskela/git/petsc
> [4098]PETSC ERROR: #1 PetscLogEventEndDefault() line 696 in
> /global/u2/t/tkoskela/git/petsc/src/sys/logging/utils/eventlog.c
> [4098]PETSC ERROR: #2 VecSet() line 577 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/rvector.c
> [4098]PETSC ERROR: #3 VecCreate_Seq() line 44 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec3.c
> [4098]PETSC ERROR: #4 VecSetType() line 53 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vecreg.c
> [4098]PETSC ERROR: #5 VecDuplicate_Seq() line 786 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/impls/seq/bvec2.c
> [4098]PETSC ERROR: #6 VecDuplicate() line 399 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
> [4098]PETSC ERROR: #7 VecDuplicateVecs_Default() line 840 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
> [4098]PETSC ERROR: #8 VecDuplicateVecs() line 473 in
> /global/u2/t/tkoskela/git/petsc/src/vec/vec/interface/vector.c
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener



