[petsc-dev] VecScatterBegin_1 with zero sized vectors and PETSC_HAVE_CUSP

Matthew Knepley knepley at gmail.com
Fri Jan 20 17:16:28 CST 2012


On Fri, Jan 20, 2012 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> On Jan 20, 2012, at 4:58 PM, Stefano Zampini wrote:
>
> > Thank you, I'll let you know if it crashes again. Anyway, the problem
> is that xin->map->n (currently line 58 of vpscat.h) is zero for some of my
> vectors, so it enters the if block even though I don't need to do anything
> with CUSP. Is the first condition of the OR really important?
>
>    The block is supposed to handle the 0 case just fine; if it does not
> handle the 0 case then that is a bug in either PETSc or CUSP and needs to
> be fixed. Having 0 handled by the if is crucial for getting any kind of
> performance; otherwise it will always copy the entire vector from the GPU
> to the CPU for absolutely no reason.
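
In outline, the guard being discussed is meant to work along these lines.
This is a sketch only: valid_GPU_array and PETSC_CUSP_GPU are the flag the
CUSP Vec code uses to record where the up-to-date data lives, "indices"
stands in for the PetscCUSPIndices object the scatter has prepared, and the
exact condition on that line of vpscat.h may differ.

  #if defined(PETSC_HAVE_CUSP)
    if (xin->valid_GPU_array == PETSC_CUSP_GPU) {
      /* Pull only the entries the scatter needs back to the host, and only
         when the current copy of the data lives on the GPU; a vector with
         xin->map->n == 0 is expected to pass through here as a harmless
         no-op rather than fail inside the CUSP code. */
      ierr = VecCUSPCopyFromGPUSome_Public(xin,indices);CHKERRQ(ierr);
    }
  #endif
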
>
> >
> > Recompiling will take a while since my petsc_arch on the GPU cluster
> cannot build with cmake (I saw missing files in CMakeLists.txt for the CUSP
> and GPU related stuff). Is it a known issue? Is there a way to simply
> recompile only the changed code?
>
>    I think this is because the cmake developers do not yet support the
> CUDA compiler nvcc. Bitch to them. cmake is the way in PETSc to get partial
> recompiles.


Or use the Python make, which handles nvcc, and bitch to me about all its
other problems.

   Matt


>
>   Barry
>
> >
> > Stefano
> >
> >
> > 2012/1/20 Barry Smith <bsmith at mcs.anl.gov>
> >
> > On Jan 20, 2012, at 2:32 PM, Jed Brown wrote:
> >
> > > On Fri, Jan 20, 2012 at 14:27, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >   I do not understand the error traceback. It should NOT look like
> this. Is that really the exact output from a single failed run? There
> should not be multiple ---- Error Message ---- blocks. Immediately after
> the first listing of Configure options it should show the complete stack
> where the problem happened; instead it printed an initial error message
> again, and then again, and then finally a stack. This is not supposed to be
> possible.
> > >
> > > That's the kind of thing that happens if the error is raised on
> COMM_SELF.
> >
> >    ???? I don't think so. Note that the entire error set comes from
> process 17; even with COMM_SELF it is not supposed to print the error
> message stuff multiple times on the same MPI process.
> >
> > > Also, is this really supposed to use CHKERRCUSP()?
> >
> >   No, that is wrong, I fixed it but then had a nasty merge with Paul's
> updates to PETSc GPU stuff.  I don't think that caused the grief.
> >
> >   Stefano,
> >
> >      Anyway, since Paul updated all the CUSP stuff, please hg pull; hg
> update, rebuild the PETSc library, and try again; if there are still
> problems, again send the entire error output.
> >
> >     If a similar thing happens I'm tempted to ask you to run process 17
> in the debugger and see why the error message comes up multiple times:
> -start_in_debugger -debugger_nodes 17
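
For example, the run might look something like the following (the total
process count is only illustrative; the executable name is taken from the
traceback), starting a debugger only on rank 17 while the other ranks run
normally:

  mpiexec -n 32 ./bidomonotest -start_in_debugger -debugger_nodes 17
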
> >
> >
> >    Barry
> >
> >
> > > The function uses normal CHKERRQ() inside.
> > >
> > > PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
> > > {
> > >   PetscErrorCode ierr;
> > >
> > >   PetscFunctionBegin;
> > >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRCUSP(ierr);
> > >   PetscFunctionReturn(0);
> > > }
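
For context, the two checks are not interchangeable. Roughly, and
paraphrasing rather than quoting the actual petsc-dev definitions, CHKERRQ
takes a PetscErrorCode and propagates it up the call stack, while CHKERRCUSP
takes a CUSP/CUDA status and converts any nonzero value into a new
PETSC_ERR_LIB error whose text is "CUSP error <n>", matching the "Error in
external library" form seen in the traceback below. A minimal sketch:

  /* Paraphrased behavior only; the real macros live in the PETSc headers. */
  #define CHKERRQ(ierr)   do { if (ierr) return ierr; /* pass the code up */ } while (0)
  #define CHKERRCUSP(err) do { if (err) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUSP error %d",(int)(err)); } while (0)

So wrapping a call that returns a PetscErrorCode with CHKERRCUSP would report
that error code as a "CUSP error" number.
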
> > >
> > >
> > > Are you running with multiple threads AND GPUs? That won't work.
> > >
> > >   Anyway, I cannot find anywhere a list of CUSP error messages that
> includes the numbers 46 and 76; why are the exception messages not strings???
> > >
> > >
> > >   Barry
> > >
> > >
> > > [17]PETSC ERROR: VecCUSPAllocateCheck() line 77 in
> src/vec/vec/impls/seq/seqcusp//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
> > > [17]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> > > [17]PETSC ERROR: Error in external library!
> > > [17]PETSC ERROR: CUSP error 46!
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: Petsc Development HG revision:   HG Date:
> > > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > > [17]PETSC ERROR: See docs/index.html for manual pages.
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by
> zampini Fri Jan 20 19:01:30 2012
> > > [17]PETSC ERROR: Libraries linked from
> /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> --with-log=1 --with-info=1
> --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> --with-c++-support=1 --with-large-file-io=1
> --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> --download-metis=1 --download-parmetis=1 --download-chaco=1
> --download-scotch=1 --download-party=1
> --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: VecCUSPCopyFromGPUSome() line 228 in
> src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > > [17]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> > > [17]PETSC ERROR: Error in external library!
> > > [17]PETSC ERROR: CUSP error 76!
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: Petsc Development HG revision:   HG Date:
> > > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > > [17]PETSC ERROR: See docs/index.html for manual pages.
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by
> zampini Fri Jan 20 19:01:30 2012
> > > [17]PETSC ERROR: Libraries linked from
> /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> --with-log=1 --with-info=1
> --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> --with-c++-support=1 --with-large-file-io=1
> --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> --download-metis=1 --download-parmetis=1 --download-chaco=1
> --download-scotch=1 --download-party=1
> --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > > [17]PETSC ERROR:
> ------------------------------------------------------------------------
> > > [17]PETSC ERROR: VecCUSPCopyFromGPUSome_Public() line 263 in
> src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > > [17]PETSC ERROR: VecScatterBegin_1() line 57 in
> src/vec/vec/utils//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/utils/vpscat.h
> > > [17]PETSC ERROR: VecScatterBegin() line 1574 in
> src/vec/vec/utils/vscat.c
> > > [17]PETSC ERROR: PCISSetUp() line 46 in src/ksp/pc/impls/is/pcis.c
> > > [17]PETSC ERROR: PCSetUp_BDDC() line 230 in
> src/ksp/pc/impls/bddc/bddc.c
> > > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> > > [17]PETSC ERROR: PCBDDCSetupCoarseEnvironment() line 2081 in
> src/ksp/pc/impls/bddc/bddc.c
> > > [17]PETSC ERROR: PCBDDCCoarseSetUp() line 1341 in
> src/ksp/pc/impls/bddc/bddc.c
> > > [17]PETSC ERROR: PCSetUp_BDDC() line 255 in
> src/ksp/pc/impls/bddc/bddc.c
> > > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> > >
> > >
> > > On Jan 20, 2012, at 12:20 PM, Stefano Zampini wrote:
> > >
> > > > Hi, I recently installed petsc-dev on a GPU cluster. I got an error
> in the external library CUSP when calling PCISSetUp: more precisely, when
> doing VecScatterBegin on SEQ (not SEQCUSP!) vectors (please see the
> traceback attached). I'm developing the BDDC preconditioner code inside
> PETSc, and this error occurred in the multilevel case, where some procs
> (like proc 17 in the attached case) have local dimension (relevant to PCIS)
> equal to zero.
> > > >
> > > > Thus, I think the real problem lies on line 41 of
> src/vec/vec/utils/vpscat.h. If you tell me why you used the first condition
> in the if clause, I can patch the problem.
> > > >
> > > > Regards,
> > > > --
> > > > Stefano
> > > > <traceback>
> > >
> > >
> >
> >
> >
> >
> > --
> > Stefano
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener