[petsc-dev] VecScatterBegin_1 with zero sized vectors and PETSC_HAVE_CUSP

Stefano Zampini stefano.zampini at gmail.com
Fri Jan 20 16:58:57 CST 2012


Thank you, I'll let you know if it crashes again. Anyway, the problem is
that xin->map->n (vpscat.h, actual line 58) is zero for some of my vectors,
so the code enters the if block even though nothing needs to be done
with CUSP. Is the first condition of the OR really necessary?
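The question is about how the OR in the VecScatterBegin_1() guard short-circuits. The sketch below assumes, based only on the description above, that the guard has the shape `n == 0 || <GPU-state test>`. The struct, enum and function names are hypothetical stand-ins, not PETSc's actual vpscat.h code. It shows why a zero-length plain SEQ vector ends up in the CUSP branch under that assumed form, and one possible alternative guard. Whether the first operand exists for some other reason is exactly what is being asked here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real PETSc data structures. */
typedef enum { VALID_ON_CPU, VALID_ON_GPU, VALID_ON_BOTH } MockLocation;

typedef struct {
  size_t       n;       /* local length, playing the role of xin->map->n */
  bool         is_cusp; /* true for a SEQCUSP-like vector, false for SEQ */
  MockLocation valid;   /* where the up-to-date values currently live */
} MockVec;

/* Assumed shape of the guard discussed in the thread: the first operand
   alone is enough to enter the GPU branch, whatever the vector type. */
static bool enters_gpu_branch_or(const MockVec *x)
{
  return x->n == 0 || (x->is_cusp && x->valid == VALID_ON_GPU);
}

/* One possible alternative (hypothetical): only take the GPU branch for a
   CUSP-like vector that has local entries whose current values are on the GPU. */
static bool enters_gpu_branch_guarded(const MockVec *x)
{
  return x->n > 0 && x->is_cusp && x->valid == VALID_ON_GPU;
}
```

Under the assumed form, an empty SEQ vector (n == 0, not CUSP) satisfies the first operand and enters the GPU branch, which matches what was reported for process 17; the guarded variant skips it.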

Recompiling will take a while, since the PETSC_ARCH on my GPU cluster cannot
use CMake to build (I saw that CMakeLists.txt is missing files for CUSP and
other GPU-related code). Is this a known issue? Is there a way to recompile
only the changed code?

Stefano


2012/1/20 Barry Smith <bsmith at mcs.anl.gov>

>
> On Jan 20, 2012, at 2:32 PM, Jed Brown wrote:
>
> > On Fri, Jan 20, 2012 at 14:27, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   I do not understand the error traceback. It should NOT look like this.
> > Is that really the exact output from a single failed run? There should not
> > be multiple ----Error Message---- blocks, etc. Immediately after the first
> > listing of Configure options, it should show the complete stack where the
> > problem happened; instead it printed an initial error message again, and
> > then again, and only then a stack. This is not supposed to be possible.
> >
> > That's the kind of thing that happens if the error is raised on
> > COMM_SELF.
>
>     ???? I don't think so. Note that the entire error set comes from process
> 17; even with COMM_SELF it is not supposed to print the error message
> multiple times on the same MPI node.
>
> > Also, is this really supposed to use CHKERRCUSP()?
>
>    No, that is wrong. I fixed it, but then had a nasty merge with Paul's
> updates to the PETSc GPU code. I don't think that caused the grief.
>
>   Stefano,
>
>      Anyway, since Paul updated all the CUSP code, please do hg pull;
> hg update, rebuild the PETSc library, and try again. If there are still
> problems, send the entire output from the error.
>
>     If a similar thing happens, I'm tempted to ask you to run node 17 in the
> debugger and see why the error message comes up multiple times:
> -start_in_debugger -debugger_nodes 17
>
>
>    Barry
>
>
> > The function uses normal CHKERRQ() inside.
> >
> > PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
> > {
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBegin;
> >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRCUSP(ierr);
> >   PetscFunctionReturn(0);
> > }
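Jed's point about CHKERRCUSP() can be illustrated with a toy model of PETSc-style error handling. The names below (`raise_new`, `propagate`, `CHECK_PROPAGATE`, `CHECK_AS_LIB_ERROR`, `MOCK_ERR_LIB`) are hypothetical, and the toy only mimics the convention described in the thread: a fresh error prints an "Error Message" banner, while a propagated error should add just one stack frame. Checking a call that already returns an error code with a "new library error" macro starts a second error, so a second banner appears.

```c
#include <stdio.h>

/* Hypothetical toy error system; not PETSc's real implementation. */
#define MOCK_ERR_ORIGINAL 1 /* code of the first, genuine failure */
#define MOCK_ERR_LIB      2 /* code used for "error in external library" */

static int banners_printed = 0; /* counts "Error Message" banners */

/* Begin a brand-new error: print the banner plus the first stack frame. */
static int raise_new(int code, const char *func, const char *msg)
{
  banners_printed++;
  fprintf(stderr, "------ Error Message ------\n%s (code %d)\n", msg, code);
  fprintf(stderr, "%s()\n", func);
  return code;
}

/* Pass an existing error upward, printing only this function's frame. */
static int propagate(int code, const char *func)
{
  fprintf(stderr, "%s()\n", func);
  return code;
}

/* CHKERRQ-like check: the code is already an error, so just propagate it. */
#define CHECK_PROPAGATE(ierr) \
  do { if (ierr) return propagate((ierr), __func__); } while (0)

/* Library-style check: any nonzero value is treated as a NEW error. */
#define CHECK_AS_LIB_ERROR(ierr) \
  do { if (ierr) return raise_new(MOCK_ERR_LIB, __func__, "Error in external library"); } while (0)

static int inner(void)
{
  return raise_new(MOCK_ERR_ORIGINAL, __func__, "original failure");
}

/* Mirrors the misuse: a PETSc-style return code checked as a library error. */
static int caller_with_lib_check(void)
{
  int ierr = inner();
  CHECK_AS_LIB_ERROR(ierr);
  return 0;
}

/* Mirrors the fix: the same call checked with a propagating macro. */
static int caller_with_propagate(void)
{
  int ierr = inner();
  CHECK_PROPAGATE(ierr);
  return 0;
}
```

The propagating caller prints one banner followed by frames, the shape Barry expects; the library-style caller prints two banners. Barry doubts this caused the grief here, so treat the toy as an illustration of the mechanism only.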
> >
> >
> > Are you running with multiple threads AND gpus? That won't work.
> >
> >   Anyway, I cannot find a list of CUSP error messages anywhere that
> > includes the numbers 46 and 76; why aren't the exception messages strings???
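One way to report a library error as a string rather than a bare number is to pair the numeric code with a translator function. This is a hypothetical sketch: `format_lib_error` and `ErrorToString` are illustrative names, and the C library's `strerror()` stands in for whatever translation the external library provides (for CUDA runtime errors that would be `cudaGetErrorString()`; for a C++ exception, its `what()` string).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical translator type: maps a numeric error code to a message. */
typedef const char *(*ErrorToString)(int code);

/* Stand-in translator built on the C library's strerror(). */
static const char *c_library_error_string(int code)
{
  return strerror(code);
}

/* Hypothetical helper: write "<lib> error <code>: <message>" into buf.
   Returns the snprintf result (characters that would have been written). */
static int format_lib_error(char *buf, size_t len, const char *lib, int code,
                            ErrorToString to_string)
{
  return snprintf(buf, len, "%s error %d: %s", lib, code, to_string(code));
}
```

Keeping both the number and the text in the message means a report like "CUSP error 46!" would also carry a readable explanation.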
> >
> >
> >   Barry
> >
> >
> > [17]PETSC ERROR: VecCUSPAllocateCheck() line 77 in src/vec/vec/impls/seq/seqcusp//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
> > [17]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 46!
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision:   HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> > --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> > --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> > --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> > --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> > --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> > --with-log=1 --with-info=1
> > --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> > --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> > --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> > PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> > PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> > --with-c++-support=1 --with-large-file-io=1
> > --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> > --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> > --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> > --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> > --download-metis=1 --download-parmetis=1 --download-chaco=1
> > --download-scotch=1 --download-party=1
> > --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> > --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome() line 228 in src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 76!
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision:   HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> > --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> > --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> > --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> > --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> > --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> > --with-log=1 --with-info=1
> > --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> > --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> > --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> > PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> > PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> > --with-c++-support=1 --with-large-file-io=1
> > --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> > --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> > --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> > --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> > --download-metis=1 --download-parmetis=1 --download-chaco=1
> > --download-scotch=1 --download-party=1
> > --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> > --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome_Public() line 263 in src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: VecScatterBegin_1() line 57 in src/vec/vec/utils//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/utils/vpscat.h
> > [17]PETSC ERROR: VecScatterBegin() line 1574 in src/vec/vec/utils/vscat.c
> > [17]PETSC ERROR: PCISSetUp() line 46 in src/ksp/pc/impls/is/pcis.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 230 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> > [17]PETSC ERROR: PCBDDCSetupCoarseEnvironment() line 2081 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCBDDCCoarseSetUp() line 1341 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 255 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> >
> >
> > On Jan 20, 2012, at 12:20 PM, Stefano Zampini wrote:
> >
> > > Hi, I recently installed petsc-dev on a GPU cluster. I got an error in
> > > the external library CUSP when calling PCISSetUp; more precisely, when
> > > doing VecScatterBegin on SEQ (not SEQCUSP!) vectors (please see the
> > > attached traceback). I'm developing the BDDC preconditioner code inside
> > > PETSc, and this error occurred when running multilevel: in that case some
> > > processes (like process 17 in the attached case) have a local dimension
> > > (relevant to PCIS) equal to zero.
> > >
> > > Thus, I think the real problem is in line 41 of
> > > src/vec/vec/utils/vpscat.h. If you tell me why you used the first
> > > condition in the if clause, I can patch the problem.
> > >
> > > Regards,
> > > --
> > > Stefano
> > > <traceback>
> >
> >
>
>


-- 
Stefano