[petsc-dev] VecScatterBegin_1 with zero sized vectors and PETSC_HAVE_CUSP
Stefano Zampini
stefano.zampini at gmail.com
Fri Jan 20 17:50:37 CST 2012
Great! The patch works, and the code doesn't crash anymore.
2012/1/21 Matthew Knepley <knepley at gmail.com>
> On Fri, Jan 20, 2012 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> On Jan 20, 2012, at 4:58 PM, Stefano Zampini wrote:
>>
>> > Thank you, I'll let you know if it crashes again. Anyway, the problem
>> is that xin->map->n (vpscat.h, actual line 58) is zero for some of my
>> vectors, and thus it enters the if block even when I don't need to do
>> anything with CUSP. Is the first condition of the OR really important?
>>
>> The block is supposed to handle the 0 case just fine; if it does not
>> handle the 0 case then that is a bug, either in PETSc or CUSP, and it needs
>> to be fixed. Having 0 handled by the if is crucial for getting any kind of
>> performance; otherwise it will always copy the entire vector from the GPU to
>> the CPU for absolutely no reason.
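>>
>> Roughly, the copy-back is supposed to be dispatched like this (a sketch of
>> the intent only, with illustrative condition and index names, not the
>> literal vpscat.h source):
>>
>>   if (only_some_entries_needed) {  /* also true when n == 0; copies nothing */
>>     ierr = VecCUSPCopyFromGPUSome_Public(xin,indices);CHKERRQ(ierr);
>>   } else {
>>     ierr = VecCUSPCopyFromGPU(xin);CHKERRQ(ierr);  /* moves the ENTIRE vector */
>>   }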
>>
>> >
>> > Recompiling will take a while since my PETSC_ARCH on the GPU cluster is
>> not able to build with cmake (I saw missing files in CMakeLists.txt for
>> CUSP- and GPU-related stuff). Is this a known issue? Is there a way to
>> recompile only the changed code?
>>
>> I think this is because the cmake developers do not yet support the
>> CUDA compiler nvcc. Bitch to them. cmake is the way in PETSc to get partial
>> recompiles.
>
>
> Or use the Python make, which handles nvcc, and bitch to me about all its
> other problems.
>
> Matt
>
>
>>
>> Barry
>>
>> >
>> > Stefano
>> >
>> >
>> > 2012/1/20 Barry Smith <bsmith at mcs.anl.gov>
>> >
>> > On Jan 20, 2012, at 2:32 PM, Jed Brown wrote:
>> >
>> > > On Fri, Jan 20, 2012 at 14:27, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > >
>> > > I do not understand the error traceback. It should NOT look like
>> this. Is that really the exact output from a single failed run? There
>> should not be multiple messages of ----Error Message ---- etc. It should,
>> immediately after the first listing of Configure options, show the complete
>> stack where the problem happened; instead it printed an initial error
>> message again and then again and then finally a stack. This is not supposed
>> to be possible.
>> > >
>> > > That's the kind of thing that happens if the error is raised on
>> COMM_SELF.
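>> > >
>> > > For instance (a sketch, assuming the usual SETERRQ() calling sequence),
>> an error raised as
>> > >
>> > >   SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUSP error %d",err);
>> > >
>> > > is reported by each process on its own, so nothing synchronizes or
>> collapses the output into a single message.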
>> >
>> > ???? I don't think so. Note that the entire error set comes from process
>> 17; even with COMM_SELF it is not supposed to print the error message stuff
>> multiple times on the same MPI node.
>> >
>> > > Also, is this really supposed to use CHKERRCUSP()?
>> >
>> > No, that is wrong; I fixed it, but then had a nasty merge with Paul's
>> updates to the PETSc GPU stuff. I don't think that caused the grief.
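>> >
>> > (The fix being simply to check the PetscErrorCode return with the usual
>> macro, something like
>> >
>> >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRQ(ierr);
>> >
>> > since CHKERRCUSP() treats its argument as a raw CUSP status rather than a
>> PETSc error code.)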
>> >
>> > Stefano,
>> >
>> > Anyway, since Paul updated all the CUSP stuff, please hg pull; hg
>> update, rebuild the PETSc library, and then try again; if there are still
>> problems, send the entire error output.
>> >
>> > If a similar thing happens, I'm tempted to ask you to run node 17 in
>> the debugger and see why the error message comes up multiple times:
>> -start_in_debugger -debugger_nodes 17
>> >
>> >
>> > Barry
>> >
>> >
>> > > The function uses normal CHKERRQ() inside.
>> > >
>> > > PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
>> > > {
>> > >   PetscErrorCode ierr;
>> > >
>> > >   PetscFunctionBegin;
>> > >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRCUSP(ierr);
>> > >   PetscFunctionReturn(0);
>> > > }
>> > >
>> > >
>> > > Are you running with multiple threads AND GPUs? That won't work.
>> > >
>> > > Anyway, I cannot find anywhere a list of CUSP error messages that
>> includes the numbers 46 and 76; why are the exception messages not strings???
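>> > >
>> > > If the CUSP exceptions derive from std::exception, as I would expect,
>> the wrapper could report the string instead; a sketch:
>> > >
>> > >   try {
>> > >     /* ... the CUSP/Thrust call ... */
>> > >   } catch (const std::exception &e) {
>> > >     SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUSP error: %s",e.what());
>> > >   }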
>> > >
>> > >
>> > > Barry
>> > >
>> > >
>> > > [17]PETSC ERROR: VecCUSPAllocateCheck() line 77 in
>> src/vec/vec/impls/seq/seqcusp//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
>> > > [17]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> > > [17]PETSC ERROR: Error in external library!
>> > > [17]PETSC ERROR: CUSP error 46!
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
>> > > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
>> > > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> > > [17]PETSC ERROR: See docs/index.html for manual pages.
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by
>> zampini Fri Jan 20 19:01:30 2012
>> > > [17]PETSC ERROR: Libraries linked from
>> /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
>> > > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
>> > > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
>> --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
>> --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
>> --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
>> --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
>> --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
>> --with-log=1 --with-info=1
>> --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
>> --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
>> --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
>> PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
>> PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
>> --with-c++-support=1 --with-large-file-io=1
>> --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
>> --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
>> --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
>> --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
>> --download-metis=1 --download-parmetis=1 --download-chaco=1
>> --download-scotch=1 --download-party=1
>> --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
>> --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: VecCUSPCopyFromGPUSome() line 228 in
>> src/vec/vec/impls/seq/seqcusp/veccusp.cu
>> > > [17]PETSC ERROR: --------------------- Error Message
>> ------------------------------------
>> > > [17]PETSC ERROR: Error in external library!
>> > > [17]PETSC ERROR: CUSP error 76!
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
>> > > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
>> > > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> > > [17]PETSC ERROR: See docs/index.html for manual pages.
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by
>> zampini Fri Jan 20 19:01:30 2012
>> > > [17]PETSC ERROR: Libraries linked from
>> /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
>> > > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
>> > > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
>> --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
>> --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
>> --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
>> --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
>> --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
>> --with-log=1 --with-info=1
>> --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
>> --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
>> --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
>> PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
>> PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
>> --with-c++-support=1 --with-large-file-io=1
>> --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
>> --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
>> --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
>> --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
>> --download-metis=1 --download-parmetis=1 --download-chaco=1
>> --download-scotch=1 --download-party=1
>> --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
>> --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
>> > > [17]PETSC ERROR:
>> ------------------------------------------------------------------------
>> > > [17]PETSC ERROR: VecCUSPCopyFromGPUSome_Public() line 263 in
>> src/vec/vec/impls/seq/seqcusp/veccusp.cu
>> > > [17]PETSC ERROR: VecScatterBegin_1() line 57 in
>> src/vec/vec/utils//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/utils/vpscat.h
>> > > [17]PETSC ERROR: VecScatterBegin() line 1574 in
>> src/vec/vec/utils/vscat.c
>> > > [17]PETSC ERROR: PCISSetUp() line 46 in src/ksp/pc/impls/is/pcis.c
>> > > [17]PETSC ERROR: PCSetUp_BDDC() line 230 in
>> src/ksp/pc/impls/bddc/bddc.c
>> > > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
>> > > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
>> > > [17]PETSC ERROR: PCBDDCSetupCoarseEnvironment() line 2081 in
>> src/ksp/pc/impls/bddc/bddc.c
>> > > [17]PETSC ERROR: PCBDDCCoarseSetUp() line 1341 in
>> src/ksp/pc/impls/bddc/bddc.c
>> > > [17]PETSC ERROR: PCSetUp_BDDC() line 255 in
>> src/ksp/pc/impls/bddc/bddc.c
>> > > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
>> > > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
>> > >
>> > >
>> > > On Jan 20, 2012, at 12:20 PM, Stefano Zampini wrote:
>> > >
>> > > > Hi, I recently installed petsc-dev on a GPU cluster. I got an error in
>> the external library CUSP when calling PCISSetUp: more precisely, while doing
>> VecScatterBegin on SEQ (not SEQCUSP!) vectors (please see the attached
>> traceback). I'm developing the BDDC preconditioner code inside PETSc, and
>> this error occurred when doing multilevel: in that case some procs (like
>> proc 17 in the attached run) have local dimension (relevant to PCIS) equal
>> to zero.
>> > > >
>> > > > Thus, I think the real problem lies on line 41 of
>> src/vec/vec/utils/vpscat.h. If you tell me the reason why you used the
>> first condition in the if clause, I can patch the problem.
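>> > > >
>> > > > For instance (just a sketch of the shape of the patch, with the field
>> and flag names guessed rather than copied from vpscat.h), guarding the CUSP
>> path with the local length:
>> > > >
>> > > >   if (xin->map->n && xin->valid_GPU_array == PETSC_CUSP_GPU) {
>> > > >     /* enter the copy-back-from-GPU path only for nonempty vectors */
>> > > >   }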
>> > > >
>> > > > Regards,
>> > > > --
>> > > > Stefano
>> > > > <traceback>
>> > >
>> > >
>> >
>> >
>> >
>> >
>> > --
>> > Stefano
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
--
Stefano