On Fri, Jan 20, 2012 at 5:04 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:

On Jan 20, 2012, at 4:58 PM, Stefano Zampini wrote:

> Thank you, I'll let you know if it crashes again. Anyway, the problem is that xin->map->n (vpscat.h, actually line 58) is zero for some of my vectors, so it enters the if block even though I don't need to do anything with CUSP. Is the first condition of the OR really important?

   The block is supposed to handle the 0 case just fine; if it does not handle the 0 case, then that is a bug in either PETSc or CUSP and needs to be fixed. Having 0 handled by the if is crucial to get any kind of performance; otherwise it will always copy the entire vector from the GPU to the CPU for absolutely no reason.
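
For orientation, the guard in question has roughly the following shape. This is an illustrative reconstruction, not the actual vpscat.h source: the exact condition, the index handle ci, and the error macro used here are all assumptions.

    /* Illustrative reconstruction (assumed names and condition, not the
       actual PETSc source) of the vpscat.h guard under discussion. */
    if (!xin->map->n || xin->valid_GPU_array == PETSC_CUSP_GPU) {
      /* Pull back from the GPU only the entries this scatter needs,
         described by the hypothetical index handle ci; an empty vector
         copies nothing, so n == 0 should pass through as a no-op. */
      ierr = VecCUSPCopyFromGPUSome_Public(xin,ci);CHKERRQ(ierr);
    }
    /* Per Barry, routing the 0 case through this branch is what keeps the
       fallback, a full GPU-to-CPU copy of the vector, off the common path. */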
<div class="im"><br>
><br>
> Recompile will take a while since my petsc_arch on gpu cluster is not able to use cmake to build ( I saw missing files in CMakeLists.txt for CUSP and GPU related stuffs). Is it a known issue? Is there a way to simply recompile the changed code only?<br>

   I think this is because the cmake developers do not yet support the CUDA compiler nvcc. Bitch to them. cmake is the way in PETSc to get partial recompiles.

Or use the Python make, which handles nvcc, and bitch to me about all its other problems.

   Matt

Barry

>
> Stefano
>
>
> 2012/1/20 Barry Smith <bsmith@mcs.anl.gov>
>
> On Jan 20, 2012, at 2:32 PM, Jed Brown wrote:
>
> > On Fri, Jan 20, 2012 at 14:27, Barry Smith <bsmith@mcs.anl.gov> wrote:
> >
> > I do not understand the error traceback. It should NOT look like this. Is that really the exact output from a single failed run? There should not be multiple ---- Error Message ---- blocks. Immediately after the first listing of Configure options it should show the complete stack where the problem happened; instead it printed an initial error message again, and then again, and only then a stack. This is not supposed to be possible.
> >
> > That's the kind of thing that happens if the error is raised on COMM_SELF.
>
> ???? I don't think so. Note that the entire error set comes from process 17; even with COMM_SELF it is not supposed to print the error message stuff multiple times on the same MPI process.
>
> > Also, is this really supposed to use CHKERRCUSP()?
>
> No, that is wrong. I fixed it, but then had a nasty merge with Paul's updates to the PETSc GPU stuff. I don't think that caused the grief.
>
> Stefano,
>
> Anyway, since Paul updated all the CUSP stuff, please hg pull; hg update, rebuild the PETSc library, and then try again. If there are still problems, send the entire error output again.
>
> If a similar thing happens, I'm tempted to ask you to run process 17 in the debugger and see why the error message comes up multiple times: -start_in_debugger -debugger_nodes 17
>
>
> Barry
>
>
> > The function uses normal CHKERRQ() inside.
> >
> > PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
> > {
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBegin;
> >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRCUSP(ierr);
> >   PetscFunctionReturn(0);
> > }
> >
> >
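
The fix Barry mentions above presumably amounts to checking the returned PetscErrorCode with the matching macro. A sketch of the corrected function, assumed rather than taken from the actual commit:

    /* Assumed sketch of the fix, not the actual commit.
       VecCUSPCopyFromGPUSome() returns a PetscErrorCode, so its result must
       be checked with CHKERRQ(); CHKERRCUSP() is for raw CUSP/Thrust
       statuses, and applying it to a propagated PETSc code relabels that
       code as a bogus "CUSP error" (plausibly the "CUSP error 76" in the
       traceback, since 76 is PETSc's PETSC_ERR_LIB). */
    PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }
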
> > Are you running with multiple threads AND GPUs? That won't work.
> >
> > Anyway, I cannot find anywhere a list of CUSP error messages that includes the numbers 46 and 76; why are the exception messages not strings???
> >
> >
> > Barry
> >
> >
> > [17]PETSC ERROR: VecCUSPAllocateCheck() line 77 in src/vec/vec/impls/seq/seqcusp//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
> > [17]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 46!
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64 --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20 --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/.. --with-thrust-dir=/caspur/local/apps/cuda/4.0/include --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064 --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1 --with-log=1 --with-info=1 --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1 --with-pthread=1 --with-pthreadclasses=1 --with-precision=double --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3 PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1 --with-c++-support=1 --with-large-file-io=1 --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz --download-metis=1 --download-parmetis=1 --download-chaco=1 --download-scotch=1 --download-party=1 --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome() line 228 in src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 76!
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64 --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20 --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/.. --with-thrust-dir=/caspur/local/apps/cuda/4.0/include --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064 --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1 --with-log=1 --with-info=1 --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1 --with-pthread=1 --with-pthreadclasses=1 --with-precision=double --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3 PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1 --with-c++-support=1 --with-large-file-io=1 --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz --download-metis=1 --download-parmetis=1 --download-chaco=1 --download-scotch=1 --download-party=1 --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR: ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome_Public() line 263 in src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: VecScatterBegin_1() line 57 in src/vec/vec/utils//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/utils/vpscat.h
> > [17]PETSC ERROR: VecScatterBegin() line 1574 in src/vec/vec/utils/vscat.c
> > [17]PETSC ERROR: PCISSetUp() line 46 in src/ksp/pc/impls/is/pcis.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 230 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> > [17]PETSC ERROR: PCBDDCSetupCoarseEnvironment() line 2081 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCBDDCCoarseSetUp() line 1341 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 255 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> >
> >
> > On Jan 20, 2012, at 12:20 PM, Stefano Zampini wrote:
> >
> > > Hi, I recently installed petsc-dev on a GPU cluster. I got an error in the external library CUSP when calling PCISSetUp: more precisely, when doing VecScatterBegin on SEQ (not SEQCUSP!) vectors (please see the attached traceback). I'm developing the BDDC preconditioner code inside PETSc, and this error occurred when doing multilevel: in that case some procs (like proc 17 in the attached case) have local dimension (relevant to PCIS) equal to zero.
> > >
> > > Thus, I think the real problem lies on line 41 of src/vec/vec/utils/vpscat.h. If you tell me why you used the first condition in the if clause, I can patch the problem.
> > >
> > > Regards,
> > > --
> > > Stefano
> > > <traceback>
> >
> >
>
>
>
>
> --
> Stefano


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener