[petsc-dev] Modify 3rd party lib

Stefano Zampini stefano.zampini at gmail.com
Tue Apr 21 07:32:46 CDT 2020



> On Apr 21, 2020, at 2:12 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> On Mon, Apr 20, 2020 at 10:28 PM Xiaoye S. Li <xsli at lbl.gov> wrote:
> Mark,
> thanks for debugging this!  Indeed, I confirm that this particular "free" should be a regular free() instead of cudaFreeHost(), because that data structure is not allocated by the CUDA pinned-host allocator (cudaMallocHost()/cudaHostAlloc()).  I have been running this CUDA code on Summit, and somehow the bug didn't show up.
> 
> Odd, but it seems to work fine for me now. E.g., I get a speedup of 6x on a ~50K equation 3D system (Q3 elements with 2 dof per vertex).
>  

Mark, is that speedup relative to the CPU version of SuperLU_DIST, or just relative to the PETSc factorizations?

> 
> I just updated the master branch with this fix.  Will be absorbed in a future release.
> 
> As for PRNTlevel>=2, perhaps check your cmake build script.  It should be set to 0 for production build.
> 
> 
> I don't see where that gets set. PRNTlevel does not seem to be in our repo. I see it in 'MAKE_INC/make.cuda_gpu:         -DDEBUGlevel=0 -DPRNTlevel=1 -DPROFlevel=0', but I think it is set at >= 2. I have manually disabled the print statements (~ 5 places).
> 
> Thanks,
> Mark
>   
> Sherry
> 
> 
> On Sun, Apr 19, 2020 at 6:32 PM Mark Adams <mfadams at lbl.gov> wrote:
> Also, we have PRNTlevel>=2 in SuperLU_dist. This is causing a lot of output. It's not clear where that is set (it's a #define)
> 
> On Sun, Apr 19, 2020 at 9:28 PM Mark Adams <mfadams at lbl.gov> wrote:
> Sherry, I found the problem.
> 
> I added this print statement to dDestroy_LU
> 
>     nb = CEILING(nsupers, grid->npcol);
>     for (i = 0; i < nb; ++i)
>         if ( Llu->Lrowind_bc_ptr[i] ) {
>             fprintf(stderr,"dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[%d/%d] = %p, CPU free Llu->Lrowind_bc_ptr = %p\n",i,nb,Llu->Lnzval_bc_ptr[i],Llu->Lrowind_bc_ptr[i]);
>             SUPERLU_FREE (Llu->Lrowind_bc_ptr[i]);
> #ifdef GPU_ACC
>             checkCuda(cudaFreeHost(Llu->Lnzval_bc_ptr[i]));
> #else
>             SUPERLU_FREE (Llu->Lnzval_bc_ptr[i]);
> #endif
>         }
> 
> And I see:
> 
>    1 SNES Function norm 1.245977692562e-04
> dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[0/134] = 0x4ff9b000, CPU free Llu->Lrowind_bc_ptr = 0x4ff9a000
> ex112d: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
> 
> This looks like Lnzval_bc_ptr is on the CPU, so I removed the GPU_ACC stuff and it works now.
> 
> I see this in the distribution. Perhaps this is a serial-run bug?
> 
> On Sun, Apr 19, 2020 at 5:58 PM Xiaoye S. Li <xsli at lbl.gov> wrote:
> Mark,
> you should fork a branch of your own to do this.
> 
> Sherry
> 
> On Sun, Apr 19, 2020 at 2:54 PM Stefano Zampini <stefano.zampini at gmail.com> wrote:
> First, commit your changes to the superlu_dist branch, then rerun configure with
> 
> --download-superlu_dist-commit=HEAD
> 
> 
> > On Apr 20, 2020, at 12:50 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > I would like to modify SuperLU_dist, but if I change the source and run configure it says there is no need to reconfigure and to use --force. When I use --force, it seems to clobber my changes. Can I tell configure to build but not download SuperLU?
> 
