[petsc-dev] Modify 3rd party lib

Xiaoye S. Li xsli at lbl.gov
Mon Apr 20 21:28:17 CDT 2020


Mark,
thanks for debugging this!  Indeed, I confirm -- that particular "free"
should be a regular free instead of cudaFreeHost(), because that data
structure is not allocated by cudaHostAlloc().  I have been running this
CUDA code on Summit, but somehow the bug didn't show up.
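
To illustrate why the mismatched free trips an assertion, here is a minimal
plain-C sketch. It does not use CUDA: pinned_alloc(), pinned_free() and
is_pinned() are hypothetical stand-ins for a pinned-memory allocator that
keeps a record of every pointer it hands out, and it is only an assumption
that the real runtime check behaves along these lines.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pinned-memory allocator: every pointer it
   hands out is recorded in a small table so a later free can be checked. */
#define MAX_PINNED 16
void *pinned_table[MAX_PINNED];

void *pinned_alloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) return NULL;
    for (int i = 0; i < MAX_PINNED; ++i) {
        if (pinned_table[i] == NULL) {
            pinned_table[i] = p;
            return p;
        }
    }
    free(p);                 /* table full: treat as allocation failure */
    return NULL;
}

/* Returns 1 if p came from pinned_alloc() and has not been freed yet. */
int is_pinned(const void *p)
{
    if (p == NULL) return 0;
    for (int i = 0; i < MAX_PINNED; ++i)
        if (pinned_table[i] == p) return 1;
    return 0;
}

void pinned_free(void *p)
{
    if (p == NULL) return;
    for (int i = 0; i < MAX_PINNED; ++i) {
        if (pinned_table[i] == p) {
            pinned_table[i] = NULL;
            free(p);
            return;
        }
    }
    /* Unknown pointer: this is the situation the mismatched free creates. */
    assert(!"pinned_free: pointer was not allocated by pinned_alloc");
}

/* Frees each buffer with the routine that matches its allocator.
   Returns 1 if the bookkeeping agrees with how the buffers were made,
   0 if it does not, and -1 if an allocation failed. */
int demo_matching_free(void)
{
    double *nzval   = malloc(8 * sizeof *nzval);          /* ordinary heap */
    double *staging = pinned_alloc(8 * sizeof *staging);  /* "pinned"      */
    if (nzval == NULL || staging == NULL) {
        free(nzval);
        pinned_free(staging);
        return -1;
    }
    int nz_pinned = is_pinned(nzval);    /* not in the table */
    int st_pinned = is_pinned(staging);  /* in the table     */
    free(nzval);            /* regular free for regular memory      */
    pinned_free(staging);   /* pinned free only for pinned memory   */
    return nz_pinned == 0 && st_pinned == 1;
}
```

Calling pinned_free(nzval) in this sketch would hit the assertion, since
nzval was never recorded in the table; the regular free is the matching call.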

I just updated the master branch with this fix.  It will be included in a
future release.

As for PRNTlevel>=2, perhaps check your cmake build script.  It should be
set to 0 for a production build.
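
Since PRNTlevel is a preprocessor macro, its value is fixed when the library
is compiled. A minimal sketch of the pattern, assuming the build system passes
the value as a -DPRNTlevel=<n> compiler flag; report_factor_stats() and its
messages are hypothetical and not taken from the library:

```c
#include <assert.h>
#include <stdio.h>

/* Default to a quiet production build unless the build system passes
   -DPRNTlevel=<n> on the compiler command line. */
#ifndef PRNTlevel
#define PRNTlevel 0
#endif

/* Hypothetical helper: prints diagnostics according to PRNTlevel and
   returns the number of lines it wrote. */
int report_factor_stats(int nsupers)
{
    int lines = 0;
    assert(nsupers >= 0);
#if ( PRNTlevel >= 1 )
    printf("nsupers = %d\n", nsupers);
    ++lines;
#endif
#if ( PRNTlevel >= 2 )
    printf("(verbose) per-supernode detail would be printed here\n");
    ++lines;
#endif
    return lines;
}
```

Compiled without the flag, the function prints nothing; rebuilding with
-DPRNTlevel=2 enables both messages without touching the source.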

Sherry


On Sun, Apr 19, 2020 at 6:32 PM Mark Adams <mfadams at lbl.gov> wrote:

> Also, we have PRNTlevel>=2 in SuperLU_dist. This is causing a lot of
> output. It's not clear where that is set (it's a #define)
>
> On Sun, Apr 19, 2020 at 9:28 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Sherry, I found the problem.
>>
>> I added this print statement to dDestroy_LU
>>
>>     nb = CEILING(nsupers, grid->npcol);
>>     for (i = 0; i < nb; ++i)
>>         if ( Llu->Lrowind_bc_ptr[i] ) {
>>             fprintf(stderr, "dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[%d/%d] = %p, "
>>                     "CPU free Llu->Lrowind_bc_ptr = %p\n",
>>                     i, nb, Llu->Lnzval_bc_ptr[i], Llu->Lrowind_bc_ptr[i]);
>>             SUPERLU_FREE (Llu->Lrowind_bc_ptr[i]);
>> #ifdef GPU_ACC
>>             checkCuda(cudaFreeHost(Llu->Lnzval_bc_ptr[i]));
>> #else
>>             SUPERLU_FREE (Llu->Lnzval_bc_ptr[i]);
>> #endif
>>         }
>>
>> And I see:
>>
>>    1 SNES Function norm 1.245977692562e-04
>>
>> dDestroy_LU: GPU free Llu->Lnzval_bc_ptr[0/134] = 0x4ff9b000, CPU free Llu->Lrowind_bc_ptr = 0x4ff9a000
>> ex112d: cudahook.cc:762: CUresult host_free_callback(void*): Assertion `cacheNode != __null' failed.
>>
>> It looks like Lnzval_bc_ptr is on the CPU, so I removed the GPU_ACC
>> stuff and it works now.
>>
>> I see this in distribution. Perhaps this is a serial-run bug?
>>
>> On Sun, Apr 19, 2020 at 5:58 PM Xiaoye S. Li <xsli at lbl.gov> wrote:
>>
>>> Mark,
>>> you should fork a branch of your own to do this.
>>>
>>> Sherry
>>>
>>> On Sun, Apr 19, 2020 at 2:54 PM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> First, commit your changes to the superlu_dist branch, then rerun
>>>> configure with
>>>>
>>>> --download-superlu_dist-commit=HEAD
>>>>
>>>>
>>>> > On Apr 20, 2020, at 12:50 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>> >
>>>> > I would like to modify SuperLU_dist, but if I change the source and
>>>> > configure, it says no need to reconfigure, use --force. I use --force and
>>>> > it seems to clobber my changes. Can I tell configure to build but not
>>>> > download SuperLU?
>>>>
>>>>