[petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

Matthew Knepley knepley at gmail.com
Fri Feb 7 12:53:08 CST 2020


On Fri, Feb 7, 2020 at 1:23 PM Zhang, Hong via petsc-dev <petsc-dev at mcs.anl.gov> wrote:

> Hi all,
>
> Previously I have noticed that the first call to a CUDA function such
> as cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on
> Summit. I then prepared the simple example below to help OLCF
> reproduce the problem. It turned out that the problem is caused by
> PETSc: the 7.5-second overhead is observed only when the PETSc library
> is linked. If I do not link PETSc, it runs normally. Does anyone have
> any idea why this happens and how to fix it?
>

Hong, this sounds like a screwed up dynamic linker. Can you try this with a
statically linked executable?

  Thanks,

     Matt


> Hong (Mr.)
>
> bash-4.2$ cat ex_simple.c
> #include <time.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
>
> int main(int argc,char **args)
> {
>   clock_t start,s1,s2,s3;
>   double  *init,tmp[100] = {0};
>
>   start = clock();
>   cudaFree(0);   /* first CUDA call; creates the CUDA context */
>   s1 = clock();
>   cudaMalloc((void **)&init,100*sizeof(double));
>   s2 = clock();
>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>   s3 = clock();
>   printf("free time =%lf malloc time =%lf copy time =%lf\n",
>          ((double)(s1 - start))/CLOCKS_PER_SEC,
>          ((double)(s2 - s1))/CLOCKS_PER_SEC,
>          ((double)(s3 - s2))/CLOCKS_PER_SEC);
>
>   return 0;
> }
>
>
>
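One caveat about the probe above: clock() counts CPU time only, so any
part of the stall spent blocked in the driver or the dynamic loader may
be under-reported. Below is a wall-clock variant of the same measurement
(a minimal sketch, assuming POSIX clock_gettime with CLOCK_MONOTONIC is
available; older glibc may also need -lrt at link time). Building it
twice, with and without linking the PETSc library, should isolate the
effect Hong describes.

/* ex_simple_wall.c: same probe as ex_simple.c, timed with wall-clock
   time instead of clock(). Sketch only; not part of the original
   attachment. */
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>

/* Return seconds from a monotonic wall clock. */
static double wall(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9*(double)ts.tv_nsec;
}

int main(int argc, char **args)
{
  double *init, tmp[100] = {0};
  double t0, t1, t2, t3;

  t0 = wall();
  cudaFree(0);   /* first CUDA call; creates the CUDA context */
  t1 = wall();
  cudaMalloc((void **)&init, 100*sizeof(double));
  t2 = wall();
  cudaMemcpy(init, tmp, 100*sizeof(double), cudaMemcpyHostToDevice);
  t3 = wall();

  printf("free time =%lf malloc time =%lf copy time =%lf\n",
         t1 - t0, t2 - t1, t3 - t2);
  cudaFree(init);
  return 0;
}

The cudaFree(0) number is the one worth comparing between the two
builds, since that is where the CUDA context is created.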

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/