[petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit
Zhang, Hong
hongzhang at anl.gov
Fri Feb 7 12:23:39 CST 2020
Hi all,
Previously I have noticed that the first call to a CUDA function such as cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on summit. Then I prepared a simple example as attached to help OCLF reproduce the problem. It turned out that the problem was caused by PETSc. The 7.5-second overhead can be observed only when the PETSc lib is linked. If I do not link PETSc, it runs normally. Does anyone have any idea why this happens and how to fix it?
Hong (Mr.)
bash-4.2$ cat ex_simple.c
#include <time.h>
#include <cuda_runtime.h>
#include <stdio.h>
int main(int argc,char **args)
{
clock_t start,s1,s2,s3;
double cputime;
double *init,tmp[100] = {0};
start = clock();
cudaFree(0);
s1 = clock();
cudaMalloc((void **)&init,100*sizeof(double));
s2 = clock();
cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
s3 = clock();
printf("free time =%lf malloc time =%lf copy time =%lf\n",((double) (s1 - start)) / CLOCKS_PER_SEC,((double) (s2 - s1)) / CLOCKS_PER_SEC,((double) (s3 - s2)) / CLOCKS_PER_SEC);
return 0;
}
More information about the petsc-dev
mailing list