[petsc-dev] PETSc and multigpu

Alexander Grayver agrayver at gfz-potsdam.de
Thu May 5 03:34:57 CDT 2011


Hello!

We are working with the petsc-dev branch and the ex47cu.cu example. Our
platform is an Intel quad-core processor with 8 identical Tesla GPUs and
the CUDA 3.2 toolkit installed.
Ideally we would like PETSc to work in a multi-GPU fashion within a
single node, so that different GPUs can be attached to different
processes.
Since this is not possible with the current PETSc implementation, we
created a preload library (see LD_PRELOAD for details) that intercepts
the CUBLAS function cublasInit().
When PETSc calls this function, our library gets control, assigns a GPU
according to the rank within the MPI communicator, and then calls the
original cublasInit().
The preload library is very simple; see the attached petsc_mgpu.c.
This trick gives each process its own CUDA context, and ideally all
computations should be distributed over several GPUs.
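
For reference, here is a minimal sketch of the idea (hypothetical, not
the attached petsc_mgpu.c itself; it assumes the legacy CUBLAS API of
CUDA 3.2 and dlsym/RTLD_NEXT forwarding):

/* Sketch of a preload wrapper: intercept cublasInit(), pick a device
 * from the MPI rank, then forward to the real cublasInit(). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <cublas.h>

cublasStatus cublasInit(void)
{
  static cublasStatus (*real_cublasInit)(void) = NULL;
  int rank = 0, ndev = 1, mpi_up = 0;

  if (!real_cublasInit)
    real_cublasInit = (cublasStatus (*)(void))dlsym(RTLD_NEXT, "cublasInit");

  /* Only query the rank if MPI has already been initialized. */
  MPI_Initialized(&mpi_up);
  if (mpi_up) MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Bind this process to one device, round-robin over visible GPUs. */
  cudaGetDeviceCount(&ndev);
  cudaSetDevice(rank % ndev);

  return real_cublasInit();
}

Something like "mpicc -fPIC -shared -o libpetsc_mgpu.so petsc_mgpu.c
-ldl -lcudart -lcublas" builds it, and LD_PRELOAD then points to the
resulting .so when launching the application.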

We managed to build PETSc and the example (see the attached makefile)
and tested it as follows:

[agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -info > cpu_1process.out
[agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -info > cpu_2processes.out
[agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_1process.out
[agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_2processes.out

Everything except the last configuration works well. The last one crashes with the following exception and call stack:
terminate called after throwing an instance of
'thrust::system::system_error'
    what():  invalid device pointer
[tesla-cmc:15549] *** Process received signal ***
[tesla-cmc:15549] Signal: Aborted (6)
[tesla-cmc:15549] Signal code:  (-6)
[tesla-cmc:15549] [ 0] /lib64/libpthread.so.0() [0x3de540eeb0]
[tesla-cmc:15549] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3de50330c5]
[tesla-cmc:15549] [ 2] /lib64/libc.so.6(abort+0x186) [0x3de5034a76]
[tesla-cmc:15549] [ 3] /opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f0d3530b95d]
[tesla-cmc:15549] [ 4] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76) [0x7f0d35309b76]
[tesla-cmc:15549] [ 5] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3) [0x7f0d35309ba3]
[tesla-cmc:15549] [ 6] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae) [0x7f0d35309cae]
[tesla-cmc:15549] [ 7] ./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69) [0x426320]
[tesla-cmc:15549] [ 8] ./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b) [0x4258b2]
[tesla-cmc:15549] [ 9] ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f) [0x424f78]
[tesla-cmc:15549] [10] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33) [0x7f0d36aeacff]
[tesla-cmc:15549] [11] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e) [0x7f0d36ae8e78]
[tesla-cmc:15549] [12] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19) [0x7f0d36ae75f7]
[tesla-cmc:15549] [13] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52) [0x7f0d36ae65f4]
[tesla-cmc:15549] [14] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18) [0x7f0d36ae5c2e]
[tesla-cmc:15549] [15] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d) [0x7f0d3751e45f]
[tesla-cmc:15549] [16] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f) [0x7f0d3750c840]
[tesla-cmc:15549] [17] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8) [0x7f0d375af8af]
[tesla-cmc:15549] [18] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586) [0x7f0d375e9ddf]
[tesla-cmc:15549] [19] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f) [0x7f0d37191d24]
[tesla-cmc:15549] [20] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546) [0x7f0d370d54fe]
[tesla-cmc:15549] [21] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1) [0x7f0d3746fac3]
[tesla-cmc:15549] [22] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8) [0x7f0d37470210]
[tesla-cmc:15549] [23] ./lapexp(main+0x5ed) [0x420745]

I've sent the detailed output files for the different execution
configurations listed above, as well as configure.log and make.log, to
petsc-maint at mcs.anl.gov, hoping that someone can recognize the problem.
For now we have one node with multiple GPUs, but I'm also wondering
whether anyone has actually tested the GPU functionality across several
nodes with one GPU each?

Regards,
Alexander


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: petsc_mgpu.c
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20110505/0ec321b6/attachment.c>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: makefile
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20110505/0ec321b6/attachment.ksh>

