[petsc-dev] [petsc-maint #72279] PETSc and multigpu

Barry Smith bsmith at mcs.anl.gov
Thu May 5 14:36:53 CDT 2011


  Alexander,

   Could you try putting an MPI_Barrier() just before the PetscFinalize() in your GPU example and see if it still crashes?

    I'm wondering if the first process to reach cublasShutdown(), which is called in PetscFinalize(), somehow shuts down all the GPUs on the node, so the other processes, which have not finished working with their GPUs, crash. Just a wild guess, but worth checking.
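
    Concretely, something like this at the very end of the example's main(), just before the finalize call, is what I have in mind (only a sketch; adapt it to however the example handles its error checking):

      MPI_Barrier(PETSC_COMM_WORLD);   /* make sure every rank has finished its GPU work */
      ierr = PetscFinalize();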

    Thanks

      Barry


On May 5, 2011, at 1:57 PM, Barry Smith wrote:

> 
> Alexander
> 
>    Thank you for the sample code; it will be very useful.
> 
>    We have run parallel jobs with CUDA where each node has only a single MPI process and uses a single GPU, without the crash that you get below. I cannot explain why it would not work in your situation. Do you have access to two nodes, each with a GPU, so you could try that? 
> 
>   It is crashing while deleting a 
> 
> struct  _p_PetscCUSPIndices {
>  CUSPINTARRAYCPU indicesCPU;
>  CUSPINTARRAYGPU indicesGPU;
> };
> 
> where CUSPINTARRAYGPU is a cusp::array1d<PetscInt,cusp::device_memory>,
> 
> so it is crashing after it has actually completed the computation. If you run with -snes_monitor -ksp_monitor, with and without -da_vec_type cusp, on 2 processes, what output do you get in the two cases? I want to see whether it is running correctly on two processes.
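> 
> For example, reusing the grid size from your runs below (these exact command lines are only a suggestion):
> 
>   mpirun -np 2 ./lapexp -da_grid_x 65535 -snes_monitor -ksp_monitor
>   mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -snes_monitor -ksp_monitor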
> 
> Could the crash be due to memory corruption at some point during the computation?
> 
> 
>   Barry
> 
> 
> 
> 
> 
> On May 5, 2011, at 3:38 AM, Alexander Grayver wrote:
> 
>> Hello!
>> 
>> We are working with the petsc-dev branch and the ex47cu.cu example. Our 
>> platform is an Intel Quad processor with 8 identical Tesla GPUs; the 
>> CUDA 3.2 toolkit is installed.
>> Ideally we would like PETSc to work in a multi-GPU fashion within a 
>> single node, so that different GPUs can be attached to different 
>> processes.
>> Since this is not possible with the current PETSc implementation, we 
>> created a preload library (see LD_PRELOAD for details) that intercepts 
>> the CUBLAS function cublasInit().
>> When PETSc calls this function, our library gets control, assigns a GPU 
>> according to the rank within the MPI communicator, and then calls the 
>> original cublasInit().
>> The preload library is very simple; see petsc_mgpu.c attached and the 
>> sketch below.
>> This trick gives each process its own CUDA context, so ideally all 
>> computations should be distributed over several GPUs.
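>> 
>> The idea is roughly the following (a simplified, hypothetical sketch of 
>> such an interposer; the attached petsc_mgpu.c may differ in details):
>> 
>>   /* Build as a shared object and activate with LD_PRELOAD so that it
>>      overrides the legacy CUBLAS (CUDA 3.2) entry point cublasInit(). */
>>   #define _GNU_SOURCE
>>   #include <dlfcn.h>
>>   #include <mpi.h>
>>   #include <cuda_runtime.h>
>>   #include <cublas.h>
>> 
>>   cublasStatus cublasInit(void)
>>   {
>>     cublasStatus (*real_cublasInit)(void);
>>     int rank = 0, ndev = 1;
>> 
>>     /* Assumes MPI is already initialized when PETSc calls cublasInit(). */
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     cudaGetDeviceCount(&ndev);
>>     cudaSetDevice(rank % ndev);   /* bind this process to its own GPU */
>> 
>>     /* Forward to the real cublasInit() from libcublas. */
>>     real_cublasInit = (cublasStatus (*)(void)) dlsym(RTLD_NEXT, "cublasInit");
>>     return real_cublasInit();
>>   }
>> 
>> The resulting shared object is then exported via LD_PRELOAD to every MPI 
>> process before the run starts.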
>> 
>> We managed to build PETSc and the example (see makefile attached), and 
>> we tested it as follows:
>> 
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -info > cpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -info > cpu_2processes.out
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_2processes.out
>> 
>> Everything except the last configuration works well. The last one 
>> crashes with the following exception and call stack:
>> terminate called after throwing an instance of 'thrust::system::system_error'
>>  what():  invalid device pointer
>> [tesla-cmc:15549] *** Process received signal ***
>> [tesla-cmc:15549] Signal: Aborted (6)
>> [tesla-cmc:15549] Signal code:  (-6)
>> [tesla-cmc:15549] [ 0] /lib64/libpthread.so.0() [0x3de540eeb0]
>> [tesla-cmc:15549] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3de50330c5]
>> [tesla-cmc:15549] [ 2] /lib64/libc.so.6(abort+0x186) [0x3de5034a76]
>> [tesla-cmc:15549] [ 3] /opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f0d3530b95d]
>> [tesla-cmc:15549] [ 4] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76) [0x7f0d35309b76]
>> [tesla-cmc:15549] [ 5] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3) [0x7f0d35309ba3]
>> [tesla-cmc:15549] [ 6] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae) [0x7f0d35309cae]
>> [tesla-cmc:15549] [ 7] ./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69) [0x426320]
>> [tesla-cmc:15549] [ 8] ./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b) [0x4258b2]
>> [tesla-cmc:15549] [ 9] ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f) [0x424f78]
>> [tesla-cmc:15549] [10] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33) [0x7f0d36aeacff]
>> [tesla-cmc:15549] [11] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e) [0x7f0d36ae8e78]
>> [tesla-cmc:15549] [12] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19) [0x7f0d36ae75f7]
>> [tesla-cmc:15549] [13] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52) [0x7f0d36ae65f4]
>> [tesla-cmc:15549] [14] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18) [0x7f0d36ae5c2e]
>> [tesla-cmc:15549] [15] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d) [0x7f0d3751e45f]
>> [tesla-cmc:15549] [16] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f) [0x7f0d3750c840]
>> [tesla-cmc:15549] [17] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8) [0x7f0d375af8af]
>> [tesla-cmc:15549] [18] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586) [0x7f0d375e9ddf]
>> [tesla-cmc:15549] [19] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f) [0x7f0d37191d24]
>> [tesla-cmc:15549] [20] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546) [0x7f0d370d54fe]
>> [tesla-cmc:15549] [21] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1) [0x7f0d3746fac3]
>> [tesla-cmc:15549] [22] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8) [0x7f0d37470210]
>> [tesla-cmc:15549] [23] ./lapexp(main+0x5ed) [0x420745]
>> 
>> I've sent the detailed output files for the different execution 
>> configurations listed above, as well as configure.log and make.log, to 
>> petsc-maint at mcs.anl.gov, hoping that someone can recognize the problem.
>> Right now we have one node with multiple GPUs, but I'm also wondering 
>> whether anyone has really tested the GPU functionality over several 
>> nodes with one GPU each.
>> 
>> Regards,
>> Alexander
>> 
>> <petsc_mgpu.c><makefile.txt><configure.log>
> 



