I pushed my change to petsc-dev, so hopefully a new pull of the latest
mercurial repository should do it; let me know if not.

---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012

On Tue, May 10, 2011 at 6:59 PM, Alexander Grayver <agrayver@gfz-potsdam.de> wrote:

Hi Victor,

Thanks a lot!
What should we do to get the new version?

Regards,
Alexander

On 10.05.2011 02:02, Victor Minden wrote:

I believe I've resolved this issue.

Cheers,

Victor
---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012


On Sun, May 8, 2011 at 5:26 PM, Victor Minden <victorminden@gmail.com> wrote:

Barry,

I can verify this on breadboard now.

With two processes, with CUDA:

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$
/home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile
/home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary
-snes_monitor -ksp_monitor -da_vec_type cusp
 0 SNES Function norm 3.906279802209e-03
 0 KSP Residual norm 5.994156809227e+00
 1 KSP Residual norm 5.927247846249e-05
 1 SNES Function norm 3.906225077938e-03
 0 KSP Residual norm 5.993813868985e+00
 1 KSP Residual norm 5.927575078206e-05
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer
Aborted (signal 6)

Without CUDA:

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$
/home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile
/home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary
-snes_monitor -ksp_monitor
 0 SNES Function norm 3.906279802209e-03
 0 KSP Residual norm 5.994156809227e+00
 1 KSP Residual norm 3.538158441448e-04
 2 KSP Residual norm 3.124431921666e-04
 3 KSP Residual norm 4.109213410989e-06
 1 SNES Function norm 7.201017610490e-04
 0 KSP Residual norm 3.317803708316e-02
 1 KSP Residual norm 2.447380361169e-06
 2 KSP Residual norm 2.164193969957e-06
 3 KSP Residual norm 2.124317398679e-08
 2 SNES Function norm 1.719678934825e-05
 0 KSP Residual norm 1.651586453143e-06
 1 KSP Residual norm 2.037037536868e-08
 2 KSP Residual norm 1.109736798274e-08
 3 KSP Residual norm 1.857218772156e-12
 3 SNES Function norm 1.159391068583e-09
 0 KSP Residual norm 3.116936044619e-11
 1 KSP Residual norm 1.366503312678e-12
 2 KSP Residual norm 6.598477672192e-13
 3 KSP Residual norm 5.306147277879e-17
 4 SNES Function norm 2.202297235559e-10

Note the repeated norms when using CUDA. Looks like I'll have to take a
closer look at this.

-Victor

---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012


On Thu, May 5, 2011 at 2:57 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
>
>    Alexander
>
>    Thank you for the sample code; it will be very useful.
>
>    We have run parallel jobs with CUDA where each node has only a single
> MPI process and uses a single GPU, without the crash that you get below.
> I cannot explain why it would not work in your situation. Do you have
> access to two nodes, each with a GPU, so you could try that?
>
>    It is crashing in a delete of a
>
> struct _p_PetscCUSPIndices {
>   CUSPINTARRAYCPU indicesCPU;
>   CUSPINTARRAYGPU indicesGPU;
> };
>
> where CUSPINTARRAYGPU is a cusp::array1d<PetscInt,cusp::device_memory>;
> thus it is crashing after it has actually completed the computation. If
> you run with -snes_monitor -ksp_monitor, with and without -da_vec_type
> cusp, on 2 processes, what do you get for output in the two cases? I want
> to see whether it is running correctly on two processes.
>
>    Could the crash be due to memory corruption at some point during the
> computation?
>
>
>    Barry
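
For reference, here is roughly what the two array types in Barry's struct
expand to in petsc-dev, as far as I can tell; the definitions below are
reconstructed from memory of the CUSP vector headers and may differ in your
revision, so treat them as an assumption rather than a quote:

    // Assumed reconstruction of the petsc-dev types behind the struct above
    #include <petscsys.h>        // PetscInt
    #include <cusp/array1d.h>

    #define CUSPINTARRAYCPU cusp::array1d<PetscInt, cusp::host_memory>
    #define CUSPINTARRAYGPU cusp::array1d<PetscInt, cusp::device_memory>

    struct _p_PetscCUSPIndices {
      CUSPINTARRAYCPU indicesCPU;  // scatter indices mirrored on the host
      CUSPINTARRAYGPU indicesGPU;  // scatter indices in device memory; freeing
                                   // this member is where thrust reports the
                                   // "invalid device pointer" error
    };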
>
> On May 5, 2011, at 3:38 AM, Alexander Grayver wrote:
>
>> Hello!
>>
>> We work with the petsc-dev branch and the ex47cu.cu example. Our platform
>> is an Intel Quad processor and 8 identical Tesla GPUs. The CUDA 3.2
>> toolkit is installed.
>> Ideally we would like to make PETSc work in a multi-GPU way within just
>> one node, so that different GPUs can be attached to different processes.
>> Since this is not possible within the current PETSc implementation, we
>> created a preload library (see LD_PRELOAD for details) for the CUBLAS
>> function cublasInit().
>> When PETSc calls this function our library gets control and we assign
>> GPUs according to the rank within the MPI communicator, then we call the
>> original cublasInit().
>> This preload library is very simple, see petsc_mgpu.c attached.
>> This trick makes each process have its own context, and ideally all
>> computations should be distributed over several GPUs.
>>
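
A minimal sketch of the kind of LD_PRELOAD wrapper described above; the actual
petsc_mgpu.c attached to the original mail may well differ, and the build line
and rank-to-device mapping shown here are assumptions:

    // petsc_mgpu_sketch.cpp -- hypothetical illustration, not the attached file.
    // Build as a shared library and run with LD_PRELOAD=./libpetsc_mgpu.so so it
    // intercepts cublasInit(), binds the calling MPI rank to one of the node's
    // GPUs, then forwards to the real cublasInit().
    //
    // Assumed build line:
    //   mpicxx -fPIC -shared petsc_mgpu_sketch.cpp -o libpetsc_mgpu.so -ldl -lcudart -lcublas
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <dlfcn.h>
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <cublas.h>

    extern "C" cublasStatus cublasInit(void)
    {
      static cublasStatus (*real_cublasInit)(void) = NULL;
      if (!real_cublasInit)   // look up the real CUBLAS entry point
        real_cublasInit = (cublasStatus (*)(void)) dlsym(RTLD_NEXT, "cublasInit");

      int initialized = 0, rank = 0, ndev = 1;
      MPI_Initialized(&initialized);   // PETSc calls MPI_Init before CUBLAS
      if (initialized) {
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (cudaGetDeviceCount(&ndev) == cudaSuccess && ndev > 0)
          cudaSetDevice(rank % ndev);  // one GPU per MPI process on the node
      }
      return real_cublasInit();        // hand control back to the real CUBLAS
    }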
>> We managed to build PETSc and the example (see makefile attached), and we
>> tested it as follows:
>>
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -info > cpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -info >
>> cpu_2processes.out
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -da_vec_type cusp
>> -info > gpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535
>> -da_vec_type cusp -info > gpu_2processes.out
>>
>> Everything except the last configuration works well. The last one crashes
>> with the following exception and call stack:
>> terminate called after throwing an instance of 'thrust::system::system_error'
>>   what():  invalid device pointer
>> [tesla-cmc:15549] *** Process received signal ***
>> [tesla-cmc:15549] Signal: Aborted (6)
>> [tesla-cmc:15549] Signal code:  (-6)
>> [tesla-cmc:15549] [ 0] /lib64/libpthread.so.0() [0x3de540eeb0]
>> [tesla-cmc:15549] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3de50330c5]
>> [tesla-cmc:15549] [ 2] /lib64/libc.so.6(abort+0x186) [0x3de5034a76]
>> [tesla-cmc:15549] [ 3] /opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f0d3530b95d]
>> [tesla-cmc:15549] [ 4] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76) [0x7f0d35309b76]
>> [tesla-cmc:15549] [ 5] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3) [0x7f0d35309ba3]
>> [tesla-cmc:15549] [ 6] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae) [0x7f0d35309cae]
>> [tesla-cmc:15549] [ 7] ./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69) [0x426320]
>> [tesla-cmc:15549] [ 8] ./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b) [0x4258b2]
>> [tesla-cmc:15549] [ 9] ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f) [0x424f78]
>> [tesla-cmc:15549] [10] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33) [0x7f0d36aeacff]
>> [tesla-cmc:15549] [11] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e) [0x7f0d36ae8e78]
>> [tesla-cmc:15549] [12] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19) [0x7f0d36ae75f7]
>> [tesla-cmc:15549] [13] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52) [0x7f0d36ae65f4]
>> [tesla-cmc:15549] [14] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18) [0x7f0d36ae5c2e]
>> [tesla-cmc:15549] [15] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d) [0x7f0d3751e45f]
>> [tesla-cmc:15549] [16] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f) [0x7f0d3750c840]
>> [tesla-cmc:15549] [17] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8) [0x7f0d375af8af]
>> [tesla-cmc:15549] [18] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586) [0x7f0d375e9ddf]
>> [tesla-cmc:15549] [19] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f) [0x7f0d37191d24]
>> [tesla-cmc:15549] [20] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546) [0x7f0d370d54fe]
>> [tesla-cmc:15549] [21] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1) [0x7f0d3746fac3]
>> [tesla-cmc:15549] [22] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8) [0x7f0d37470210]
>> [tesla-cmc:15549] [23] ./lapexp(main+0x5ed) [0x420745]
>>
>> I've sent all detailed output files for the different execution
>> configurations listed above, as well as configure.log and make.log, to
>> petsc-maint@mcs.anl.gov, hoping that someone can recognize the problem.
>> For now we have one node with multiple GPUs, but I'm also wondering
>> whether anyone has really tested the GPU functionality over several nodes
>> with one GPU each?
>>
>> Regards,
>> Alexander
>>
>> <petsc_mgpu.c><makefile.txt><configure.log>
>
>