<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    Hello,<br>
    <br>
    Thanks, Victor. We've got the latest version and now it doesn't
    crash. However, it seems there is still a problem.<br>
    <br>
    Let's look at three different runs:<br>
    <br>
    [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535
    -snes_monitor -ksp_monitor<br>
      0 SNES Function norm 3.906279802209e-03<br>
        0 KSP Residual norm 5.994156809227e+00<br>
        1 KSP Residual norm 3.538158441448e-04<br>
        2 KSP Residual norm 3.124431921666e-04<br>
        3 KSP Residual norm 4.109213410989e-06<br>
      1 SNES Function norm 7.201017610490e-04<br>
        0 KSP Residual norm 3.317803708316e-02<br>
        1 KSP Residual norm 2.447380361169e-06<br>
        2 KSP Residual norm 2.164193969957e-06<br>
        3 KSP Residual norm 2.124317398679e-08<br>
      2 SNES Function norm 1.719678934825e-05<br>
        0 KSP Residual norm 1.651586453143e-06<br>
        1 KSP Residual norm 2.037037536868e-08<br>
        2 KSP Residual norm 1.109736798274e-08<br>
        3 KSP Residual norm 1.857218772156e-12<br>
      3 SNES Function norm 1.159391068583e-09<br>
        0 KSP Residual norm 3.116936044619e-11<br>
        1 KSP Residual norm 1.366503312678e-12<br>
        2 KSP Residual norm 6.598477672192e-13<br>
        3 KSP Residual norm 5.306147277879e-17<br>
      4 SNES Function norm 2.202297235559e-10<br>
    [agraiver@tesla-cmc new]$ mpirun -np 1 ./lapexp -da_grid_x 65535
    -da_vec_type cusp -snes_monitor -ksp_monitor<br>
      0 SNES Function norm 3.906279802209e-03<br>
        0 KSP Residual norm 2.600060425819e+01<br>
        1 KSP Residual norm 1.711173401491e-09<br>
      1 SNES Function norm 2.518839283204e-05<br>
        0 KSP Residual norm 1.864270712051e-01<br>
        1 KSP Residual norm 1.123567613474e-11<br>
      2 SNES Function norm 1.475752536169e-09<br>
        0 KSP Residual norm 1.065095925089e-05<br>
        1 KSP Residual norm 8.918344224261e-16<br>
      3 SNES Function norm 2.186342855894e-10<br>
        0 KSP Residual norm 6.313874615230e-11<br>
        1 KSP Residual norm 2.338370003621e-21<br>
    [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535
    -da_vec_type cusp -snes_monitor -ksp_monitor<br>
      0 SNES Function norm 3.906279802209e-03<br>
        0 KSP Residual norm 5.994156809227e+00<br>
        1 KSP Residual norm 5.927247846249e-05<br>
      1 SNES Function norm 3.906225077938e-03<br>
        0 KSP Residual norm 5.993813868985e+00<br>
        1 KSP Residual norm 5.927575078206e-05<br>
    [agraiver@tesla-cmc new]$<br>
    <br>
    lapexp is the default example, just renamed. The first run used 2
    CPU processes, the second one used 1 GPU, and the third one ran 2
    processes on 1 GPU.<br>
    The first difference is that when we use the CPU, the last line of
    the output is always:<br>
    4 SNES Function norm 2.202297235559e-10<br>
    whereas for the GPU the last line is "N KSP ...something...".<br>
    Then it seems that with 2 processes on 1 GPU the example doesn't
    converge; the norm stays large. The same situation happens when we
    use 2 processes and 2 GPUs. Can you explain this?<br>
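    If it helps, we can rerun each case with the converged-reason
    options (assuming this petsc-dev snapshot supports them) so PETSc
    prints why SNES and KSP stopped in each case, for example:<br>
    <br>
    [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535
    -da_vec_type cusp -snes_monitor -ksp_monitor -snes_converged_reason
    -ksp_converged_reason<br>
    <br>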
    BTW, we can even give you access to our server with 6 CPUs and 8
    GPUs within one node. <br>
    <br>
    Regards,<br>
    Alexander<br>
    <br>
    On 11.05.2011 01:07, Victor Minden wrote:
    <blockquote
      cite="mid:BANLkTinQztdFhg3W2=_ZoiGRH69dy=tzBg@mail.gmail.com"
      type="cite">I pushed my change to petsc-dev, so hopefully a new
      pull of the latest mercurial repository should do it, let me know
      if not.<br clear="all">
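      For example, assuming an existing petsc-dev clone:<br>
      <br>
      cd petsc-dev && hg pull && hg update<br>
      <br>
      (followed by a rebuild of PETSc and the example).<br>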
      ---<br>
      Victor L. Minden<br>
      <br>
      Tufts University<br>
      School of Engineering<br>
      Class of 2012<br>
      <br>
      <br>
      <div class="gmail_quote">On Tue, May 10, 2011 at 6:59 PM,
        Alexander Grayver <span dir="ltr"><<a moz-do-not-send="true"
            href="mailto:agrayver@gfz-potsdam.de">agrayver@gfz-potsdam.de</a>></span>
        wrote:<br>
        <blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
          0.8ex; border-left: 1px solid rgb(204, 204, 204);
          padding-left: 1ex;">
          <div bgcolor="#ffffff" text="#000000"> Hi Victor,<br>
            <br>
            Thanks a lot!<br>
            What should we do to get the new version?<br>
            <br>
            Regards,<br>
            <font color="#888888"> Alexander</font>
            <div>
              <div class="h5"><br>
                <br>
                On 10.05.2011 02:02, Victor Minden wrote:
                <blockquote type="cite">I believe I've resolved this
                  issue.
                  <div><br>
                  </div>
                  <div>Cheers,</div>
                  <div><br>
                  </div>
                  <div>Victor<br clear="all">
                    ---<br>
                    Victor L. Minden<br>
                    <br>
                    Tufts University<br>
                    School of Engineering<br>
                    Class of 2012<br>
                    <br>
                    <br>
                    <div class="gmail_quote">On Sun, May 8, 2011 at 5:26
                      PM, Victor Minden <span dir="ltr"><<a
                          moz-do-not-send="true"
                          href="mailto:victorminden@gmail.com"
                          target="_blank">victorminden@gmail.com</a>></span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin: 0pt
                        0pt 0pt 0.8ex; border-left: 1px solid rgb(204,
                        204, 204); padding-left: 1ex;"> Barry,<br>
                        <br>
                        I can verify this on breadboard now,<br>
                        <br>
                        with two processes, cuda<br>
                        <br>
                        minden@bb45:~/petsc-dev/src/snes/examples/tutorials$<br>
                        /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile<br>
                        /home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary<br>
                        -snes_monitor -ksp_monitor -da_vec_type cusp<br>
                        <div>  0 SNES Function norm 3.906279802209e-03<br>
                             0 KSP Residual norm 5.994156809227e+00<br>
                             1 KSP Residual norm 5.927247846249e-05<br>
                           1 SNES Function norm 3.906225077938e-03<br>
                             0 KSP Residual norm 5.993813868985e+00<br>
                             1 KSP Residual norm 5.927575078206e-05<br>
                        </div>
                        <div>terminate called after throwing an instance
                          of 'thrust::system::system_error'<br>
                           what():  invalid device pointer<br>
                          terminate called after throwing an instance of
                          'thrust::system::system_error'<br>
                           what():  invalid device pointer<br>
                        </div>
                        Aborted (signal 6)<br>
                        <br>
                        <br>
                        <br>
                        Without cuda<br>
                        <br>
                        minden@bb45:~/petsc-dev/src/snes/examples/tutorials$<br>
                        /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile<br>
                        /home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary<br>
                        <div>-snes_monitor -ksp_monitor<br>
                           0 SNES Function norm 3.906279802209e-03<br>
                             0 KSP Residual norm 5.994156809227e+00<br>
                             1 KSP Residual norm 3.538158441448e-04<br>
                             2 KSP Residual norm 3.124431921666e-04<br>
                             3 KSP Residual norm 4.109213410989e-06<br>
                           1 SNES Function norm 7.201017610490e-04<br>
                             0 KSP Residual norm 3.317803708316e-02<br>
                             1 KSP Residual norm 2.447380361169e-06<br>
                             2 KSP Residual norm 2.164193969957e-06<br>
                             3 KSP Residual norm 2.124317398679e-08<br>
                           2 SNES Function norm 1.719678934825e-05<br>
                             0 KSP Residual norm 1.651586453143e-06<br>
                             1 KSP Residual norm 2.037037536868e-08<br>
                             2 KSP Residual norm 1.109736798274e-08<br>
                             3 KSP Residual norm 1.857218772156e-12<br>
                           3 SNES Function norm 1.159391068583e-09<br>
                             0 KSP Residual norm 3.116936044619e-11<br>
                             1 KSP Residual norm 1.366503312678e-12<br>
                             2 KSP Residual norm 6.598477672192e-13<br>
                             3 KSP Residual norm 5.306147277879e-17<br>
                           4 SNES Function norm 2.202297235559e-10<br>
                          <br>
                        </div>
                        Note the repeated norms when using cuda.  Looks
                        like I'll have to take<br>
                        a closer look at this.<br>
                        <br>
                        -Victor<br>
                        <br>
                        ---<br>
                        Victor L. Minden<br>
                        <br>
                        Tufts University<br>
                        School of Engineering<br>
                        Class of 2012<br>
                        <div>
                          <div><br>
                            <br>
                            <br>
                            On Thu, May 5, 2011 at 2:57 PM, Barry Smith
                            <<a moz-do-not-send="true"
                              href="mailto:bsmith@mcs.anl.gov"
                              target="_blank">bsmith@mcs.anl.gov</a>>
                            wrote:<br>
                            ><br>
                            > Alexander<br>
                            ><br>
                            >    Thank you for the sample code; it
                            will be very useful.<br>
                            ><br>
                            >    We have run parallel jobs with CUDA
                            where each node has only a single MPI
                            process and uses a single GPU, without the
                            crash that you get below. I cannot explain
                            why it would not work in your situation. Do
                            you have access to two nodes, each with a
                            GPU, so you could try that?<br>
                            ><br>
                            >   It is crashing in a delete of a<br>
                            ><br>
                            > struct  _p_PetscCUSPIndices {<br>
                            >  CUSPINTARRAYCPU indicesCPU;<br>
                            >  CUSPINTARRAYGPU indicesGPU;<br>
                            > };<br>
                            ><br>
                            > where CUSPINTARRAYGPU is a
                            cusp::array1d<PetscInt,cusp::device_memory>;<br>
                            ><br>
                            > thus it is crashing after it has
                            completed actually doing the computation.
                            If you run with -snes_monitor -ksp_monitor,
                            with and without -da_vec_type cusp, on 2
                            processes, what do you get for output in
                            the two cases? I want to see whether it is
                            running correctly on two processes.<br>
                            ><br>
                            > Could the crash be due to memory
                            corruption sometime during the computation?<br>
                            ><br>
                            ><br>
                            >   Barry<br>
                            ><br>
                            ><br>
                            ><br>
                            ><br>
                            ><br>
                            > On May 5, 2011, at 3:38 AM, Alexander
                            Grayver wrote:<br>
                            ><br>
                            >> Hello!<br>
                            >><br>
                            >> We work with petsc-dev branch and <a
                              moz-do-not-send="true"
                              href="http://ex47cu.cu" target="_blank">ex47cu.cu</a>
                            example. Our platform is<br>
                            >> Intel Quad processor and 8
                            identical Tesla GPUs. CUDA 3.2 toolkit is<br>
                            >> installed.<br>
                            >> Ideally we would like to make petsc
                            work in a multi-GPU way within<br>
                            >> just one node, so that different
                            GPUs could be attached to different<br>
                            >> processes.<br>
                            >> Since that's not possible with the
                            current PETSc implementation, we created a<br>
                            >> preload library (see LD_PRELOAD for
                            details) for CUBLAS function<br>
                            >> cublasInit().<br>
                            >> When PETSc calls this function our
                            library gets control and we assign<br>
                            >> GPUs according to rank within MPI
                            communicator, then we call original<br>
                            >> cublasInit().<br>
                            >> This preload library is very
                            simple, see petsc_mgpu.c attached.<br>
                            >> This trick makes each process
                            have its own context, and ideally all<br>
                            >> computations should be distributed
                            over several GPUs.<br>
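                            >><br>
                            >> For reference, here is a minimal sketch
                            of the idea (the real code is in<br>
                            >> the attached petsc_mgpu.c; the sketch
                            assumes the legacy CUBLAS API,<br>
                            >> where cublasInit() takes no arguments,
                            and that MPI is already<br>
                            >> initialized when PETSc calls it):<br>
                            >><br>
                            >> #define _GNU_SOURCE<br>
                            >> #include <dlfcn.h><br>
                            >> #include <mpi.h><br>
                            >> #include <cuda_runtime.h><br>
                            >> #include <cublas.h><br>
                            >><br>
                            >> /* Intercept cublasInit() so each MPI rank selects its own GPU<br>
                            >>    before the CUBLAS context is created. */<br>
                            >> cublasStatus cublasInit(void)<br>
                            >> {<br>
                            >>     typedef cublasStatus (*cublasInit_t)(void);<br>
                            >>     cublasInit_t real_init = (cublasInit_t) dlsym(RTLD_NEXT, "cublasInit");<br>
                            >>     int rank = 0, ndev = 1;<br>
                            >>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br>
                            >>     cudaGetDeviceCount(&ndev);<br>
                            >>     cudaSetDevice(rank % ndev);  /* one GPU per rank */<br>
                            >>     return real_init();  /* hand off to the real CUBLAS */<br>
                            >> }<br>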
                            >><br>
                            >> We managed to build petsc and the
                            example (see makefile attached), and we<br>
                            >> tested it as follows:<br>
                            >><br>
                            >> [agraiver@tesla-cmc new]$ ./lapexp
                            -da_grid_x 65535 -info > cpu_1process.out<br>
                            >> [agraiver@tesla-cmc new]$ mpirun
                            -np 2 ./lapexp -da_grid_x 65535 -info ><br>
                            >> cpu_2processes.out<br>
                            >> [agraiver@tesla-cmc new]$ ./lapexp
                            -da_grid_x 65535 -da_vec_type cusp<br>
                            >> -info > gpu_1process.out<br>
                            >> [agraiver@tesla-cmc new]$ mpirun
                            -np 2 ./lapexp -da_grid_x 65535<br>
                            >> -da_vec_type cusp -info >
                            gpu_2processes.out<br>
                            >><br>
                            >> Everything except the last
                            configuration works well. The last one
                            crashes<br>
                            >> with the following exception and
                            callstack:<br>
                            >> terminate called after throwing an
                            instance of<br>
                            >> 'thrust::system::system_error'<br>
                            >>   what():  invalid device pointer<br>
                            >> [tesla-cmc:15549] *** Process
                            received signal ***<br>
                            >> [tesla-cmc:15549] Signal: Aborted
                            (6)<br>
                            >> [tesla-cmc:15549] Signal code:
                             (-6)<br>
                            >> [tesla-cmc:15549] [ 0]
                            /lib64/libpthread.so.0() [0x3de540eeb0]<br>
                            >> [tesla-cmc:15549] [ 1]
                            /lib64/libc.so.6(gsignal+0x35)
                            [0x3de50330c5]<br>
                            >> [tesla-cmc:15549] [ 2]
                            /lib64/libc.so.6(abort+0x186) [0x3de5034a76]<br>
                            >> [tesla-cmc:15549] [ 3]<br>
                            >>
/opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d)<br>
                            >> [0x7f0d3530b95d]<br>
                            >> [tesla-cmc:15549] [ 4]<br>
                            >>
                            /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76)
                            [0x7f0d35309b76]<br>
                            >> [tesla-cmc:15549] [ 5]<br>
                            >>
                            /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3)
                            [0x7f0d35309ba3]<br>
                            >> [tesla-cmc:15549] [ 6]<br>
                            >>
                            /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae)
                            [0x7f0d35309cae]<br>
                            >> [tesla-cmc:15549] [ 7]<br>
                            >>
./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69)<br>
                            >> [0x426320]<br>
                            >> [tesla-cmc:15549] [ 8]<br>
                            >>
./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b)<br>
                            >> [0x4258b2]<br>
                            >> [tesla-cmc:15549] [ 9]<br>
                            >>
                            ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f)
                            [0x424f78]<br>
                            >> [tesla-cmc:15549] [10]<br>
                            >>
/opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33)<br>
                            >> [0x7f0d36aeacff]<br>
                            >> [tesla-cmc:15549] [11]<br>
                            >>
/opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e)<br>
                            >> [0x7f0d36ae8e78]<br>
                            >> [tesla-cmc:15549] [12]<br>
                            >>
/opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19)<br>
                            >> [0x7f0d36ae75f7]<br>
                            >> [tesla-cmc:15549] [13]<br>
                            >>
/opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52)<br>
                            >> [0x7f0d36ae65f4]<br>
                            >> [tesla-cmc:15549] [14]<br>
                            >>
/opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18)<br>
                            >> [0x7f0d36ae5c2e]<br>
                            >> [tesla-cmc:15549] [15]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d)<br>
                            >> [0x7f0d3751e45f]<br>
                            >> [tesla-cmc:15549] [16]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f)<br>
                            >> [0x7f0d3750c840]<br>
                            >> [tesla-cmc:15549] [17]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8)<br>
                            >> [0x7f0d375af8af]<br>
                            >> [tesla-cmc:15549] [18]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586)<br>
                            >> [0x7f0d375e9ddf]<br>
                            >> [tesla-cmc:15549] [19]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f)<br>
                            >> [0x7f0d37191d24]<br>
                            >> [tesla-cmc:15549] [20]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546)
                            [0x7f0d370d54fe]<br>
                            >> [tesla-cmc:15549] [21]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1)
                            [0x7f0d3746fac3]<br>
                            >> [tesla-cmc:15549] [22]<br>
                            >>
                            /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8)
                            [0x7f0d37470210]<br>
                            >> [tesla-cmc:15549] [23]
                            ./lapexp(main+0x5ed) [0x420745]<br>
                            >><br>
                            >> I've sent all detailed output files
                            for the different execution<br>
                            >> configurations listed above, as well
                            as configure.log and make.log, to<br>
                            >> <a moz-do-not-send="true"
                              href="mailto:petsc-maint@mcs.anl.gov"
                              target="_blank">petsc-maint@mcs.anl.gov</a>
                            hoping that someone could recognize the
                            problem.<br>
                            >> Now we have one node with
                            multiple GPUs, but I'm also wondering: has
                            anyone<br>
                            >> really tested the GPU
                            functionality over several nodes with one
                            GPU<br>
                            >> each?<br>
                            >><br>
                            >> Regards,<br>
                            >> Alexander<br>
                            >><br>
                            >>
                            <petsc_mgpu.c><makefile.txt><configure.log><br>
                            ><br>
                            ><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                    <br>
                  </div>
                </blockquote>
                <br>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>