Hi Alexander,

Looking through the runs for CPU and GPU with only 1 process, I'm seeing the following oddity, which you pointed out:

CPU, 1 process

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$ /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile /home/balay/machinefile -n 1 ./ex47cu -da_grid_x 65535 -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 2.600060425819e+01
    1 KSP Residual norm 1.727316216725e-09
  1 SNES Function norm 2.518839280713e-05
    0 KSP Residual norm 1.864270710157e-01
    1 KSP Residual norm 1.518456989028e-11
  2 SNES Function norm 1.475794371713e-09
    0 KSP Residual norm 1.065102315659e-05
    1 KSP Residual norm 1.258453455440e-15
  3 SNES Function norm 2.207728411745e-10
    0 KSP Residual norm 6.963755704792e-12
    1 KSP Residual norm 1.188067869190e-21
  4 SNES Function norm 2.199244040060e-10

GPU, 1 process

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$ /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile /home/balay/machinefile -n 1 ./ex47cu -da_grid_x 65535 -snes_monitor -ksp_monitor -da_vec_type cusp
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 2.600060425819e+01
    1 KSP Residual norm 1.711173401491e-09
  1 SNES Function norm 2.518839283204e-05
    0 KSP Residual norm 1.864270712051e-01
    1 KSP Residual norm 1.123567613474e-11
  2 SNES Function norm 1.475752536169e-09
    0 KSP Residual norm 1.065095925089e-05
    1 KSP Residual norm 8.918344224261e-16
  3 SNES Function norm 2.186342855894e-10
    0 KSP Residual norm 6.313874615230e-11
    1 KSP Residual norm 2.338370003621e-21

As you noted, the CPU version terminates on a SNES function norm, whereas the GPU version stops on a KSP residual norm. Looking through the exact numbers, I found that the small differences in values between the GPU and CPU versions put the SNES convergence criterion off by about 2e-23, so the GPU run goes through another round of line search before concluding it has found a local minimum and terminating. Using a GPU matrix as well:
GPU, 1 process with cusp matrix

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$ /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile /home/balay/machinefile -n 1 ./ex47cu -da_grid_x 65535 -snes_monitor -ksp_monitor -da_vec_type cusp -da_mat_type aijcusp
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 2.600060425819e+01
    1 KSP Residual norm 8.745056654228e-10
  1 SNES Function norm 2.518839297589e-05
    0 KSP Residual norm 1.864270723743e-01
    1 KSP Residual norm 1.265482694189e-11
  2 SNES Function norm 1.475659976840e-09
    0 KSP Residual norm 1.065091221064e-05
    1 KSP Residual norm 8.245135443599e-16
  3 SNES Function norm 2.200530918322e-10
    0 KSP Residual norm 7.730316189302e-11
    1 KSP Residual norm 1.115126544733e-21
  4 SNES Function norm 2.192093087025e-10

It changes the values again just enough to push the run to the right side of the convergence check.
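For concreteness: whichever tolerance is being straddled, the mechanics are just a threshold comparison, so values that differ by ~2e-23 can land on opposite sides of it. A minimal sketch (illustrative only, not PETSc's actual convergence routine; the rtol value here is just the usual -snes_rtol default):

#include <stdio.h>

/* Illustrative only -- not PETSc's convergence code.  A relative
   decrease test: declare convergence once fnorm <= rtol * fnorm0. */
static int converged(double fnorm, double fnorm0, double rtol)
{
    return fnorm <= rtol * fnorm0;
}

int main(void)
{
    double fnorm0 = 3.906279802209e-03; /* initial SNES function norm */
    double rtol   = 1e-8;               /* usual -snes_rtol default   */
    double edge   = rtol * fnorm0;      /* threshold, about 3.9e-11   */

    /* Two runs whose norms straddle the threshold by ~2e-23 take
       different branches: one stops here, the other does another
       round of line search first. */
    printf("just under: %d\n", converged(edge - 2e-23, fnorm0, rtol));
    printf("just over:  %d\n", converged(edge + 2e-23, fnorm0, rtol));
    return 0;
}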
I am still looking into the problems with 2 processes on the GPU. It seems to be using old data somewhere, as you can see from the fact that the function norm is essentially the same at the beginning of each SNES iteration:

GPU, 2 processes

[agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03 <-----
    0 KSP Residual norm 5.994156809227e+00
    1 KSP Residual norm 5.927247846249e-05
  1 SNES Function norm 3.906225077938e-03 <------
    0 KSP Residual norm 5.993813868985e+00
    1 KSP Residual norm 5.927575078206e-05

So it's doing some good calculations and then throwing them away and starting over again. I will continue to look into this.
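That symptom is consistent with a host/device pair getting out of sync: if the flag that records which copy of a vector is current is not updated after a compute, later reads silently reuse the previous iteration's values. A hypothetical sketch of that failure mode (made-up names, not PETSc's actual vector internals):

/* Hypothetical sketch of the stale-data failure mode -- not PETSc code.
   A mirrored vector keeps host and device copies plus a flag recording
   which copy is current. */
typedef enum { VALID_HOST, VALID_DEVICE, VALID_BOTH } Validity;

typedef struct {
    double  *host;     /* CPU copy                                 */
    double  *device;   /* GPU copy (a cusp array in the real code) */
    int      n;
    Validity valid;    /* which copy holds the true data           */
} MirroredVec;

void update_on_device(MirroredVec *v)
{
    /* ... kernel applies the Newton update to v->device in place ... */
    v->valid = VALID_DEVICE;  /* REQUIRED: host copy is now stale.    */
    /* If this flag update is missed, read_on_host() below skips the
       copy-back, and the function evaluation at the top of the next
       SNES iteration reuses the old x -- i.e. the same norm again.   */
}

const double *read_on_host(MirroredVec *v)
{
    if (v->valid == VALID_DEVICE) {
        /* cudaMemcpy(v->host, v->device, v->n * sizeof(double),
                      cudaMemcpyDeviceToHost); */
        v->valid = VALID_BOTH;
    }
    return v->host;
}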
Cheers,

Victor

---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012
<br><br><div class="gmail_quote">On Wed, May 11, 2011 at 8:31 AM, Alexander Grayver <span dir="ltr"><<a href="mailto:agrayver@gfz-potsdam.de">agrayver@gfz-potsdam.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


  
    
  
  <div bgcolor="#ffffff" text="#000000">
Hello,

Victor, thanks. We've got the latest version and now it doesn't crash. However, it seems there is still a problem.

Let's look at three different runs:

[agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 5.994156809227e+00
    1 KSP Residual norm 3.538158441448e-04
    2 KSP Residual norm 3.124431921666e-04
    3 KSP Residual norm 4.109213410989e-06
  1 SNES Function norm 7.201017610490e-04
    0 KSP Residual norm 3.317803708316e-02
    1 KSP Residual norm 2.447380361169e-06
    2 KSP Residual norm 2.164193969957e-06
    3 KSP Residual norm 2.124317398679e-08
  2 SNES Function norm 1.719678934825e-05
    0 KSP Residual norm 1.651586453143e-06
    1 KSP Residual norm 2.037037536868e-08
    2 KSP Residual norm 1.109736798274e-08
    3 KSP Residual norm 1.857218772156e-12
  3 SNES Function norm 1.159391068583e-09
    0 KSP Residual norm 3.116936044619e-11
    1 KSP Residual norm 1.366503312678e-12
    2 KSP Residual norm 6.598477672192e-13
    3 KSP Residual norm 5.306147277879e-17
  4 SNES Function norm 2.202297235559e-10
[agraiver@tesla-cmc new]$ mpirun -np 1 ./lapexp -da_grid_x 65535 -da_vec_type cusp -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 2.600060425819e+01
    1 KSP Residual norm 1.711173401491e-09
  1 SNES Function norm 2.518839283204e-05
    0 KSP Residual norm 1.864270712051e-01
    1 KSP Residual norm 1.123567613474e-11
  2 SNES Function norm 1.475752536169e-09
    0 KSP Residual norm 1.065095925089e-05
    1 KSP Residual norm 8.918344224261e-16
  3 SNES Function norm 2.186342855894e-10
    0 KSP Residual norm 6.313874615230e-11
    1 KSP Residual norm 2.338370003621e-21
[agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 5.994156809227e+00
    1 KSP Residual norm 5.927247846249e-05
  1 SNES Function norm 3.906225077938e-03
    0 KSP Residual norm 5.993813868985e+00
    1 KSP Residual norm 5.927575078206e-05
[agraiver@tesla-cmc new]$
lapexp is the default example, just renamed. The first run used 2 CPUs, the second one used 1 GPU, and the third one ran with 2 processes sharing 1 GPU.
The first difference is that when we use the CPU, the last line of output is always:
  4 SNES Function norm 2.202297235559e-10
whereas for the GPU the last line is "N KSP ...something...".
Also, it seems that with 2 processes using 1 GPU the example doesn't converge; the norm stays quite large. The same happens when we use 2 processes and 2 GPUs. Can you explain this?
BTW, we can even give you access to our server with 6 CPUs and 8 GPUs within one node.

Regards,
Alexander

On 11.05.2011 01:07, Victor Minden wrote:

I pushed my change to petsc-dev, so hopefully a fresh pull of the latest mercurial repository will pick it up; let me know if not.
---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012


On Tue, May 10, 2011 at 6:59 PM, Alexander Grayver <agrayver@gfz-potsdam.de> wrote:
          <div bgcolor="#ffffff" text="#000000"> Hi Victor,<br>
            <br>
            Thanks a lot!<br>
            What should we do to get new version?<br>
            <br>
            Regards,<br>
            <font color="#888888"> Alexander</font>
            <div>
              <div><br>
                <br>
                On 10.05.2011 02:02, Victor Minden wrote:
                <blockquote type="cite">I believe I've resolved this
                  issue.
                  <div><br>
                  </div>
                  <div>Cheers,</div>
                  <div><br>
                  </div>
                  <div>Victor<br clear="all">
                    ---<br>
                    Victor L. Minden<br>
                    <br>
                    Tufts University<br>
                    School of Engineering<br>
                    Class of 2012<br>
                    <br>
                    <br>
                    <div class="gmail_quote">On Sun, May 8, 2011 at 5:26
                      PM, Victor Minden <span dir="ltr"><<a href="mailto:victorminden@gmail.com" target="_blank">victorminden@gmail.com</a>></span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204, 204, 204);padding-left:1ex"> Barry,<br>
                        <br>
                        I can verify this on breadboard now,<br>
                        <br>
                        with two processes, cuda<br>
                        <br>
minden@bb45:~/petsc-dev/src/snes/examples/tutorials$<br>
                        /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra

                        -machinefile<br>
                        /home/balay/machinefile -n 2 ./ex47cu -da_grid_x
                        65535 -log_summary<br>
                        -snes_monitor -ksp_monitor -da_vec_type cusp<br>
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 5.994156809227e+00
    1 KSP Residual norm 5.927247846249e-05
  1 SNES Function norm 3.906225077938e-03
    0 KSP Residual norm 5.993813868985e+00
    1 KSP Residual norm 5.927575078206e-05
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  invalid device pointer
Aborted (signal 6)
Without cuda:

minden@bb45:~/petsc-dev/src/snes/examples/tutorials$ /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile /home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary -snes_monitor -ksp_monitor
  0 SNES Function norm 3.906279802209e-03
    0 KSP Residual norm 5.994156809227e+00
    1 KSP Residual norm 3.538158441448e-04
    2 KSP Residual norm 3.124431921666e-04
    3 KSP Residual norm 4.109213410989e-06
  1 SNES Function norm 7.201017610490e-04
    0 KSP Residual norm 3.317803708316e-02
    1 KSP Residual norm 2.447380361169e-06
    2 KSP Residual norm 2.164193969957e-06
    3 KSP Residual norm 2.124317398679e-08
  2 SNES Function norm 1.719678934825e-05
    0 KSP Residual norm 1.651586453143e-06
    1 KSP Residual norm 2.037037536868e-08
    2 KSP Residual norm 1.109736798274e-08
    3 KSP Residual norm 1.857218772156e-12
  3 SNES Function norm 1.159391068583e-09
    0 KSP Residual norm 3.116936044619e-11
    1 KSP Residual norm 1.366503312678e-12
    2 KSP Residual norm 6.598477672192e-13
    3 KSP Residual norm 5.306147277879e-17
  4 SNES Function norm 2.202297235559e-10
Note the repeated norms when using cuda. Looks like I'll have to take a closer look at this.

-Victor

---
Victor L. Minden

Tufts University
School of Engineering
Class of 2012
On Thu, May 5, 2011 at 2:57 PM, Barry Smith <bsmith@mcs.anl.gov> wrote:
>
> Alexander
>
>    Thank you for the sample code; it will be very useful.
>
>    We have run parallel jobs with CUDA where each node has only a single MPI process and uses a single GPU, without the crash that you get below. I cannot explain why it would not work in your situation. Do you have access to two nodes, each with a GPU, so you could try that?
>
>   It is crashing in a delete of a
>
> struct _p_PetscCUSPIndices {
>   CUSPINTARRAYCPU indicesCPU;
>   CUSPINTARRAYGPU indicesGPU;
> };
>
> where the GPU array is a cusp::array1d<PetscInt,cusp::device_memory>,
>
> thus it is crashing after it has completed actually doing the computation. If you run with -snes_monitor -ksp_monitor, with and without -da_vec_type cusp, on 2 processes, what do you get for output in the two cases? I want to see if it is running correctly on two processes.
>
> Could the crash be due to memory corruption sometime during the computation?
>
>
>   Barry
>
> On May 5, 2011, at 3:38 AM, Alexander Grayver wrote:
>
>> Hello!
>>
>> We work with the petsc-dev branch and the ex47cu.cu example. Our platform is an Intel Quad processor and 8 identical Tesla GPUs; the CUDA 3.2 toolkit is installed.
>> Ideally we would like to make petsc work in a multi-GPU way within just one node, so that different GPUs can be attached to different processes.
>> Since that is not possible within the current PETSc implementation, we created a preload library (see LD_PRELOAD for details) for the CUBLAS function cublasInit().
>> When PETSc calls this function, our library gets control and we assign GPUs according to rank within the MPI communicator; then we call the original cublasInit().
>> This preload library is very simple, see petsc_mgpu.c attached.
>> This trick makes each process have its own context, and ideally all computations should be distributed over several GPUs.
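>>
>> In outline, the interposer looks something like this (a minimal sketch of the idea only -- see the attached petsc_mgpu.c for the actual implementation; reading the rank from OMPI_COMM_WORLD_RANK and using the legacy int return type are assumptions of the sketch):
>>
>> #define _GNU_SOURCE
>> #include <dlfcn.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <cuda_runtime.h>
>>
>> /* Sketch only -- see attached petsc_mgpu.c.  Interpose the legacy
>>    CUBLAS entry point: bind this process to a GPU chosen from the
>>    MPI rank, then forward to the real cublasInit(). */
>> int cublasInit(void)
>> {
>>     static int (*real_cublasInit)(void);
>>     const char *s = getenv("OMPI_COMM_WORLD_RANK");
>>     int rank = s ? atoi(s) : 0, ndev = 1;
>>
>>     if (!real_cublasInit)
>>         real_cublasInit = (int (*)(void))dlsym(RTLD_NEXT, "cublasInit");
>>     cudaGetDeviceCount(&ndev);
>>     cudaSetDevice(rank % ndev);   /* one GPU per MPI process */
>>     fprintf(stderr, "rank %d -> device %d\n", rank, rank % ndev);
>>     return real_cublasInit();
>> }
>>
>> Something like "gcc -shared -fPIC -o libpetsc_mgpu.so petsc_mgpu.c -ldl -lcudart" builds it, and LD_PRELOAD=./libpetsc_mgpu.so makes it take effect.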
>>
>> We managed to build petsc and the example (see makefile attached) and we tested it as follows:
>>
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -info > cpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -info > cpu_2processes.out
>> [agraiver@tesla-cmc new]$ ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_1process.out
>> [agraiver@tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_2processes.out
>>
>> Everything except the last configuration works well. The last one crashes with the following exception and call stack:
>> terminate called after throwing an instance of 'thrust::system::system_error'
>>   what():  invalid device pointer
>> [tesla-cmc:15549] *** Process received signal ***
>> [tesla-cmc:15549] Signal: Aborted (6)
>> [tesla-cmc:15549] Signal code:  (-6)
>> [tesla-cmc:15549] [ 0] /lib64/libpthread.so.0() [0x3de540eeb0]
>> [tesla-cmc:15549] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3de50330c5]
>> [tesla-cmc:15549] [ 2] /lib64/libc.so.6(abort+0x186) [0x3de5034a76]
>> [tesla-cmc:15549] [ 3] /opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f0d3530b95d]
>> [tesla-cmc:15549] [ 4] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76) [0x7f0d35309b76]
>> [tesla-cmc:15549] [ 5] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3) [0x7f0d35309ba3]
>> [tesla-cmc:15549] [ 6] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae) [0x7f0d35309cae]
>> [tesla-cmc:15549] [ 7] ./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69) [0x426320]
>> [tesla-cmc:15549] [ 8] ./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b) [0x4258b2]
>> [tesla-cmc:15549] [ 9] ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f) [0x424f78]
>> [tesla-cmc:15549] [10] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33) [0x7f0d36aeacff]
>> [tesla-cmc:15549] [11] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e) [0x7f0d36ae8e78]
>> [tesla-cmc:15549] [12] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19) [0x7f0d36ae75f7]
>> [tesla-cmc:15549] [13] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52) [0x7f0d36ae65f4]
>> [tesla-cmc:15549] [14] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18) [0x7f0d36ae5c2e]
>> [tesla-cmc:15549] [15] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d) [0x7f0d3751e45f]
>> [tesla-cmc:15549] [16] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f) [0x7f0d3750c840]
>> [tesla-cmc:15549] [17] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8) [0x7f0d375af8af]
>> [tesla-cmc:15549] [18] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586) [0x7f0d375e9ddf]
>> [tesla-cmc:15549] [19] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f) [0x7f0d37191d24]
>> [tesla-cmc:15549] [20] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546) [0x7f0d370d54fe]
>> [tesla-cmc:15549] [21] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1) [0x7f0d3746fac3]
>> [tesla-cmc:15549] [22] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8) [0x7f0d37470210]
>> [tesla-cmc:15549] [23] ./lapexp(main+0x5ed) [0x420745]
>>
>> I've sent all detailed output files for the different execution configurations listed above, as well as configure.log and make.log, to petsc-maint@mcs.anl.gov, hoping that someone could recognize the problem.
>> Now we have one node with multiple GPUs, but I'm also wondering whether someone has really tested usage of the GPU functionality over several nodes with one GPU each?
>>
>> Regards,
>> Alexander
>>
>> <petsc_mgpu.c><makefile.txt><configure.log>