<div>works!</div><div><br></div><div><div>SCRGP2$ make ex52</div><div>/usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c</div>
<div>nvcc -G -O0 -g -arch=sm_10 -c --compiler-options="-O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu</div>
<div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced</div>
<div><br></div><div>/usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -lpetsc -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64 -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl</div>
<div><br></div><div><br></div><div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu</div><div>GPU layout grid(1,2,1) block(3,1,1) with 1 batches</div><div> N_t: 3, N_cb: 1</div><div>Residual:</div><div>
Vector Object: 1 MPI processes</div><div> type: seq</div><div>-0.25</div><div>-0.5</div><div>0.25</div><div>-0.5</div><div>-1</div><div>0.5</div><div>0.25</div><div>0.5</div><div>0.75</div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch</div>
<div>Residual:</div><div>Vector Object: 1 MPI processes</div><div> type: seq</div><div>-0.25</div><div>-0.5</div><div>0.25</div><div>-0.5</div><div>-1</div><div>0.5</div><div>0.25</div><div>0.5</div><div>0.75</div></div>
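<div>For anyone repeating this check, comparing the two residual listings by eye can be replaced by a small tolerance test. The following is only a minimal sketch in plain C, not part of ex52 or PETSc: the helper names max_abs_diff and residuals_agree and the choice of tolerance are hypothetical, and the arrays are assumed to have been copied out of the two runs by hand.</div>

```c
#include <stddef.h>

/* Hypothetical helper (not a PETSc function): largest absolute
   entrywise difference between two residual vectors of length n. */
static double max_abs_diff(const double *a, const double *b, size_t n)
{
  double worst = 0.0;
  for (size_t i = 0; i < n; ++i) {
    double d = a[i] - b[i];
    if (d < 0.0) d = -d;      /* absolute value without needing libm */
    if (d > worst) worst = d;
  }
  return worst;
}

/* Hypothetical helper: returns 1 if the CPU and GPU residuals agree
   to within tol in the max norm, 0 otherwise. */
static int residuals_agree(const double *cpu, const double *gpu,
                           size_t n, double tol)
{
  return max_abs_diff(cpu, gpu, n) <= tol;
}
```

<div>With the nine values printed above for the -batch and -batch -gpu runs, residuals_agree(cpu, gpu, 9, 1e-6) returns 1; with the GPU output from the earlier double-precision build it returns 0.</div>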
<div><br></div></div><div><br></div><div><br></div><br><br><div class="gmail_quote">On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <span dir="ltr">&lt;<a href="mailto:fuentesdt@gmail.com">fuentesdt@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>sure. will do.</div><div class="HOEnZb"><div class="h5"><br><br><div class="gmail_quote">On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <span dir="ltr">&lt;<a href="mailto:fuentesdt@gmail.com" target="_blank">fuentesdt@gmail.com</a>&gt;</span> wrote:<br></div><div class="gmail_quote"><div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>thanks! it's running, but I seem to be getting a different answer for cpu/gpu?</div><div>I had some floating point problems on this Tesla M2070 gpu before, but adding the '-arch=sm_20' option seemed to fix it last time.</div>
<div><br></div><div><br></div><div>is the assembly in single precision ? my 'const PetscReal jacobianInverse' being passed in are doubles</div></blockquote><div><br></div></div><div>Yep, that is the problem. I have not tested anything in double. I have not decided exactly how to handle it. Can you</div>
<div>make another ARCH --with-precision=single and make sure it works, and then we can fix the double issue?</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu</div></div><div>GPU layout grid(1,2,1) block(3,1,1) with 1 batches</div><div> N_t: 3, N_cb: 1</div><div><div>Residual:</div>
<div>Vector Object: 1 MPI processes</div>
<div> type: seq</div></div><div>0</div><div>755712</div><div>0</div><div>-58720</div><div>-2953.13</div><div>0.375</div><div>1.50323e+07</div><div>0.875</div><div>0</div><div>SCRGP2$</div><div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch</div>
<div>Residual:</div><div>Vector Object: 1 MPI processes</div><div> type: seq</div><div>-0.25</div><div>-0.5</div><div>0.25</div><div>-0.5</div><div>-1</div><div>0.5</div><div>0.25</div><div>0.5</div><div>0.75</div></div>
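<div>One way the wrong-precision symptom can arise, sketched in plain C without CUDA: if a buffer of doubles is copied byte-for-byte into storage that is later read as float, the values are reinterpreted rather than converted. This is an assumption about the mechanism, not a trace of what ex52 actually does, and the helper names below are hypothetical.</div>

```c
#include <string.h>

/* Hypothetical helper: explicit element-wise conversion double -> float.
   Each value is rounded to the nearest representable float. */
static void narrow_to_float(const double *src, float *dst, int n)
{
  for (int i = 0; i < n; ++i) dst[i] = (float) src[i];
}

/* Hypothetical helper: raw byte copy of double data into float storage.
   Only n * sizeof(float) bytes are copied, so the floats end up holding
   fragments of the double bit patterns, not the original values. */
static void reinterpret_as_float(const double *src, float *dst, int n)
{
  memcpy(dst, src, (size_t) n * sizeof(float));
}
```

<div>Converting element by element (or building with --with-precision=single, as suggested above) keeps the values; the raw copy does not, which is one plausible source of entries like 755712 or 1.50323e+07 in an otherwise small residual.</div>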
</div><div><div>
<div><br></div><div><br></div><div><br></div><br><br><div class="gmail_quote">On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <span dir="ltr">&lt;<a href="mailto:fuentesdt@gmail.com" target="_blank">fuentesdt@gmail.com</a>&gt;</span> wrote:<br>
</div><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>The example seems to be running on cpu with '-batch' but i'm getting errors in line 323 with the '-gpu' option</div><div><br></div><div>[0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in src/snes/examples/tutorials/ex52_integrateElement.cu
</div><div><br></div><div>should this possibly be PetscScalar ? </div></blockquote><div><br></div></div><div>No.</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div>- ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * sizeof(float));CHKERRQ(ierr);</div><div>+ ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * sizeof(PetscScalar));CHKERRQ(ierr);</div>
</div><div><br></div><div><br></div><div><div>SCRGP2$ python $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1 laplacian ex52.h</div><div>['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py', '2', '1', '1', '1', 'laplacian', 'ex52.h']</div>
<div>2 1 1 1 laplacian</div><div>[{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]</div><div>{0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}</div><div>Perm: [0, 1, 2]</div>
<div>Creating /home/fuentes/snestutorial/ex52.h</div><div>Creating /home/fuentes/snestutorial/ex52_gpu.h</div><div>[{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]</div><div>{0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}</div>
<div>Perm: [0, 1, 2]</div><div>Creating /home/fuentes/snestutorial/ex52_gpu_inline.h</div></div><div><br></div><div><div>SCRGP2$ make ex52</div><div>/usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve -I/opt/MATLAB/R2011a/extern/include -I/usr/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c</div>
<div>nvcc -O0 -g -arch=sm_20 -c --compiler-options="-O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve -I/opt/MATLAB/R2011a/extern/include -I/usr/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu</div>
<div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced</div>
<div><br></div><div>ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced</div><div><br></div><div>ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced</div>
<div><br></div><div>/usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -lpetsc -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64 -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl </div>
<div>/bin/rm -f ex52.o ex52_integrateElement.o</div></div><div><br></div><div><br></div><div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch </div><div>Residual:</div><div>Vector Object: 1 MPI processes</div>
<div> type: seq</div><div>-0.25</div><div>-0.5</div><div>0.25</div><div>-0.5</div><div>-1</div><div>0.5</div><div>0.25</div><div>0.5</div><div>0.75</div><div>SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu</div>
<div>[0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in src/snes/examples/tutorials/ex52_integrateElement.cu</div><div>[0]PETSC ERROR: FormFunctionLocalBatch() line 679 in src/snes/examples/tutorials/ex52.c</div><div>
[0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in src/snes/utils/damgsnes.c</div><div>[0]PETSC ERROR: main() line 1021 in src/snes/examples/tutorials/ex52.c</div><div>application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0</div>
</div></blockquote><div><br></div></div></div><div>This is failing on cudaMalloc(), which means your card is not available for running. Are you trying to run on your laptop?</div><div>If so, applications like Preview can lock up the GPU. I know of no way to test this in CUDA while running. I just close</div>
<div>apps until it runs.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div><br><div class="gmail_quote">On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <span dir="ltr">&lt;<a href="mailto:bourdin@lsu.edu" target="_blank">bourdin@lsu.edu</a>&gt;</span> wrote:<br></div><div class="gmail_quote"><div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><br><div><div><div>On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote:</div><br><blockquote type="cite">On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes <span dir="ltr">&lt;<a href="mailto:fuentesdt@gmail.com" target="_blank">fuentesdt@gmail.com</a>&gt;</span> wrote:<br>
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>Hi, </div><div><br></div><div>I had a question about the status of example 52.</div><div><br></div><a href="http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c" target="_blank">http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c</a><div>
<a href="http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu" target="_blank">http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu</a> <br>
<div><br></div><div>Can this example be used with a DM object created from an unstructured exodusII mesh (DMMeshCreateExodus), with the FEM assembly done on the GPU?</div></div></blockquote><div><br></div><div>1) I have pushed many more tests for it now. They can be run using the Python build system</div>
<div><br></div><div> ./config/builder2.py check src/snes/examples/tutorials/ex52.c</div><div><br></div><div> in fact, you can build any set of files this way.</div><div><br></div><div>2) The Exodus creation has to be converted to DMComplex from DMMesh. That should not take me very long. Blaise maintains that</div>
<div> so maybe there will be help :) You will just replace DMComplexCreateBoxMesh() with DMComplexCreateExodus(). If you request</div><div> it, I will bump it up the list.</div></div></blockquote><div><br></div></div>
<div>DMMeshCreateExodusNG is much more flexible than DMMeshCreateExodus in that it can read meshes with multiple element types and should have a much lower memory footprint. The code should be fairly easy to read. You can email me directly if you have specific questions. I had looked at creating a DMComplex and it did not look too difficult, as long as interpolation is not needed. I have plans to write DMComplexCreateExodus, but haven't had time to so far. Updating the Vec viewers and readers may be a bit more involved. In a perfect world, one would write an EXODUS viewer following the lines of the VTK and HDF5 ones. </div>
</div></div></blockquote><div><br></div></div><div>David and Blaise, I have converted this function, now DMComplexCreateExodus(). It's not tested, but I think</div><div>Blaise has some stuff we can use to test it.</div><div>
<br>
</div><div> Thanks,</div><div><br></div><div> Matt</div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">
<div><div>Blaise</div>
<div><div><br></div><br><blockquote type="cite"><div class="gmail_quote"><div><br></div><div>Let me know if you can run the tests.</div>
<div><br></div><div> Thanks</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div>Thanks,</div><div>David</div>
</div></div></blockquote></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener<br>
</blockquote></div></div><span><font color="#888888"><br><div>
<span style="font-size:12px"><div style="word-wrap:break-word"><span style="text-indent:0px;letter-spacing:normal;font-variant:normal;font-style:normal;font-weight:normal;line-height:normal;border-collapse:separate;text-transform:none;font-size:12px;white-space:normal;font-family:Helvetica;word-spacing:0px"><div style="word-wrap:break-word">
<span style="text-indent:0px;letter-spacing:normal;font-variant:normal;font-style:normal;font-weight:normal;line-height:normal;border-collapse:separate;text-transform:none;font-size:12px;white-space:normal;font-family:Helvetica;word-spacing:0px"><div style="word-wrap:break-word">
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">-- </div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">Department of Mathematics and Center for Computation &amp; Technology</div>
<div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">Louisiana State University, Baton Rouge, LA 70803, USA</div><div style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px">Tel. <a href="tel:%2B1%20%28225%29%20578%201612" value="+12255781612" target="_blank">+1 (225) 578 1612</a>, Fax <a href="tel:%2B1%20%28225%29%20578%204276" value="+12255784276" target="_blank">+1 (225) 578 4276</a> <a href="http://www.math.lsu.edu/~bourdin" target="_blank">http://www.math.lsu.edu/~bourdin</a></div>
<div><br></div><div><br></div><br></div></span></div></span><br></div></span><br><br>
</div>
<br></font></span></div></blockquote></div></div><div><div><br><br clear="all"><div><br></div>
</div></div></blockquote></div><br>
</div></div></blockquote></div></div></div><div><div><br><br clear="all"><div><br></div>
</div></div></blockquote></div><br>
</div></div></blockquote></div></div></div><div><div><br><br clear="all"><div><br></div>
</div></div></blockquote></div><br>
</div></div></blockquote></div><br>