[petsc-users] ex52_integrateElement.cu
Matthew Knepley
knepley at gmail.com
Wed Mar 28 16:57:45 CDT 2012
On Wed, Mar 28, 2012 at 4:12 PM, David Fuentes <fuentesdt at gmail.com> wrote:
> works!
>
Excellent. Now, my thinking was that GPUs are most useful doing single
work, but
I can see the utility of double accuracy for a residual.
My inclination is to define another type, say GPUReal, and use it for all
kernels.
Would that do what you want?
Matt
> SCRGP2$ make ex52
> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
> -I/opt/apps/PETSC/petsc-dev/include
> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
> -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
> nvcc -G -O0 -g -arch=sm_10 -c --compiler-options="-O0 -g -fPIC
> -I/opt/apps/PETSC/petsc-dev/include
> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
> -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
> never referenced
>
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared
> but never referenced
>
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
> never referenced
>
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
> referenced
>
> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared
> but never referenced
>
> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o
> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
> -lpetsc
> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
> -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64
> -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse
> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5
> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>
>
> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
> N_t: 3, N_cb: 1
> Residual:
> Vector Object: 1 MPI processes
> type: seq
> -0.25
> -0.5
> 0.25
> -0.5
> -1
> 0.5
> 0.25
> 0.5
> 0.75
> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
> Residual:
> Vector Object: 1 MPI processes
> type: seq
> -0.25
> -0.5
> 0.25
> -0.5
> -1
> 0.5
> 0.25
> 0.5
> 0.75
>
>
>
>
>
> On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <fuentesdt at gmail.com>wrote:
>
>> sure. will do.
>>
>>
>> On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>
>>> On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <fuentesdt at gmail.com>wrote:
>>>
>>>> thanks! its running, but I seem to be getting different answer for
>>>> cpu/gpu ?
>>>> i had some floating point problems on this Tesla M2070 gpu before, but
>>>> adding the '-arch=sm_20' option seemed to fix it last time.
>>>>
>>>>
>>>> is the assembly in single precision ? my 'const PetscReal
>>>> jacobianInverse' being passed in are doubles
>>>>
>>>
>>> Yep, that is the problem. I have not tested anything in double. I have
>>> not decided exactly how to handle it. Can you
>>> make another ARCH --with-precision=single and make sure it works, and
>>> then we can fix the double issue?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>>> N_t: 3, N_cb: 1
>>>> Residual:
>>>> Vector Object: 1 MPI processes
>>>> type: seq
>>>> 0
>>>> 755712
>>>> 0
>>>> -58720
>>>> -2953.13
>>>> 0.375
>>>> 1.50323e+07
>>>> 0.875
>>>> 0
>>>> SCRGP2$
>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>> Residual:
>>>> Vector Object: 1 MPI processes
>>>> type: seq
>>>> -0.25
>>>> -0.5
>>>> 0.25
>>>> -0.5
>>>> -1
>>>> 0.5
>>>> 0.25
>>>> 0.5
>>>> 0.75
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>
>>>>> On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <fuentesdt at gmail.com>wrote:
>>>>>
>>>>>> The example seems to be running on cpu with '-batch' but i'm getting
>>>>>> errors in line 323 with the '-gpu' option
>>>>>>
>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>
>>>>>> should this possibly be PetscScalar ?
>>>>>>
>>>>>
>>>>> No.
>>>>>
>>>>>
>>>>>> - ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt *
>>>>>> sizeof(float));CHKERRQ(ierr);
>>>>>> + ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt *
>>>>>> sizeof(PetscScalar));CHKERRQ(ierr);
>>>>>>
>>>>>>
>>>>>> SCRGP2$ python
>>>>>> $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1
>>>>>> laplacian ex52.h
>>>>>> ['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py',
>>>>>> '2', '1', '1', '1', 'laplacian', 'ex52.h']
>>>>>> 2 1 1 1 laplacian
>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0,
>>>>>> 1.0): [(1.0, ())]}]
>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>> Perm: [0, 1, 2]
>>>>>> Creating /home/fuentes/snestutorial/ex52.h
>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu.h
>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0,
>>>>>> 1.0): [(1.0, ())]}]
>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>> Perm: [0, 1, 2]
>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu_inline.h
>>>>>>
>>>>>> SCRGP2$ make ex52
>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
>>>>>> -I/opt/apps/PETSC/petsc-dev/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>> -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>>>>>> nvcc -O0 -g -arch=sm_20 -c --compiler-options="-O0 -g -fPIC
>>>>>> -I/opt/apps/PETSC/petsc-dev/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>> -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/"
>>>>>> ex52_integrateElement.cu
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>> declared but never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>> never referenced
>>>>>>
>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>> declared but never referenced
>>>>>>
>>>>>> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o
>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -lpetsc
>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>> -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps
>>>>>> -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs
>>>>>> -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64
>>>>>> -lcufft -lcublas -lcudart -lcusparse
>>>>>> -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>> -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>> -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc
>>>>>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
>>>>>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus
>>>>>> -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
>>>>>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
>>>>>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>>>>> /bin/rm -f ex52.o ex52_integrateElement.o
>>>>>>
>>>>>>
>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>> Residual:
>>>>>> Vector Object: 1 MPI processes
>>>>>> type: seq
>>>>>> -0.25
>>>>>> -0.5
>>>>>> 0.25
>>>>>> -0.5
>>>>>> -1
>>>>>> 0.5
>>>>>> 0.25
>>>>>> 0.5
>>>>>> 0.75
>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>> [0]PETSC ERROR: FormFunctionLocalBatch() line 679 in
>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>> [0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in
>>>>>> src/snes/utils/damgsnes.c
>>>>>> [0]PETSC ERROR: main() line 1021 in src/snes/examples/tutorials/ex52.c
>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0
>>>>>>
>>>>>
>>>>> This is failing on cudaMalloc(), which means your card is not
>>>>> available for running. Are you trying to run on your laptop?
>>>>> If so, applications like Preview can lock up the GPU. I know of no way
>>>>> to test this in CUDA while running. I just close
>>>>> apps until it runs.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>>>
>>>>>>> On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <bourdin at lsu.edu>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote:
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes <
>>>>>>>> fuentesdt at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I had a question about the status of example 52.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c
>>>>>>>>>
>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can this example be used with a DM object created from an
>>>>>>>>> unstructured exodusII mesh, DMMeshCreateExodus, And the FEM assembly done
>>>>>>>>> on GPU ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> 1) I have pushed many more tests for it now. They can be run using
>>>>>>>> the Python build system
>>>>>>>>
>>>>>>>> ./config/builder2.py check src/snes/examples/tutorials/ex52.c
>>>>>>>>
>>>>>>>> in fact, you can build any set of files this way.
>>>>>>>>
>>>>>>>> 2) The Exodus creation has to be converted to DMComplex from
>>>>>>>> DMMesh. That should not take me very long. Blaise maintains that
>>>>>>>> so maybe there will be help :) You will just replace
>>>>>>>> DMComplexCreateBoxMesh() with DMComplexCreateExodus(). If you request
>>>>>>>> it, I will bump it up the list.
>>>>>>>>
>>>>>>>>
>>>>>>>> DMMeshCreateExodusNG is much more flexible than DMMeshCreateExodus
>>>>>>>> in that it can read meshes with multiple element types and should have a
>>>>>>>> much lower memory footprint. The code should be fairly easy to read. you
>>>>>>>> can email me directly if you have specific questions. I had looked at
>>>>>>>> creating a DMComplex and it did not look too difficult, as long as
>>>>>>>> interpolation is not needed. I have plans to write DMComplexCreateExodus,
>>>>>>>> but haven't had time too so far. Updating the Vec viewers and readers may
>>>>>>>> be a bit more involved. In perfect world, one would write an EXODUS viewer
>>>>>>>> following the lines of the VTK and HDF5 ones.
>>>>>>>>
>>>>>>>
>>>>>>> David and Blaise, I have converted this function, now
>>>>>>> DMComplexCreateExodus(). Its not tested, but I think
>>>>>>> Blaise has some stuff we can use to test it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>
>>>>>>>> Blaise
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Let me know if you can run the tests.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Department of Mathematics and Center for Computation & Technology
>>>>>>>> Louisiana State University, Baton Rouge, LA 70803, USA
>>>>>>>> Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276
>>>>>>>> http://www.math.lsu.edu/~bourdin
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120328/e31420da/attachment-0001.htm>
More information about the petsc-users
mailing list