[petsc-users] ex52_integrateElement.cu

Wed Apr 25 15:38:08 CDT 2012

thanks! I seem to be getting the same answer on cpu and gpu in 3d w/ double.
However,  increasing the dof with -refinement_limit 0.01, the residual is
differing slightly (see below).

in single precision, do you have an idea of the upper bound limit on
degrees of freedom
such that the sieve gpu FEM solver would provide reasonable accuracy in the
solution
(using gpu for assembly of both residual and jacobain with single precision
solve ) ?
a friend was showing me some plots indicating that floating point error
would become significant around O(10^4) dof on a related problem.
would you expect similar ?


SCRGP2$ hg diff
diff -r be5f9068076c config/PETSc/FEM.py

--- a/config/PETSc/FEM.py       Wed Apr 25 13:40:46 2012 -0500
+++ b/config/PETSc/FEM.py       Wed Apr 25 14:58:40 2012 -0500
@@ -10,7 +10,7 @@
     import os
     self.baseDir    = os.getcwd()
     self.quadDegree = -1
-    self.gpuScalarType = 'float'
+    self.gpuScalarType = 'double'
     return


SCRGP2$ make ex52
/usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
-I/opt/apps/PETSC/petsc-dev/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
-I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
-I/opt/MATLAB/R2011a/extern/include -I/usr/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
-I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
nvcc -O0 -g -arch=sm_20  -c --compiler-options="-O0 -g   -fPIC
 -I/opt/apps/PETSC/petsc-dev/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
-I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
-I/opt/MATLAB/R2011a/extern/include -I/usr/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
-I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
-I/usr/include/mpich2    -D__INSDIR__=src/snes/examples/tutorials/"
 ex52_integrateElement.cu
ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
referenced

ex52_gpu_inline.h(22): warning: variable "Basis_0" was declared but never
referenced

ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
referenced

ex52_gpu_inline.h(14): warning: variable "weights_0" was declared but never
referenced

ex52_gpu_inline.h(22): warning: variable "Basis_0" was declared but never
referenced

ex52_gpu_inline.h(30): warning: variable "BasisDerivatives_0" was declared
but never referenced

ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
referenced

ex52_gpu_inline.h(22): warning: variable "Basis_0" was declared but never
referenced

ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
referenced

ex52_gpu_inline.h(14): warning: variable "weights_0" was declared but never
referenced

ex52_gpu_inline.h(22): warning: variable "Basis_0" was declared but never
referenced

ex52_gpu_inline.h(30): warning: variable "BasisDerivatives_0" was declared
but never referenced

/opt/epd-7.1-2-rh5-x86_64/bin/python
/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py 3
1 1 1 laplacian ex52.h
[{(-1.0, -1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0, -1.0): [(1.0, ())]},
{(-1.0, 1.0, -1.0): [(1.0, ())]}, {(-1.0, -1.0, 1.0): [(1.0, ())]}]
{0: {0: [0], 1: [1], 2: [2], 3: [3]}, 1: {0: [], 1: [], 2: [], 3: [], 4:
[], 5: []}, 2: {0: [], 1: [], 2: [], 3: []}, 3: {0: []}}
Perm: [0, 1, 2, 3]
Creating /home/fuentes/tutorials/ex52.h
Creating /home/fuentes/tutorials/ex52_gpu.h
[{(-1.0, -1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0, -1.0): [(1.0, ())]},
{(-1.0, 1.0, -1.0): [(1.0, ())]}, {(-1.0, -1.0, 1.0): [(1.0, ())]}]
{0: {0: [0], 1: [1], 2: [2], 3: [3]}, 1: {0: [], 1: [], 2: [], 3: [], 4:
[], 5: []}, 2: {0: [], 1: [], 2: [], 3: []}, 3: {0: []}}
Perm: [0, 1, 2, 3]
Creating /home/fuentes/tutorials/ex52_gpu_inline.h
/usr/bin/mpicxx -O0 -g   -o ex52 ex52.o ex52_integrateElement.o
 -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
-L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib  -lpetsc
-Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
-ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps
-lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs
-Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64
-lcufft -lcublas -lcudart -lcusparse
-Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64
-L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64
-leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc
-Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
-lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus
-lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
-L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
-lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
-lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
/bin/rm -f ex52.o ex52_integrateElement.o


SCRGP2$ ./ex52 -dim 3 -compute_function  -show_residual -batch
-refinement_limit 0.0 -section_view
PetscSection
Process 0:
  (   0) dim  0 offset   0
  (   1) dim  0 offset   0
  (   2) dim  0 offset   0
  (   3) dim  0 offset   0
  (   4) dim  0 offset   0
  (   5) dim  0 offset   0
  (   6) dim  1 offset   0
  (   7) dim  1 offset   1
  (   8) dim  1 offset   2
  (   9) dim  1 offset   3
  (  10) dim  1 offset   4
  (  11) dim  1 offset   5
  (  12) dim  1 offset   6
  (  13) dim  1 offset   7
Residual:
Vector Object: 1 MPI processes
  type: seq
-0.666667
0.166667
0.166667
0.333333
-0.333333
-0.166667
-0.166667
0.666667


SCRGP2$ ./ex52 -dim 3 -compute_function  -show_residual -batch
-refinement_limit 0.0 -section_view -gpu
PetscSection
Process 0:
  (   0) dim  0 offset   0
  (   1) dim  0 offset   0
  (   2) dim  0 offset   0
  (   3) dim  0 offset   0
  (   4) dim  0 offset   0
  (   5) dim  0 offset   0
  (   6) dim  1 offset   0
  (   7) dim  1 offset   1
  (   8) dim  1 offset   2
  (   9) dim  1 offset   3
  (  10) dim  1 offset   4
  (  11) dim  1 offset   5
  (  12) dim  1 offset   6
  (  13) dim  1 offset   7
GPU layout grid(1,1,1) block(4,1,1) with 1 batches
 N_t: 4, N_cb: 1
Residual:
Vector Object: 1 MPI processes
  type: seq
-0.666667
0.166667
0.166667
0.333333
-0.333333
-0.166667
-0.166667
0.666667


SCRGP2$ ./ex52 -dim 3 -compute_function  -show_residual -batch
-refinement_limit 0.01
Residual:
Vector Object: 1 MPI processes
  type: seq
-0.0833333
0.0833333
0.0507812
0.11849
-0.0833333
0.0833333
0.40743
0.428263
0.0598958
0.988459
1.10304
0.138428
-0.0833333
-0.125
-0.125
0.0833333
-0.10026
0.0507812
-0.0803223
0.0446777
-0.25
-0.117187
2.06367
1.51159
-0.181424
-0.418944
0.0195312
0.0282118
-0.574219
0.368924
-0.474826
0.0195312
-0.0238715
0.0195312
-0.0792359
-1.49715
1.10186
0.923154
-1.31844
-2.03658
0.0797244
0.0884049
-2.21192



SCRGP2$ ./ex52 -dim 3 -compute_function  -show_residual -batch
-refinement_limit 0.01 -gpu
GPU layout grid(3,7,1) block(4,1,1) with 1 batches
 N_t: 4, N_cb: 1
Residual:
Vector Object: 1 MPI processes
  type: seq
-0.0833333
0.0833333
0.0507812
0.11849
-0.0833333
0.0833333
0.0576986
0.0785319
0.0598958
0.0832038
0.197787
0.0932617
-0.0833333
-0.125
-0.125
0.0833333
-0.10026
0.0507812
-0.0803223
0.0446777
-0.25
-0.117187
0.345882
-0.206202
-0.181424
0.101074
0.0195312
0.0282118
0.0859375
0.148872
-0.0347222
0.0195312
-0.0238715
0.0195312
-0.588018
-0.117594
0.12749
0.13118
-0.121284
0.0692962
0.118305
0.126986
-0.106051




On Wed, Apr 25, 2012 at 1:27 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Apr 25, 2012 at 2:06 PM, Matthew Knepley <knepley at gmail.com>wrote:
>
>> On Wed, Apr 25, 2012 at 1:50 PM, David Fuentes <fuentesdt at gmail.com>wrote:
>>
>>> I'm pretty sure I pulled the latest on this petsc-dev and Generator, but
>>> when i change GPU type to PetscReal or double, i'm getting key errors ?
>>>
>>
>> Ah well, I was getting around to fixing this. The problem is a little
>> deeper.  The C version uses pointers everywhere to be
>> general, but I was using the CUDA types since I thought the compiler
>> would be more aggressive even though it is
>> limiting. However, despite having float1, float2, float3, and float4,
>> CUDA provides only double2. I do all my tests in float.
>> I will fix this after the baby goes down and mail back.
>>
>
> Okay, this is bizarre. It builds with double3 (not in the CUDA manual).
> You will need to pull both petsc and Generator, but
> it should work now. I have not really tested with double, so this will be
> interesting.
>
>    Matt
>
>
>>   Thanks,
>>
>>      Matt
>>
>> hg diff
>>> diff -r 4a642f9b7b70 config/PETSc/FEM.py
>>> --- a/config/PETSc/FEM.py       Wed Apr 25 02:13:29 2012 -0500
>>> +++ b/config/PETSc/FEM.py       Wed Apr 25 12:39:03 2012 -0500
>>> @@ -434,7 +434,7 @@
>>>      from Cxx import Define, Declarator
>>>      dim  = element.get_reference_element().get_spatial_dimension()
>>>      ext  = '_'+str(num)
>>> -    real = 'float'
>>> +    real = 'PetscReal'
>>>
>>>
>>> SCRGP2$ python
>>> $PETSC_DIR//bin/pythonscripts/PetscGenerateFEMQuadrature.py 3 1 1 1 laplace
>>> ex52.h
>>> [{(-1.0, -1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0, -1.0): [(1.0, ())]},
>>> {(-1.0, 1.0, -1.0): [(1.0, ())]}, {(-1.0, -1.0, 1.0): [(1.0, ())]}]
>>> {0: {0: [0], 1: [1], 2: [2], 3: [3]}, 1: {0: [], 1: [], 2: [], 3: [], 4:
>>> [], 5: []}, 2: {0: [], 1: [], 2: [], 3: []}, 3: {0: []}}
>>> Perm: [0, 1, 2, 3]
>>> Creating /home/fuentes/tutorials/ex52.h
>>> Traceback (most recent call last):
>>>   File
>>> "/opt/apps/PETSC/petsc-dev//bin/pythonscripts/PetscGenerateFEMQuadrature.py",
>>> line 31, in <module>
>>>     generator.run(elements, numBlocks, operator, filename)
>>>   File "/opt/apps/PETSC/petsc-dev/config/PETSc/FEM.py", line 977, in run
>>>     self.outputElementSource(self.getElementSource(elements, numBlocks,
>>> operator, sourceType = 'GPU'),
>>> os.path.splitext(filename)[0]+'_gpu'+os.path.splitext(filename)[1])
>>>   File "/opt/apps/PETSC/petsc-dev/config/PETSc/FEM.py", line 936, in
>>> getElementSource
>>>     defns.extend(self.getComputationTypes(element, n))
>>>   File "/opt/apps/PETSC/petsc-dev/config/PETSc/FEM.py", line 450, in
>>> getComputationTypes
>>>     vecType.type       = self.Cxx.typeMap[real+str(dim)]
>>> KeyError: 'PetscReal3'
>>>
>>>
>>> do I need to add something like below ?
>>>
>>>
>>> cd externalpackages/Generator/; hg diff
>>> diff -r 361a2eec9c01 CxxHelper.py
>>> --- a/CxxHelper.py      Tue Mar 27 09:01:27 2012 -0500
>>> +++ b/CxxHelper.py      Wed Apr 25 12:39:12 2012 -0500
>>> @@ -98,6 +98,11 @@
>>>      cxxType.identifier = 'float'
>>>      cxxType.baseType = True
>>>      cxxType.const = True
>>> +    typeMap['double3'] = cxxType
>>> +    cxxType = Type()
>>> +    cxxType.identifier = 'double'
>>> +    cxxType.baseType = True
>>> +    cxxType.const = True
>>>      typeMap['const float1'] = cxxType
>>>      cxxType = Type()
>>>      cxxType.identifier = 'float2'
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Apr 2, 2012 at 9:37 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>
>>>> On Wed, Mar 28, 2012 at 6:42 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>
>>>>> On Wed, Mar 28, 2012 at 6:25 PM, David Fuentes <fuentesdt at gmail.com>wrote:
>>>>>
>>>>>> Yes. This would work.
>>>>>> I had trouble compiling in single precision using the some of the
>>>>>> external package options I was using for double.
>>>>>>
>>>>>
>>>>> Okay,  getting on it now.
>>>>>
>>>>
>>>> Okay, I have tested using PetscReal=double with floats on the GPU. If
>>>> you want doubles on the GPU, you can
>>>> currently change PETSc/FEM.py:434. I am deciding what the best way to
>>>> specify this type is.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>>    Matt
>>>>>
>>>>>
>>>>>> On Wed, Mar 28, 2012 at 4:57 PM, Matthew Knepley <knepley at gmail.com>wrote:
>>>>>>
>>>>>>> On Wed, Mar 28, 2012 at 4:12 PM, David Fuentes <fuentesdt at gmail.com>wrote:
>>>>>>>
>>>>>>>> works!
>>>>>>>>
>>>>>>>
>>>>>>> Excellent. Now, my thinking was that GPUs are most useful doing
>>>>>>> single work, but
>>>>>>> I can see the utility of double accuracy for a residual.
>>>>>>>
>>>>>>> My inclination is to define another type, say GPUReal, and use it
>>>>>>> for all kernels.
>>>>>>> Would that do what you want?
>>>>>>>
>>>>>>>    Matt
>>>>>>>
>>>>>>>
>>>>>>>> SCRGP2$ make ex52
>>>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
>>>>>>>> -I/opt/apps/PETSC/petsc-dev/include
>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
>>>>>>>> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
>>>>>>>> -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>>>>>>>> nvcc -G -O0 -g -arch=sm_10  -c --compiler-options="-O0 -g   -fPIC
>>>>>>>>  -I/opt/apps/PETSC/petsc-dev/include
>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
>>>>>>>> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
>>>>>>>>  -D__INSDIR__=src/snes/examples/tutorials/"  ex52_integrateElement.cu
>>>>>>>>  ex52_gpu_inline.h(7): warning: variable "points_0" was declared
>>>>>>>> but never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared
>>>>>>>> but never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>>>> declared but never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared
>>>>>>>> but never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>>> never referenced
>>>>>>>>
>>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>>>> declared but never referenced
>>>>>>>>
>>>>>>>> /usr/bin/mpicxx -O0 -g   -o ex52 ex52.o ex52_integrateElement.o
>>>>>>>>  -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>>>>>>>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>>>>>>>>  -lpetsc
>>>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>>>>>>>> -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64
>>>>>>>> -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse
>>>>>>>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
>>>>>>>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5
>>>>>>>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
>>>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
>>>>>>>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
>>>>>>>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>>>>>>>
>>>>>>>>
>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>>>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>>>>>>>  N_t: 3, N_cb: 1
>>>>>>>> Residual:
>>>>>>>> Vector Object: 1 MPI processes
>>>>>>>>   type: seq
>>>>>>>> -0.25
>>>>>>>> -0.5
>>>>>>>> 0.25
>>>>>>>> -0.5
>>>>>>>> -1
>>>>>>>> 0.5
>>>>>>>> 0.25
>>>>>>>> 0.5
>>>>>>>> 0.75
>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>>> Residual:
>>>>>>>> Vector Object: 1 MPI processes
>>>>>>>>   type: seq
>>>>>>>> -0.25
>>>>>>>> -0.5
>>>>>>>> 0.25
>>>>>>>> -0.5
>>>>>>>> -1
>>>>>>>> 0.5
>>>>>>>> 0.25
>>>>>>>> 0.5
>>>>>>>> 0.75
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <fuentesdt at gmail.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> sure. will do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <
>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <
>>>>>>>>>> fuentesdt at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> thanks! its running, but I seem to be getting different answer
>>>>>>>>>>> for cpu/gpu ?
>>>>>>>>>>> i had some floating point problems on this Tesla M2070 gpu
>>>>>>>>>>> before, but adding the '-arch=sm_20' option seemed to fix it last time.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> is the assembly in single precision ? my 'const PetscReal
>>>>>>>>>>> jacobianInverse' being passed in are doubles
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yep, that is the problem. I have not tested anything in double. I
>>>>>>>>>> have not decided exactly how to handle it. Can you
>>>>>>>>>> make another ARCH --with-precision=single and make sure it works,
>>>>>>>>>> and then we can fix the double issue?
>>>>>>>>>>
>>>>>>>>>>   Thanks,
>>>>>>>>>>
>>>>>>>>>>      Matt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>>>>>> -gpu
>>>>>>>>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>>>>>>>>>>  N_t: 3, N_cb: 1
>>>>>>>>>>> Residual:
>>>>>>>>>>> Vector Object: 1 MPI processes
>>>>>>>>>>>   type: seq
>>>>>>>>>>> 0
>>>>>>>>>>> 755712
>>>>>>>>>>> 0
>>>>>>>>>>> -58720
>>>>>>>>>>> -2953.13
>>>>>>>>>>> 0.375
>>>>>>>>>>> 1.50323e+07
>>>>>>>>>>> 0.875
>>>>>>>>>>> 0
>>>>>>>>>>> SCRGP2$
>>>>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>>>>>> Residual:
>>>>>>>>>>> Vector Object: 1 MPI processes
>>>>>>>>>>>   type: seq
>>>>>>>>>>> -0.25
>>>>>>>>>>> -0.5
>>>>>>>>>>> 0.25
>>>>>>>>>>> -0.5
>>>>>>>>>>> -1
>>>>>>>>>>> 0.5
>>>>>>>>>>> 0.25
>>>>>>>>>>> 0.5
>>>>>>>>>>> 0.75
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <
>>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <
>>>>>>>>>>>> fuentesdt at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The example seems to be running on cpu with '-batch' but
>>>>>>>>>>>>> i'm getting errors in line 323 with the '-gpu' option
>>>>>>>>>>>>>
>>>>>>>>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>>>>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>>>>>
>>>>>>>>>>>>> should this possibly be PetscScalar ?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> No.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -  ierr = cudaMalloc((void**) &d_coefficients,         Ne*N_bt
>>>>>>>>>>>>> * sizeof(float));CHKERRQ(ierr);
>>>>>>>>>>>>> +  ierr = cudaMalloc((void**) &d_coefficients,         Ne*N_bt
>>>>>>>>>>>>> * sizeof(PetscScalar));CHKERRQ(ierr);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> SCRGP2$ python
>>>>>>>>>>>>> $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1
>>>>>>>>>>>>> laplacian ex52.h
>>>>>>>>>>>>> ['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py',
>>>>>>>>>>>>> '2', '1', '1', '1', 'laplacian', 'ex52.h']
>>>>>>>>>>>>> 2 1 1 1 laplacian
>>>>>>>>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]},
>>>>>>>>>>>>> {(-1.0, 1.0): [(1.0, ())]}]
>>>>>>>>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0:
>>>>>>>>>>>>> []}}
>>>>>>>>>>>>> Perm: [0, 1, 2]
>>>>>>>>>>>>> Creating /home/fuentes/snestutorial/ex52.h
>>>>>>>>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu.h
>>>>>>>>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]},
>>>>>>>>>>>>> {(-1.0, 1.0): [(1.0, ())]}]
>>>>>>>>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0:
>>>>>>>>>>>>> []}}
>>>>>>>>>>>>> Perm: [0, 1, 2]
>>>>>>>>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu_inline.h
>>>>>>>>>>>>>
>>>>>>>>>>>>> SCRGP2$ make ex52
>>>>>>>>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>>>>>>>>> -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>>>>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>>>>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>>>>>>>>>>>>> nvcc -O0 -g -arch=sm_20  -c --compiler-options="-O0 -g   -fPIC
>>>>>>>>>>>>>    -I/opt/apps/PETSC/petsc-dev/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>>>>>>>>> -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>>>>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>>>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>>>>>>>>> -I/usr/include/mpich2    -D__INSDIR__=src/snes/examples/tutorials/"
>>>>>>>>>>>>>  ex52_integrateElement.cu
>>>>>>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0"
>>>>>>>>>>>>> was declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was
>>>>>>>>>>>>> declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0"
>>>>>>>>>>>>> was declared but never referenced
>>>>>>>>>>>>>
>>>>>>>>>>>>> /usr/bin/mpicxx -O0 -g   -o ex52 ex52.o
>>>>>>>>>>>>> ex52_integrateElement.o
>>>>>>>>>>>>>  -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>>>>>>>>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib  -lpetsc
>>>>>>>>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>>>>>>>>> -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps
>>>>>>>>>>>>> -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs
>>>>>>>>>>>>> -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64
>>>>>>>>>>>>> -lcufft -lcublas -lcudart -lcusparse
>>>>>>>>>>>>> -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>>>>>>>>> -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>>>>>>>>> -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc
>>>>>>>>>>>>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
>>>>>>>>>>>>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus
>>>>>>>>>>>>> -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
>>>>>>>>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
>>>>>>>>>>>>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
>>>>>>>>>>>>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>>>>>>>>>>>> /bin/rm -f ex52.o ex52_integrateElement.o
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>>>>>>>> Residual:
>>>>>>>>>>>>> Vector Object: 1 MPI processes
>>>>>>>>>>>>>   type: seq
>>>>>>>>>>>>> -0.25
>>>>>>>>>>>>> -0.5
>>>>>>>>>>>>> 0.25
>>>>>>>>>>>>> -0.5
>>>>>>>>>>>>> -1
>>>>>>>>>>>>> 0.5
>>>>>>>>>>>>> 0.25
>>>>>>>>>>>>> 0.5
>>>>>>>>>>>>> 0.75
>>>>>>>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>>>>>>>> -gpu
>>>>>>>>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>>>>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>>>>> [0]PETSC ERROR: FormFunctionLocalBatch() line 679 in
>>>>>>>>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>>>>>>>>> [0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in
>>>>>>>>>>>>> src/snes/utils/damgsnes.c
>>>>>>>>>>>>> [0]PETSC ERROR: main() line 1021 in
>>>>>>>>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>>>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is failing on cudaMalloc(), which means your card is not
>>>>>>>>>>>> available for running. Are you trying to run on your laptop?
>>>>>>>>>>>> If so, applications like Preview can lock up the GPU. I know of
>>>>>>>>>>>> no way to test this in CUDA while running. I just close
>>>>>>>>>>>> apps until it runs.
>>>>>>>>>>>>
>>>>>>>>>>>>   Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>      Matt
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <
>>>>>>>>>>>>> knepley at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <
>>>>>>>>>>>>>> bourdin at lsu.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes <
>>>>>>>>>>>>>>> fuentesdt at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had a question about the status of example 52.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can this example be used with a DM object created from an
>>>>>>>>>>>>>>>> unstructured exodusII mesh, DMMeshCreateExodus, And the FEM assembly done
>>>>>>>>>>>>>>>> on GPU ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) I have pushed many more tests for it now. They can be run
>>>>>>>>>>>>>>> using the Python build system
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   ./config/builder2.py check
>>>>>>>>>>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   in fact, you can build any set of files this way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) The Exodus creation has to be converted to DMComplex from
>>>>>>>>>>>>>>> DMMesh. That should not take me very long. Blaise maintains that
>>>>>>>>>>>>>>>      so maybe there will be help :) You will just replace
>>>>>>>>>>>>>>> DMComplexCreateBoxMesh() with DMComplexCreateExodus(). If you request
>>>>>>>>>>>>>>>      it, I will bump it up the list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> DMMeshCreateExodusNG is much more flexible
>>>>>>>>>>>>>>> than DMMeshCreateExodus in that it can read meshes with multiple element
>>>>>>>>>>>>>>> types and should have a much lower memory footprint. The code should be
>>>>>>>>>>>>>>> fairly easy to read. you can email me directly if you have specific
>>>>>>>>>>>>>>> questions. I had looked at creating a DMComplex and it did not look too
>>>>>>>>>>>>>>> difficult, as long as interpolation is not needed. I have plans to
>>>>>>>>>>>>>>> write DMComplexCreateExodus, but haven't had time too so far. Updating the
>>>>>>>>>>>>>>> Vec viewers and readers may be a bit more involved. In perfect world, one
>>>>>>>>>>>>>>> would write an EXODUS viewer following the lines of the VTK and HDF5 ones.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David and Blaise,  I have converted this function, now
>>>>>>>>>>>>>> DMComplexCreateExodus(). Its not tested, but I think
>>>>>>>>>>>>>> Blaise has some stuff we can use to test it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      Matt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Blaise
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you can run the tests.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      Matt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  --
>>>>>>>>>>>>>>> Department of Mathematics and Center for Computation &
>>>>>>>>>>>>>>> Technology
>>>>>>>>>>>>>>> Louisiana State University, Baton Rouge, LA 70803, USA
>>>>>>>>>>>>>>> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276
>>>>>>>>>>>>>>> http://www.math.lsu.edu/~bourdin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> What most experimenters take for granted before they begin
>>>>>>>>>>>> their experiments is infinitely more interesting than any results to which
>>>>>>>>>>>> their experiments lead.
>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>>>> experiments lead.
>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120425/33475587/attachment-0001.htm>