[petsc-users] [External] Re: MatVec on GPUs
Matthew Knepley
knepley at gmail.com
Tue Oct 19 20:34:28 CDT 2021
On Tue, Oct 19, 2021 at 9:18 PM Swarnava Ghosh <swarnava89 at gmail.com> wrote:
> Thank you, Junchao! Is it possible to determine from the log how much time is being spent on data transfer from CPU memory to GPU memory?
>
It looks like these are the relevant events:
VecCUDACopyTo 891 1.1 1.5322e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 842 6.23e+01 0 0.00e+00 0
VecCUDACopyFrom 891 1.1 1.5837e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 842 6.23e+01 0
MatCUSPARSCopyTo 891 1.1 1.5229e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 842 1.93e+03 0 0.00e+00 0
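The Time column for these events is the transfer time (Max over processes, in seconds), and the CpuToGpu/GpuToCpu Size columns give the volume in Mbytes, so the copies above cost roughly 0.18 s in total here. If you want those numbers attributed to just one routine, you can bracket it with a log stage (the legend below mentions PetscLogStagePush()/PetscLogStagePop()). A minimal sketch, meant to sit inside your existing code; the stage name is arbitrary:

  PetscLogStage stage;
  ierr = PetscLogStageRegister("SpectralNodes", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  /* ... the MatMult/VecDot/VecAXPY calls you want timed separately ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

The copy events executed inside the stage then show up under their own heading in -log_view.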
Thanks,
Matt
>
> ************************************************************************************************************************
>
> ***              WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>
>
> ************************************************************************************************************************
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
>
> /ccsopen/home/swarnava/MiniApp_xl_cu/bin/sq on a named h49n15 with 4 processors, by swarnava Tue Oct 19 21:10:56 2021
>
> Using Petsc Release Version 3.15.0, Mar 30, 2021
>
>
> Max Max/Min Avg Total
>
> Time (sec): 1.172e+02 1.000 1.172e+02
>
> Objects: 1.160e+02 1.000 1.160e+02
>
> Flop: 5.832e+10 1.125 5.508e+10 2.203e+11
>
> Flop/sec: 4.974e+08 1.125 4.698e+08 1.879e+09
>
> MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
>
> MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00
>
> MPI Reductions: 1.320e+02 1.000
>
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>
> e.g., VecAXPY() for real vectors of length N --> 2N flop
>
> and VecAXPY() for complex vectors of length N --> 8N flop
>
>
> Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
>
> Avg %Total Avg %Total Count %Total Avg %Total Count %Total
>
> 0: Main Stage: 1.1725e+02 100.0% 2.2033e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 1.140e+02 86.4%
>
>
>
> ------------------------------------------------------------------------------------------------------------------------
>
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>
> Phase summary info:
>
>    Count: number of times phase was executed
>
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>
>    Mess: number of messages sent
>
>    AvgLen: average message length (bytes)
>
>    Reduct: number of global reductions
>
>    Global: entire computation
>
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>
>       %T - percent time in this phase         %F - percent flop in this phase
>
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>
>       %R - percent reductions in this phase
>
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>
>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>
>    CpuToGpu Count: total number of CPU to GPU copies per processor
>
>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>
>    GpuToCpu Count: total number of GPU to CPU copies per processor
>
>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>
>    GPU %F: percent flops on GPU in this event
>
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
>
> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> --- Event Stage 0: Main Stage
>
>
> BuildTwoSided 2 1.0 6.2501e-03145.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
>
> BuildTwoSidedF 2 1.0 6.2628e-03123.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
>
> VecDot 89991 1.1 3.4663e+00 1.2 1.67e+09 1.1 0.0e+00 0.0e+00 0.0e+00 3 3 0 0 0 3 3 0 0 0 1816 1841 0 0.00e+00 84992 6.80e-01 100
>
> VecNorm 89991 1.1 5.5282e+00 1.2 1.67e+09 1.1 0.0e+00 0.0e+00 0.0e+00 4 3 0 0 0 4 3 0 0 0 1139 1148 0 0.00e+00 84992 6.80e-01 100
>
> VecScale 89991 1.1 1.3902e+00 1.2 8.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2265 2343 84992 6.80e-01 0 0.00e+00 100
>
> VecCopy 178201 1.1 2.9825e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> VecSet 3589 1.1 1.0195e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> VecAXPY 179091 1.1 2.7456e+00 1.2 3.32e+09 1.1 0.0e+00 0.0e+00 0.0e+00 2 6 0 0 0 2 6 0 0 0 4564 4739 169142 1.35e+00 0 0.00e+00 100
>
> VecCUDACopyTo 891 1.1 1.5322e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 842 6.23e+01 0 0.00e+00 0
>
> VecCUDACopyFrom 891 1.1 1.5837e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 842 6.23e+01 0
>
> DMCreateMat 5 1.0 7.3491e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 1 0 0 0 5 1 0 0 0 6 0 0 0 0.00e+00 0 0.00e+00 0
>
> SFSetGraph 5 1.0 3.5016e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> MatMult 89991 1.1 2.0423e+00 1.2 5.08e+10 1.1 0.0e+00 0.0e+00 0.0e+00 2 87 0 0 0 2 87 0 0 0 94039 105680 1683 2.00e+03 0 0.00e+00 100
>
> MatCopy 891 1.1 1.3600e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> MatConvert 2 1.0 1.0489e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> MatScale 2 1.0 2.7950e-04 1.3 3.18e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4530 0 0 0.00e+00 0 0.00e+00 0
>
> MatAssemblyBegin 7 1.0 6.3768e-0368.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 2 0 0 0 0 2 0 0 0 0.00e+00 0 0.00e+00 0
>
> MatAssemblyEnd 7 1.0 7.9870e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 3 0 0 0 0 4 0 0 0 0.00e+00 0 0.00e+00 0
>
> MatCUSPARSCopyTo 891 1.1 1.5229e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 842 1.93e+03 0 0.00e+00 0
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
>
> Object Type Creations Destructions Memory Descendants' Mem.
>
> Reports information only for process 0.
>
>
> --- Event Stage 0: Main Stage
>
>
> Vector 69 11 19112 0.
>
> Distributed Mesh 3 0 0 0.
>
> Index Set 12 10 187512 0.
>
> IS L to G Mapping 3 0 0 0.
>
> Star Forest Graph 11 0 0 0.
>
> Discrete System 3 0 0 0.
>
> Weak Form 3 0 0 0.
>
> Application Order 1 0 0 0.
>
> Matrix 8 0 0 0.
>
> Krylov Solver 1 0 0 0.
>
> Preconditioner 1 0 0 0.
>
> Viewer 1 0 0 0.
>
>
> ========================================================================================================================
>
> Average time to get PetscTime(): 4.32e-08
>
> Average time for MPI_Barrier(): 9.94e-07
>
> Average time for zero size MPI_Send(): 4.20135e-05
>
>
> Sincerely,
>
> SG
>
> On Tue, Oct 19, 2021 at 12:28 AM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>>
>>
>>
>> On Mon, Oct 18, 2021 at 10:56 PM Swarnava Ghosh <swarnava89 at gmail.com>
>> wrote:
>>
>>> I am trying to port parts of the following function to GPUs. Essentially,
>>> the lines of code between the two "TODO..." comments should be executed
>>> on the device. Here is the function:
>>>
>>> PetscScalar CalculateSpectralNodesAndWeights(LSDFT_OBJ *pLsdft, int p, int LIp)
>>> {
>>>
>>> PetscInt N_qp;
>>> N_qp = pLsdft->N_qp;
>>>
>>> int k;
>>> PetscScalar *a, *b;
>>> k=0;
>>>
>>> PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &a);
>>> PetscMalloc(sizeof(PetscScalar)*(N_qp+1), &b);
>>>
>>> /*
>>> * TODO: COPY a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1, pLsdft->LapPlusVeffOprloc, k, p, N_qp from HOST to DEVICE
>>> * DO THE FOLLOWING OPERATIONS ON DEVICE
>>> */
>>>
>>> //zero out vectors
>>> VecZeroEntries(pLsdft->Vk);
>>> VecZeroEntries(pLsdft->Vkm1);
>>> VecZeroEntries(pLsdft->Vkp1);
>>>
>>> VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES);
>>> MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vkm1,pLsdft->Vk);
>>> VecDot(pLsdft->Vkm1, pLsdft->Vk, &a[0]);
>>> VecAXPY(pLsdft->Vk, -a[0], pLsdft->Vkm1);
>>> VecNorm(pLsdft->Vk, NORM_2, &b[0]);
>>> VecScale(pLsdft->Vk, 1.0 / b[0]);
>>>
>>> for (k = 0; k < N_qp; k++) {
>>> MatMult(pLsdft->LapPlusVeffOprloc,pLsdft->Vk,pLsdft->Vkp1);
>>> VecDot(pLsdft->Vk, pLsdft->Vkp1, &a[k + 1]);
>>> VecAXPY(pLsdft->Vkp1, -a[k + 1], pLsdft->Vk);
>>> VecAXPY(pLsdft->Vkp1, -b[k], pLsdft->Vkm1);
>>> VecCopy(pLsdft->Vk, pLsdft->Vkm1);
>>> VecNorm(pLsdft->Vkp1, NORM_2, &b[k + 1]);
>>> VecCopy(pLsdft->Vkp1, pLsdft->Vk);
>>> VecScale(pLsdft->Vk, 1.0 / b[k + 1]);
>>> }
>>>
>>> /*
>>> * TODO: Copy back a, b, pLsdft->Vk, pLsdft->Vkm1, pLsdft->Vkp1, pLsdft->LapPlusVeffOprloc, k, p, N_qp from DEVICE to HOST
>>> */
>>>
>>> /*
>>> * Some operation with a, and b on HOST
>>> *
>>> */
>>> TridiagEigenVecSolve_NodesAndWeights(pLsdft, a, b, N_qp, LIp); // operation on the host
>>>
>>> // free a,b
>>> PetscFree(a);
>>> PetscFree(b);
>>>
>>> return 0;
>>> }
>>>
>>> If I just use command line options to set the vectors Vk, Vkp1, and Vkm1
>>> as cuda vectors and the matrix LapPlusVeffOprloc as aijcusparse, will the
>>> lines of code between the two "TODO" comments be executed entirely on the
>>> device?
>>>
>> Yes, except for VecSetValue(pLsdft->Vkm1, p, 1.0, INSERT_VALUES), which is
>> done on the CPU by pulling the vector data down from the GPU and setting the
>> value there. Subsequent vector operations will push the updated data back to
>> the GPU.
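>>
>> To double-check, here is a minimal, self-contained sketch (an illustration, not the code above) that exercises the same MatMult/VecDot/VecAXPY/VecNorm/VecScale sequence on a small tridiagonal test matrix. The matrix gets an options prefix and its type is left to the command line; MatCreateVecs() then creates vectors of a matching type. Running it with -mymat_mat_type aijcusparse -log_view should show those events with GPU %F = 100:
>>
>> #include <petscmat.h>
>>
>> int main(int argc, char **argv)
>> {
>>   Mat            A;
>>   Vec            x, y;
>>   PetscInt       i, n = 100, Istart, Iend;
>>   PetscScalar    dot;
>>   PetscReal      nrm;
>>   PetscErrorCode ierr;
>>
>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>
>>   /* Tridiagonal test matrix; its type is taken from the command line via the prefix */
>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>   ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
>>   ierr = MatSetOptionsPrefix(A, "mymat_");CHKERRQ(ierr);
>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>   ierr = MatSetUp(A);CHKERRQ(ierr);
>>   ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
>>   for (i = Istart; i < Iend; i++) {
>>     if (i > 0)     {ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>>     if (i < n - 1) {ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
>>     ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
>>   }
>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>
>>   /* Vectors compatible with A; they come out as CUDA vectors when A is aijcusparse */
>>   ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
>>   ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>
>>   /* The same kernel sequence as in the Lanczos-type loop above */
>>   ierr = MatMult(A, x, y);CHKERRQ(ierr);
>>   ierr = VecDot(x, y, &dot);CHKERRQ(ierr);
>>   ierr = VecAXPY(y, -dot, x);CHKERRQ(ierr);
>>   ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr);
>>   ierr = VecScale(y, 1.0 / nrm);CHKERRQ(ierr);
>>
>>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>>   ierr = VecDestroy(&y);CHKERRQ(ierr);
>>   ierr = PetscFinalize();
>>   return ierr;
>> }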
>>
>>
>>>
>>> Sincerely,
>>> Swarnava
>>>
>>>
>>> On Mon, Oct 18, 2021 at 10:13 PM Swarnava Ghosh <swarnava89 at gmail.com>
>>> wrote:
>>>
>>>> Thanks for the clarification, Junchao.
>>>>
>>>> Sincerely,
>>>> Swarnava
>>>>
>>>> On Mon, Oct 18, 2021 at 10:08 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 18, 2021 at 8:47 PM Swarnava Ghosh <swarnava89 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>
>>>>>> If I want to pass command line options as -mymat_mat_type
>>>>>> aijcusparse, should it be MatSetOptionsPrefix(A,"mymat"); or
>>>>>> MatSetOptionsPrefix(A,"mymat_"); ? Could you please clarify?
>>>>>>
>>>>> My fault, it should be MatSetOptionsPrefix(A,"mymat_"), as seen in
>>>>> mat/tests/ex62.c.
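>>>>> For concreteness, a tiny sketch of the pairing (A here is just a placeholder matrix):
>>>>>
>>>>>   MatSetOptionsPrefix(A, "mymat_");  /* note the trailing underscore */
>>>>>   MatSetFromOptions(A);              /* -mymat_mat_type aijcusparse is then picked up */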
>>>>> Thanks
>>>>>
>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>> Swarnava
>>>>>>
>>>>>> On Mon, Oct 18, 2021 at 9:23 PM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>> MatSetOptionsPrefix(A,"mymat")
>>>>>>> VecSetOptionsPrefix(v,"myvec")
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 18, 2021 at 8:04 PM Chang Liu <cliu at pppl.gov> wrote:
>>>>>>>
>>>>>>>> Hi Junchao,
>>>>>>>>
>>>>>>>> Thank you for your answer. I tried MatConvert and it works. It did not
>>>>>>>> work before because I had forgotten to convert a vector from mpi to
>>>>>>>> mpicuda first.
>>>>>>>>
>>>>>>>> For vectors, there is no VecConvert to use, so I have to do VecDuplicate,
>>>>>>>> VecSetType, and VecCopy. Is there an easier option?
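>>>>>>>>
>>>>>>>> For reference, a sketch of the two conversions described above, assuming an
>>>>>>>> already-assembled MPIAIJ matrix A, an MPI vector v, and a CUDA-enabled PETSc build:
>>>>>>>>
>>>>>>>>   MatConvert(A, MATMPIAIJCUSPARSE, MAT_INPLACE_MATRIX, &A);
>>>>>>>>
>>>>>>>>   Vec vcuda;
>>>>>>>>   VecDuplicate(v, &vcuda);   /* same layout as v */
>>>>>>>>   VecSetType(vcuda, VECMPICUDA);
>>>>>>>>   VecCopy(v, vcuda);         /* copy the values into the CUDA vector */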
>>>>>>>>
>>>>>>> As Matt suggested, you could single out the matrix and vector with
>>>>>>> options prefixes and set their types on the command line:
>>>>>>>
>>>>>>> MatSetOptionsPrefix(A,"mymat");
>>>>>>> VecSetOptionsPrefix(v,"myvec");
>>>>>>>
>>>>>>> Then, -mymat_mat_type aijcusparse -myvec_vec_type cuda
>>>>>>>
>>>>>>> A simpler approach is to have the vector type set automatically by
>>>>>>> MatCreateVecs(A,&v,NULL).
>>>>>>>
>>>>>>>
>>>>>>>> Chang
>>>>>>>>
>>>>>>>> On 10/18/21 5:23 PM, Junchao Zhang wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Oct 18, 2021 at 3:42 PM Chang Liu via petsc-users
>>>>>>>> > <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>>>>>>>> >
>>>>>>>> > Hi Matt,
>>>>>>>> >
>>>>>>>> > I have a related question. In my code I have many matrices, and I only
>>>>>>>> > want one to live on the GPU; the others should stay in CPU memory.
>>>>>>>> >
>>>>>>>> > I wonder if there is an easier way to copy an mpiaij matrix to
>>>>>>>> > mpiaijcusparse (in other words, to copy the data to the GPU). I can
>>>>>>>> > think of creating a new mpiaijcusparse matrix and copying the data
>>>>>>>> > line by line, but I wonder if there is a better option.
>>>>>>>> >
>>>>>>>> > I have tried MatCopy and MatConvert, but neither works.
>>>>>>>> >
>>>>>>>> > Did you use MatConvert(mat,matype,MAT_INPLACE_MATRIX,&mat)?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Chang
>>>>>>>> >
>>>>>>>> > On 10/17/21 7:50 PM, Matthew Knepley wrote:
>>>>>>>> > > On Sun, Oct 17, 2021 at 7:12 PM Swarnava Ghosh
>>>>>>>> > <swarnava89 at gmail.com <mailto:swarnava89 at gmail.com>
>>>>>>>> > > <mailto:swarnava89 at gmail.com <mailto:swarnava89 at gmail.com>>>
>>>>>>>> wrote:
>>>>>>>> > >
>>>>>>>> > > Do I need to convert the MATSEQBAIJ to a cuda matrix in code?
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > You would need a call to MatSetFromOptions() to take that
>>>>>>>> type
>>>>>>>> > from the
>>>>>>>> > > command line, and not have
>>>>>>>> > > the type hard-coded in your application. It is generally a
>>>>>>>> bad
>>>>>>>> > idea to
>>>>>>>> > > hard code the implementation type.
>>>>>>>> > >
>>>>>>>> > > If I do it from the command line, are the other MatVec calls
>>>>>>>> > > also ported onto CUDA? I have many MatVec calls in my code, but I
>>>>>>>> > > specifically want to port just one call.
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > You can give that one matrix an options prefix to isolate
>>>>>>>> it.
>>>>>>>> > >
>>>>>>>> > > Thanks,
>>>>>>>> > >
>>>>>>>> > > Matt
>>>>>>>> > >
>>>>>>>> > > Sincerely,
>>>>>>>> > > Swarnava
>>>>>>>> > >
>>>>>>>> > > On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang
>>>>>>>> > > <junchao.zhang at gmail.com <mailto:
>>>>>>>> junchao.zhang at gmail.com>
>>>>>>>> > <mailto:junchao.zhang at gmail.com <mailto:
>>>>>>>> junchao.zhang at gmail.com>>>
>>>>>>>> > wrote:
>>>>>>>> > >
>>>>>>>> > > You can do that with command line options -mat_type
>>>>>>>> > aijcusparse
>>>>>>>> > > -vec_type cuda
>>>>>>>> > >
>>>>>>>> > > On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh
>>>>>>>> > > <swarnava89 at gmail.com <mailto:swarnava89 at gmail.com
>>>>>>>> >
>>>>>>>> > <mailto:swarnava89 at gmail.com <mailto:swarnava89 at gmail.com>>>
>>>>>>>> wrote:
>>>>>>>> > >
>>>>>>>> > > Dear Petsc team,
>>>>>>>> > >
>>>>>>>> > > I had a query regarding using CUDA to
>>>>>>>> accelerate a matrix
>>>>>>>> > > vector product.
>>>>>>>> > > I have a sequential sparse matrix
>>>>>>>> (MATSEQBAIJ type).
>>>>>>>> > I want
>>>>>>>> > > to port a MatVec call onto GPUs. Is there any
>>>>>>>> > code/example I
>>>>>>>> > > can look at?
>>>>>>>> > >
>>>>>>>> > > Sincerely,
>>>>>>>> > > SG
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > > --
>>>>>>>> > > What most experimenters take for granted before they begin
>>>>>>>> their
>>>>>>>> > > experiments is infinitely more interesting than any
>>>>>>>> results to which
>>>>>>>> > > their experiments lead.
>>>>>>>> > > -- Norbert Wiener
>>>>>>>> > >
>>>>>>>> > > https://www.cse.buffalo.edu/~knepley/
>>>>>>>> > <https://www.cse.buffalo.edu/~knepley/>
>>>>>>>> > <http://www.cse.buffalo.edu/~knepley/
>>>>>>>> > <http://www.cse.buffalo.edu/~knepley/>>
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Chang Liu
>>>>>>>> > Staff Research Physicist
>>>>>>>> > +1 609 243 3438
>>>>>>>> > cliu at pppl.gov <mailto:cliu at pppl.gov>
>>>>>>>> > Princeton Plasma Physics Laboratory
>>>>>>>> > 100 Stellarator Rd, Princeton NJ 08540, USA
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chang Liu
>>>>>>>> Staff Research Physicist
>>>>>>>> +1 609 243 3438
>>>>>>>> cliu at pppl.gov
>>>>>>>> Princeton Plasma Physics Laboratory
>>>>>>>> 100 Stellarator Rd, Princeton NJ 08540, USA
>>>>>>>>
>>>>>>>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>