[petsc-users] Using PETSc GPU backend

Sat Apr 13 14:53:40 CDT 2024

As Matt said, high- frequency Helmholtz is very hard, low-frequency is
doable by using a larger coarse grid (you have a tiny coarse grid).

On Sat, Apr 13, 2024 at 3:04 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Fri, Apr 12, 2024 at 8:19 PM Ng, Cho-Kuen via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Mark. The FEM problem is of high-frequency Helmholtz type for the Maxwell
>> equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this
>> small problem and the results agreed well with those obtained from the
>> direct solver
>> ZjQcmQRYFpfptBannerStart
>> This Message Is From an External Sender
>> This message came from outside your organization.
>>
>> ZjQcmQRYFpfptBannerEnd
>> Mark.
>>
>> The FEM problem is of high-frequency Helmholtz type for the Maxwell
>> equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this
>> small problem and the results agreed well with those obtained from the
>> direct solver MUMPS. However, this ksp-pc combination failed to converge
>> for much larger problem sizes. What would be a good combination to solve
>> this kind of large-scale problems where direct solvers would not be able to
>> solve?
>>
>
> Solving high frequency Helmholtz is a hard research problem. There are no
> easy answers. I know of no
> real solutions for C0 FEM. it seems to be a bad discretization for this
> from the start. I have not seen anything reliable that is not mostly
> direct, but this is not what I do for a living so something might exist,
> although not in any publicly available software.
>
>   Thanks,
>
>      Matt
>
>
>> Thanks,
>> Cho
>> ------------------------------
>> *From:* Mark Adams <mfadams at lbl.gov>
>> *Sent:* Friday, April 12, 2024 4:52 PM
>> *To:* Barry Smith <bsmith at petsc.dev>
>> *Cc:* Ng, Cho-Kuen <cho at slac.stanford.edu>; petsc-users <
>> petsc-users at mcs.anl.gov>; Jacob Faibussowitsch <jacob.fai at gmail.com>
>> *Subject:* Re: [petsc-users] Using PETSc GPU backend
>>
>> As Barry said, this is a bit small but the performance looks reasonable.
>> The solver does very badly, mathematically.
>> I would try hypre to get another data point.
>> You could also try 'cg' to check that the pipelined version is not a
>> problem.
>> Mark
>>
>> On Fri, Apr 12, 2024 at 3:54 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>> 800k is a pretty small problem for GPUs. We would need to see the runs
>> with output from -ksp_view -log_view to see if the timing results are
>> reasonable. On Apr 12, 2024, at 1: 48 PM, Ng, Cho-Kuen <cho@ slac.
>> stanford. edu> wrote: I performed
>> ZjQcmQRYFpfptBannerStart
>> This Message Is From an External Sender
>> This message came from outside your organization.
>>
>> ZjQcmQRYFpfptBannerEnd
>>
>>   800k is a pretty small problem for GPUs.
>>
>>   We would need to see the runs with output from -ksp_view -log_view to
>> see if the timing results are reasonable.
>>
>> On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen <cho at slac.stanford.edu> wrote:
>>
>> I performed tests on comparison using KSP with and without cuda backend
>> on NERSC's Perlmutter. For a finite element solve with 800k degrees of
>> freedom, the best times obtained using MPI and MPI+GPU were
>>
>> o MPI - 128 MPI tasks, 27 s
>>
>> o MPI+GPU - 4 MPI tasks, 4 GPU's, 32 s
>>
>> Is that the performance one would expect using the hybrid mode of
>> computation. Attached image shows the scaling on a single node.
>>
>> Thanks,
>> Cho
>> ------------------------------
>> *From:* Ng, Cho-Kuen <cho at slac.stanford.edu>
>> *Sent:* Saturday, August 12, 2023 8:08 AM
>> *To:* Jacob Faibussowitsch <jacob.fai at gmail.com>
>> *Cc:* Barry Smith <bsmith at petsc.dev>; petsc-users <
>> petsc-users at mcs.anl.gov>
>> *Subject:* Re: [petsc-users] Using PETSc GPU backend
>>
>> Thanks Jacob.
>> ------------------------------
>> *From:* Jacob Faibussowitsch <jacob.fai at gmail.com>
>> *Sent:* Saturday, August 12, 2023 5:02 AM
>> *To:* Ng, Cho-Kuen <cho at slac.stanford.edu>
>> *Cc:* Barry Smith <bsmith at petsc.dev>; petsc-users <
>> petsc-users at mcs.anl.gov>
>> *Subject:* Re: [petsc-users] Using PETSc GPU backend
>>
>> > Can petsc show the number of GPUs used?
>>
>> -device_view
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >
>> > Barry,
>> >
>> > I tried again today on Perlmutter and running on multiple GPU nodes
>> worked. Likely, I had messed up something the other day. Also, I was able
>> to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output
>> shows the number of MPI tasks:
>> >
>> > KSP Object: 32 MPI processes
>> >
>> > Can petsc show the number of GPUs used?
>> >
>> > Thanks,
>> > Cho
>> >
>> > From: Barry Smith <bsmith at petsc.dev>
>> > Sent: Wednesday, August 9, 2023 4:09 PM
>> > To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> > Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> > Subject: Re: [petsc-users] Using PETSc GPU backend
>> >
>> >   We would need more information about "hanging". Do PETSc examples and
>> tiny problems "hang" on multiple nodes? If you run with -info what are the
>> last messages printed? Can you run with a debugger to see where it is
>> "hanging"?
>> >
>> >
>> >
>> >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>
>> >> Barry and Matt,
>> >>
>> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1
>> node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple
>> nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I
>> fix this?
>> >>
>> >> Best,
>> >> Cho
>> >>
>> >>  From: Barry Smith <bsmith at petsc.dev>
>> >> Sent: Monday, July 17, 2023 6:58 AM
>> >> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>
>> >>  The examples that use DM, in particular DMDA all trivially support
>> using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda
>> >>
>> >>
>> >>
>> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>
>> >>> Barry,
>> >>>
>> >>> Thank you so much for the clarification.
>> >>>
>> >>> I see that ex104.c and ex300.c use  MatXAIJSetPreallocation(). Are
>> there other tutorials available?
>> >>>
>> >>> Cho
>> >>>  From: Barry Smith <bsmith at petsc.dev>
>> >>> Sent: Saturday, July 15, 2023 8:36 AM
>> >>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>
>> >>>      Cho,
>> >>>
>> >>>     We currently have a crappy API for turning on GPU support, and
>> our documentation is misleading in places.
>> >>>
>> >>>     People constantly say "to use GPU's with PETSc you only need to
>> use -mat_type aijcusparse (for example)" This is incorrect.
>> >>>
>> >>>  This does not work with code that uses the convenience Mat
>> constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only
>> works if you use the constructor approach of MatCreate(), MatSetSizes(),
>> MatSetFromOptions(), MatXXXSetPreallocation(). ...  Similarly you need to
>> use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda
>> >>>
>> >>>    If you use DM to create the matrices and vectors then you can use
>> -dm_mat_type aijcusparse -dm_vec_type cuda
>> >>>
>> >>>    Sorry for the confusion.
>> >>>
>> >>>    Barry
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>>>
>> >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>> Matt,
>> >>>>
>> >>>> After inserting 2 lines in the code:
>> >>>>
>> >>>>   ierr =
>> MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>> >>>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>> >>>>   ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>> >>>>
>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr);
>> >>>>
>> >>>> "There are no unused options." However, there is no improvement on
>> the GPU performance.
>> >>>>
>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat
>> you created in steps 1 and 2. This is detailed in the manual.
>> >>>>
>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before
>> MatSetFromOptions().
>> >>>>
>> >>>>   THanks,
>> >>>>
>> >>>>     Matt
>> >>>>  Thanks,
>> >>>> Cho
>> >>>>
>> >>>>  From: Matthew Knepley <knepley at gmail.com>
>> >>>> Sent: Friday, July 14, 2023 5:57 PM
>> >>>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>> Cc: Barry Smith <bsmith at petsc.dev>; Mark Adams <mfadams at lbl.gov>;
>> petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>  On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>> I managed to pass the following options to PETSc using a GPU node on
>> Perlmutter.
>> >>>>
>> >>>>     -mat_type aijcusparse -vec_type cuda -log_view -options_left
>> >>>>
>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task.
>> >>>>
>> >>>> o #PETSc Option Table entries:
>> >>>> -log_view
>> >>>> -mat_type aijcusparse
>> >>>>    -options_left
>> >>>>    -vec_type cuda
>> >>>>    #End of PETSc Option Table entries
>> >>>>    WARNING! There are options you set that were not used!
>> >>>>    WARNING! could be spelling mistake, etc!
>> >>>>    There is one unused database option. It is:
>> >>>>    Option left: name:-mat_type value: aijcusparse
>> >>>>
>> >>>> The -mat_type option has not been used. In the application code, we
>> use
>> >>>>
>> >>>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>> >>>>              d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr);
>> >>>>
>> >>>>
>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in
>> order to set the type from the command line.
>> >>>>
>> >>>>   Thanks,
>> >>>>
>> >>>>      Matt
>> >>>>  o The percent flops on the GPU for KSPSolve is 17%.
>> >>>>
>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an
>> order of magnitude slower. How can I improve the GPU performance?
>> >>>>
>> >>>> Thanks,
>> >>>> Cho
>> >>>>  From: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>> Sent: Friday, June 30, 2023 7:57 AM
>> >>>> To: Barry Smith <bsmith at petsc.dev>; Mark Adams <mfadams at lbl.gov>
>> >>>> Cc: Matthew Knepley <knepley at gmail.com>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>  Barry, Mark and Matt,
>> >>>>
>> >>>> Thank you all for the suggestions. I will modify the code so we can
>> pass runtime options.
>> >>>>
>> >>>> Cho
>> >>>>  From: Barry Smith <bsmith at petsc.dev>
>> >>>> Sent: Friday, June 30, 2023 7:01 AM
>> >>>> To: Mark Adams <mfadams at lbl.gov>
>> >>>> Cc: Matthew Knepley <knepley at gmail.com>; Ng, Cho-Kuen <
>> cho at slac.stanford.edu>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>
>> >>>>   Note that options like -mat_type aijcusparse  -vec_type cuda only
>> work if the program is set up to allow runtime swapping of matrix and
>> vector types. If you have a call to MatCreateMPIAIJ() or other specific
>> types then then these options do nothing but because Mark had you use
>> -options_left the program will tell you at the end that it did not use the
>> option so you will know.
>> >>>>
>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> >>>>>
>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the
>> args and you run:
>> >>>>>
>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left
>> >>>>>
>> >>>>> Mark
>> >>>>>
>> >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >>>>> Mark,
>> >>>>>
>> >>>>> The application code reads in parameters from an input file, where
>> we can put the PETSc runtime options. Then we pass the options to
>> PetscInitialize(...). Does that sounds right?
>> >>>>>
>> >>>>> PETSc will read command line argument automatically in
>> PetscInitialize() unless you shut it off.
>> >>>>>
>> >>>>>   Thanks,
>> >>>>>
>> >>>>>     Matt
>> >>>>>  Cho
>> >>>>>  From: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>>> Sent: Thursday, June 29, 2023 8:32 PM
>> >>>>> To: Mark Adams <mfadams at lbl.gov>
>> >>>>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>>  Mark,
>> >>>>>
>> >>>>> Thanks for the information. How do I put the runtime options for
>> the executable, say, a.out, which does not have the provision to append
>> arguments? Do I need to change the C++ main to read in the options?
>> >>>>>
>> >>>>> Cho
>> >>>>>  From: Mark Adams <mfadams at lbl.gov>
>> >>>>> Sent: Thursday, June 29, 2023 5:55 PM
>> >>>>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>>  Run with options: -mat_type aijcusparse -vec_type cuda -log_view
>> -options_left
>> >>>>> The last column of the performance data (from -log_view) will be
>> the percent flops on the GPU. Check that that is > 0.
>> >>>>>
>> >>>>> The end of the output will list the options that were used and
>> options that were _not_ used (if any). Check that there are no options left.
>> >>>>>
>> >>>>> Mark
>> >>>>>
>> >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >>>>> I installed PETSc on Perlmutter using "spack install
>> petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I
>> compiled the application code (purely CPU code) linking to the petsc
>> package, hoping that I can get performance improvement using the petsc GPU
>> backend. However, the timing was the same using the same number of MPI
>> tasks with and without GPU accelerators. Have I missed something in the
>> process, for example, setting up PETSc options at runtime to use the GPU
>> backend?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Cho
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>>> -- Norbert Wiener
>> >>>>>
>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>> -- Norbert Wiener
>> >>>>
>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>> -- Norbert Wiener
>> >>>>
>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>> >>
>> >>
>> >>
>> >>  From: Barry Smith <bsmith at petsc.dev>
>> >> Sent: Monday, July 17, 2023 6:58 AM
>> >> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>
>> >>  The examples that use DM, in particular DMDA all trivially support
>> using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda
>> >>
>> >>
>> >>
>> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>
>> >>> Barry,
>> >>>
>> >>> Thank you so much for the clarification.
>> >>>
>> >>> I see that ex104.c and ex300.c use  MatXAIJSetPreallocation(). Are
>> there other tutorials available?
>> >>>
>> >>> Cho
>> >>>  From: Barry Smith <bsmith at petsc.dev>
>> >>> Sent: Saturday, July 15, 2023 8:36 AM
>> >>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>
>> >>>      Cho,
>> >>>
>> >>>     We currently have a crappy API for turning on GPU support, and
>> our documentation is misleading in places.
>> >>>
>> >>>     People constantly say "to use GPU's with PETSc you only need to
>> use -mat_type aijcusparse (for example)" This is incorrect.
>> >>>
>> >>>  This does not work with code that uses the convenience Mat
>> constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only
>> works if you use the constructor approach of MatCreate(), MatSetSizes(),
>> MatSetFromOptions(), MatXXXSetPreallocation(). ...  Similarly you need to
>> use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda
>> >>>
>> >>>    If you use DM to create the matrices and vectors then you can use
>> -dm_mat_type aijcusparse -dm_vec_type cuda
>> >>>
>> >>>    Sorry for the confusion.
>> >>>
>> >>>    Barry
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>>>
>> >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>> Matt,
>> >>>>
>> >>>> After inserting 2 lines in the code:
>> >>>>
>> >>>>   ierr =
>> MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>> >>>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>> >>>>   ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>> >>>>
>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr);
>> >>>>
>> >>>> "There are no unused options." However, there is no improvement on
>> the GPU performance.
>> >>>>
>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat
>> you created in steps 1 and 2. This is detailed in the manual.
>> >>>>
>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before
>> MatSetFromOptions().
>> >>>>
>> >>>>   THanks,
>> >>>>
>> >>>>     Matt
>> >>>>  Thanks,
>> >>>> Cho
>> >>>>
>> >>>>  From: Matthew Knepley <knepley at gmail.com>
>> >>>> Sent: Friday, July 14, 2023 5:57 PM
>> >>>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>> Cc: Barry Smith <bsmith at petsc.dev>; Mark Adams <mfadams at lbl.gov>;
>> petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>  On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen <cho at slac.stanford.edu>
>> wrote:
>> >>>> I managed to pass the following options to PETSc using a GPU node on
>> Perlmutter.
>> >>>>
>> >>>>     -mat_type aijcusparse -vec_type cuda -log_view -options_left
>> >>>>
>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task.
>> >>>>
>> >>>> o #PETSc Option Table entries:
>> >>>> -log_view
>> >>>> -mat_type aijcusparse
>> >>>>    -options_left
>> >>>>    -vec_type cuda
>> >>>>    #End of PETSc Option Table entries
>> >>>>    WARNING! There are options you set that were not used!
>> >>>>    WARNING! could be spelling mistake, etc!
>> >>>>    There is one unused database option. It is:
>> >>>>    Option left: name:-mat_type value: aijcusparse
>> >>>>
>> >>>> The -mat_type option has not been used. In the application code, we
>> use
>> >>>>
>> >>>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>> >>>>              d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr);
>> >>>>
>> >>>>
>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in
>> order to set the type from the command line.
>> >>>>
>> >>>>   Thanks,
>> >>>>
>> >>>>      Matt
>> >>>>  o The percent flops on the GPU for KSPSolve is 17%.
>> >>>>
>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an
>> order of magnitude slower. How can I improve the GPU performance?
>> >>>>
>> >>>> Thanks,
>> >>>> Cho
>> >>>>  From: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>> Sent: Friday, June 30, 2023 7:57 AM
>> >>>> To: Barry Smith <bsmith at petsc.dev>; Mark Adams <mfadams at lbl.gov>
>> >>>> Cc: Matthew Knepley <knepley at gmail.com>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>  Barry, Mark and Matt,
>> >>>>
>> >>>> Thank you all for the suggestions. I will modify the code so we can
>> pass runtime options.
>> >>>>
>> >>>> Cho
>> >>>>  From: Barry Smith <bsmith at petsc.dev>
>> >>>> Sent: Friday, June 30, 2023 7:01 AM
>> >>>> To: Mark Adams <mfadams at lbl.gov>
>> >>>> Cc: Matthew Knepley <knepley at gmail.com>; Ng, Cho-Kuen <
>> cho at slac.stanford.edu>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>
>> >>>>   Note that options like -mat_type aijcusparse  -vec_type cuda only
>> work if the program is set up to allow runtime swapping of matrix and
>> vector types. If you have a call to MatCreateMPIAIJ() or other specific
>> types then then these options do nothing but because Mark had you use
>> -options_left the program will tell you at the end that it did not use the
>> option so you will know.
>> >>>>
>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> >>>>>
>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the
>> args and you run:
>> >>>>>
>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left
>> >>>>>
>> >>>>> Mark
>> >>>>>
>> >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >>>>> Mark,
>> >>>>>
>> >>>>> The application code reads in parameters from an input file, where
>> we can put the PETSc runtime options. Then we pass the options to
>> PetscInitialize(...). Does that sounds right?
>> >>>>>
>> >>>>> PETSc will read command line argument automatically in
>> PetscInitialize() unless you shut it off.
>> >>>>>
>> >>>>>   Thanks,
>> >>>>>
>> >>>>>     Matt
>> >>>>>  Cho
>> >>>>>  From: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>>> Sent: Thursday, June 29, 2023 8:32 PM
>> >>>>> To: Mark Adams <mfadams at lbl.gov>
>> >>>>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>>  Mark,
>> >>>>>
>> >>>>> Thanks for the information. How do I put the runtime options for
>> the executable, say, a.out, which does not have the provision to append
>> arguments? Do I need to change the C++ main to read in the options?
>> >>>>>
>> >>>>> Cho
>> >>>>>  From: Mark Adams <mfadams at lbl.gov>
>> >>>>> Sent: Thursday, June 29, 2023 5:55 PM
>> >>>>> To: Ng, Cho-Kuen <cho at slac.stanford.edu>
>> >>>>> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>> >>>>>  Run with options: -mat_type aijcusparse -vec_type cuda -log_view
>> -options_left
>> >>>>> The last column of the performance data (from -log_view) will be
>> the percent flops on the GPU. Check that that is > 0.
>> >>>>>
>> >>>>> The end of the output will list the options that were used and
>> options that were _not_ used (if any). Check that there are no options left.
>> >>>>>
>> >>>>> Mark
>> >>>>>
>> >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >>>>> I installed PETSc on Perlmutter using "spack install
>> petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I
>> compiled the application code (purely CPU code) linking to the petsc
>> package, hoping that I can get performance improvement using the petsc GPU
>> backend. However, the timing was the same using the same number of MPI
>> tasks with and without GPU accelerators. Have I missed something in the
>> process, for example, setting up PETSc options at runtime to use the GPU
>> backend?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Cho
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>>> -- Norbert Wiener
>> >>>>>
>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>> -- Norbert Wiener
>> >>>>
>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> >>>> -- Norbert Wiener
>> >>>>
>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
>> <https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fMikK_wvRIVkv5jLV6EHt_rPhWLibqlxAAYjRVMbAEGOUp417LWCH59TvzCtcD3j4dOd4xR_tUy2MRnqU1N7kew$>
>>
>>
>> <Untitled.png>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96EH-7LV7k$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fzprTMUfKGkoQcPM_9rzpAwqm9RSmA7djOy8cPIpHuaXVzOoeyoCY7H375lUNWy-az1Fwc6MzSzjF96Ertj1RsU$ >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20240413/35b683a8/attachment-0001.html>