[petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface
Junchao Zhang
junchao.zhang at gmail.com
Wed Oct 11 09:14:57 CDT 2023
Hi, Philip,
Could you try this branch
jczhang/2023-10-05/feature-support-matshift-aijkokkos ?
Thanks.
--Junchao Zhang
On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip <facklerpw at ornl.gov> wrote:
> Aha! That makes sense. Thank you.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Thursday, October 5, 2023 17:29
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; Blondel, Sophie <
> sblondel at utk.edu>
> *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses
> switching to COO interface
>
> Wait a moment, it seems it was because we do not have a GPU implementation
> of MatShift...
> Let me see how to add it.
> --Junchao Zhang
>
>
> On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
> Hi, Philip,
> I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues()
> instead of the COO interface? MatSetValues() needs to copy the data from
> device to host and thus is expensive.
> Do you have profiling results with COO enabled?
>
> [image: Screenshot 2023-10-05 at 10.55.29 AM.png]
>
>
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
> Hi, Philip,
> I will look into the tarballs and get back to you.
> Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> We finally have xolotl ported to use the new COO interface and the
> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port
> to our previous version (using MatSetValuesStencil and the default Mat and
> Vec implementations), we expected to see an improvement in performance for
> both the "serial" and "cuda" builds (here I'm referring to the kokkos
> configuration).
>
> Attached are two plots that show timings for three different cases. All of
> these were run on Ascent (the Summit-like training system) with 6 MPI tasks
> (on a single node). The CUDA cases were given one GPU per task (and used
> CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases
> we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent
> as possible.
>
> The performance of RHSJacobian (where the bulk of computation happens in
> xolotl) behaved basically as expected (better than expected in the serial
> build). The NE_3 case in CUDA was the only one that performed worse, which
> is not surprising, since its workload for the GPUs is much smaller. We
> still have more optimization to do on this.
>
> The real surprise was how much worse the overall solve times were. This
> seems to be due simply to switching to the kokkos-based implementation. I'm
> wondering if there are any changes we can make in configuration or runtime
> arguments to help with PETSc's performance here. Any help looking into this
> would be appreciated.
>
> The tarballs linked here
> <https://urldefense.us/v2/url?u=https-3A__drive.google.com_file_d_19X-5FL3SVkGBM9YUzXnRR-5FkVWFG0JFwqZ3_view-3Fusp-3Ddrive-5Flink&d=DwMFaQ&c=v4IIwRuZAmwupIjowmMWUmLasxPEgYsgNI-O7C4ViYc&r=DAkLCjn8leYU-uJ-kfNEQMhPZWx9lzc4d5KgIR-RZWQ&m=GTpC2k9hIdMhUg_aJkeAqd-1CP5M8bwJMJjTriVE1k-j36ZnEHerQkZOzszxWoG2&s=GW0ImGWhWr4rR5AoSULCnaP1CN1QWxTSeMDhdOuhTEA&e=>
> and here
> <https://urldefense.us/v2/url?u=https-3A__drive.google.com_file_d_15yDBN7-2DYlO1g6RJNPYNImzr611i1Ffhv_view-3Fusp-3Ddrive-5Flink&d=DwMFaQ&c=v4IIwRuZAmwupIjowmMWUmLasxPEgYsgNI-O7C4ViYc&r=DAkLCjn8leYU-uJ-kfNEQMhPZWx9lzc4d5KgIR-RZWQ&m=GTpC2k9hIdMhUg_aJkeAqd-1CP5M8bwJMJjTriVE1k-j36ZnEHerQkZOzszxWoG2&s=tO-BnNY2myA-pIsRnBjQNoaOSjn-B3-lWGiQp7XXJwk&e=>
> are profiling databases which, once extracted, can be viewed with
> hpcviewer. I don't know how helpful that will be, but hopefully it can give
> you some direction.
>
> Thanks for your help,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
>
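
The two operations discussed in this thread can be illustrated with a minimal plain-C sketch of what they compute. It does not call PETSc and says nothing about how PETSc implements either operation on the GPU. The `Csr` struct and the functions `coo_build_map`, `coo_set_values`, and `csr_shift` are hypothetical names made up for illustration. The assumptions are that the COO interface (in PETSc, MatSetPreallocationCOO() followed by MatSetValuesCOO()) fixes the (i, j) index pattern once and then takes only a values array on each assembly, with repeated (i, j) entries summed, and that MatShift(A, a) computes A + a*I.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration type, not PETSc's: a tiny CSR matrix. */
typedef struct {
  int        nrows;
  const int *rowptr; /* length nrows + 1 */
  const int *colidx; /* length rowptr[nrows] */
  double    *val;    /* length rowptr[nrows] */
} Csr;

/* Return the storage position of entry (i, j), or -1 if it is not stored. */
static int csr_find(const Csr *A, int i, int j)
{
  for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; ++k)
    if (A->colidx[k] == j) return k;
  return -1;
}

/* Done once: record where each COO entry lands in the CSR value array.
   Repeated (i, j) pairs simply map to the same position. */
void coo_build_map(const Csr *A, size_t ncoo, const int *coo_i,
                   const int *coo_j, int *map)
{
  for (size_t n = 0; n < ncoo; ++n) {
    map[n] = csr_find(A, coo_i[n], coo_j[n]);
    assert(map[n] >= 0); /* every COO entry must be in the pattern */
  }
}

/* Done on every assembly: overwrite the old values, summing duplicates.
   Only the values array v is needed; the indices are not passed again. */
void coo_set_values(Csr *A, size_t ncoo, const int *map, const double *v)
{
  int nnz = A->rowptr[A->nrows];
  for (int k = 0; k < nnz; ++k) A->val[k] = 0.0;
  for (size_t n = 0; n < ncoo; ++n) A->val[map[n]] += v[n];
}

/* A <- A + a*I, assuming every diagonal entry is already stored. */
void csr_shift(Csr *A, double a)
{
  for (int i = 0; i < A->nrows; ++i) {
    int k = csr_find(A, i, i);
    assert(k >= 0);
    A->val[k] += a;
  }
}
```

With a 2x2 matrix and COO entries (0,0), (1,1), (0,1), (1,0), (0,0), the two (0,0) values are summed into one stored entry. The map is built once and reused for every later call to coo_set_values.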
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2023-10-05 at 10.55.29 AM.png
Type: image/png
Size: 144341 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20231011/804e4faf/attachment-0002.png>