From knepley at gmail.com Mon Apr 1 09:28:34 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 1 Apr 2024 10:28:34 -0400
Subject: [petsc-users] Error loading data coming from a .hdf5 file into a DMSwarm
In-Reply-To: References: Message-ID:

On Sun, Mar 31, 2024 at 4:08 PM MIGUEL MOLINOS PEREZ wrote:

> Dear all,
>
> I am writing a function which stores datasets (Vectors) coming from a
> DMSwarm structure into an hdf5 file. This step works nicely:
>
> write_function(){
>   PetscViewerHDF5Open(...)
>   PetscViewerHDF5PushTimestepping(...)
>   DMSwarmCreateGlobalVectorFromField(...)
>   VecView(...)
>   DMSwarmDestroyGlobalVectorFromField(...)
> }
>
> The resulting hdf5 file looks good after an inspection using Python's
> h5py library.
>
> However, I am finding difficulties when I try to use this .hdf5 file as a
> fresh start for my application. The target field is not properly updated
> when I try to load the stored data (it keeps the default values).
>
> read_function(){
>   ...
>   PetscViewerHDF5Open(...)
>   PetscViewerHDF5PushTimestepping(...)
>   DMSwarmCreateGlobalVectorFromField(...)
>   VecLoad(...)
>   DMSwarmDestroyGlobalVectorFromField(...)
>   ...
> }
>
> The puzzling part is: if I print the "updated" vector inside
> read_function() using VecView after VecLoad, the vector seems to hold the
> updated values. However, if I print the field in the main function after
> the call to read_function(), the field remains the same as it was before
> calling read_function(), and I do not get any error message.
>
> Is there something wrong with the logic of my programming?
> Maybe I am missing something.

From the description, my guess is that this is pointer confusion. The
vector inside the function is different from the vector outside the
function.

  Thanks,

     Matt

> Thank you in advance.
>
> Best regards,
> Miguel
>
> Miguel Molinos
> Investigador postdoctoral Juan de la Cierva
> Dpto. Mecánica de Medios Continuos y Teoría de Estructuras - ETSI
> Universidad de Sevilla
> Camino de los descubrimientos, s/n
> 41092 Sevilla
>
> http://www.us.es
>
> Este correo electrónico y, en su caso, cualquier fichero anexo al
> mismo, contiene información de carácter confidencial exclusivamente
> dirigida a su destinatario o destinatarios. Si no es UD. el
> destinatario del mensaje, le ruego lo destruya sin hacer copia digital o
> física, comunicando al emisor por esta misma vía la recepción del
> presente mensaje. Gracias

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From mmolinos at us.es Mon Apr 1 10:13:39 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Mon, 1 Apr 2024 15:13:39 +0000
Subject: [petsc-users] Error loading data coming from a .hdf5 file into a DMSwarm
In-Reply-To: References: Message-ID:

Dear Matthew,

Thank you for your suggestion.
I tried to update the vector with the information coming from the hdf5
file inside the main function. Then I print the vector two times (see the
lines below). The first time it has the correct data. However, the second
time it has the same values as if I had never updated it with VecLoad.

Is there an alternative way of initialising a vector coming from a DMSwarm
with previously stored information (keeping the parallel structure)?

Miguel

  //! Load HDF5 viewer
  PetscViewer viewer_hdf5;
  REQUIRE_NOTHROW(PetscViewerHDF5Open(MPI_COMM_WORLD, Output_hdf5_file,
                                      FILE_MODE_READ, &viewer_hdf5));
  REQUIRE_NOTHROW(PetscViewerHDF5PushTimestepping(viewer_hdf5));

  //! Load the vector and fill it with the information from the .hdf5 file
  Vec stdv_q;
  REQUIRE_NOTHROW(DMSwarmCreateGlobalVectorFromField(
      Simulation.atomistic_data, "stdv-q", &stdv_q));

  REQUIRE_NOTHROW(PetscViewerHDF5PushGroup(viewer_hdf5, "/particle_fields"));
  REQUIRE_NOTHROW(VecLoad(stdv_q, viewer_hdf5));
  REQUIRE_NOTHROW(VecView(stdv_q, PETSC_VIEWER_STDOUT_WORLD));
  REQUIRE_NOTHROW(PetscViewerHDF5PopGroup(viewer_hdf5));

  REQUIRE_NOTHROW(DMSwarmDestroyGlobalVectorFromField(
      Simulation.atomistic_data, "stdv-q", &stdv_q));

  //! Destroy HDF5 context
  REQUIRE_NOTHROW(PetscViewerDestroy(&viewer_hdf5));

  //! Load the vector again and print
  REQUIRE_NOTHROW(DMSwarmCreateGlobalVectorFromField(
      Simulation.atomistic_data, "stdv-q", &stdv_q));
  REQUIRE_NOTHROW(VecView(stdv_q, PETSC_VIEWER_STDOUT_WORLD));
  REQUIRE_NOTHROW(DMSwarmDestroyGlobalVectorFromField(
      Simulation.atomistic_data, "stdv-q", &stdv_q));

Best,
Miguel

Miguel Molinos
Investigador postdoctoral Juan de la Cierva
Dpto.
Mecánica de Medios Continuos y Teoría de Estructuras - ETSI
Universidad de Sevilla
Camino de los descubrimientos, s/n
41092 Sevilla

http://www.us.es

On 1 Apr 2024, at 16:28, Matthew Knepley wrote:

> From the description, my guess is that this is pointer confusion. The
> vector inside the function is different from the vector outside the
> function.
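Since the thread above checks the written file with Python's h5py library, here is a minimal, self-contained sketch of such an inspection. It creates its own small demo file; the group and dataset names ("particle_fields", "stdv-q") are copied from the code quoted in this thread, and the exact layout of a real PETSc-written file is an assumption to verify against your own output.

```python
import h5py
import numpy as np

def summarize_hdf5(path):
    """Return {dataset_name: shape} for every dataset in an HDF5 file."""
    summary = {}

    def visit(name, obj):
        # visititems() walks every group/dataset; keep only datasets
        if isinstance(obj, h5py.Dataset):
            summary[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary

if __name__ == "__main__":
    # Build a small file laid out like the one described in the thread
    # (names are illustrative assumptions, not guaranteed PETSc output).
    with h5py.File("swarm_demo.h5", "w") as f:
        f.create_group("particle_fields").create_dataset(
            "stdv-q", data=np.arange(12, dtype=np.float64))
    print(summarize_hdf5("swarm_demo.h5"))
```

Listing dataset names and shapes this way is a quick check that the write side produced what the read side expects, before involving PETSc at all.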
From balay at mcs.anl.gov Mon Apr 1 10:38:31 2024
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 1 Apr 2024 10:38:31 -0500 (CDT)
Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64)
In-Reply-To: <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com>
References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com>
Message-ID: <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov>

On Sun, 31 Mar 2024, Zongze Yang wrote:

> ---
> petsc at npro petsc % ./configure --download-bison --download-chaco
>   --download-ctetgen --download-eigen --download-fftw --download-hdf5
>   --download-hpddm --download-hwloc --download-hypre --download-libpng
>   --download-metis --download-mmg --download-mumps --download-netcdf
>   --download-openblas
>   --download-openblas-make-options="'USE_THREAD=0 USE_LOCKING=1 USE_OPENMP=0'"
>   --download-p4est --download-parmmg --download-pnetcdf --download-pragmatic
>   --download-ptscotch --download-scalapack --download-slepc
>   --download-suitesparse --download-superlu_dist --download-tetgen
>   --download-triangle --with-c2html=0 --with-debugging=1
>   --with-fortran-bindings=0 --with-shared-libraries=1 --with-x=0 --with-zlib
>   --download-openmpi=https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3rc1.tar.bz2
>   --download-pastix=https://web.cels.anl.gov/projects/petsc/download/externalpackages/pastix_5.2.3-p1.tar.bz2
>   && make && make check
>
> There's an error encountered during configuration
with the above options:
> ```
> TESTING: FortranMPICheck from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:676)
> *********************************************************************************************
>      UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
> ---------------------------------------------------------------------------------------------
> Fortran error! mpi_init() could not be located!
> *********************************************************************************************
> ```
> Please refer to the attached file for further information.

So I'm getting:

>>>>>>
*** Fortran compiler
checking whether the compiler supports GNU Fortran... yes
checking whether gfortran accepts -g... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... impossible -- -static
checking for gfortran warnings flags... none
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking if Fortran compilers preprocess .F90 files without additional flag... yes
checking to see if Fortran compilers need additional linker flags... -Wl,-flat_namespace
checking external symbol convention... single underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... none
<<<<

However you are getting:

>>>>
*** Fortran compiler
checking whether the compiler supports GNU Fortran... yes
checking whether gfortran accepts -g... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works...
yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... impossible -- -static
checking for gfortran warnings flags... none
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking if Fortran compilers preprocess .F90 files without additional flag... yes
checking to see if Fortran compilers need additional linker flags... -Wl,-flat_namespace
checking external symbol convention... single underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... -Wl,-commons,use_dylibs
<<<<

So gfortran [or ld from this newer xcode?] is behaving differently - and
openmpi is picking up and using this broken/unsupported option - and
likely triggering subsequent errors.

>>>
ld: warning: -commons use_dylibs is no longer supported, using error treatment instead
ld: common symbol '_mpi_fortran_argv_null_' from '/private/var/folders/tf/v4zjvtw12yb3tszk813gmnvw0000gn/T/petsc-xyn64q55/config.libraries/conftest.o' conflicts with definition from dylib '_mpi_fortran_argv_null_' from '/Users/zzyang/workspace/repos/petsc/arch-darwin-c-debug/lib/libmpi_usempif08.40.dylib'
<<<

Or perhaps openmpi configure is affected by this new warning that this newer xcode spews:

>>>
ld: warning: duplicate -rpath '/opt/homebrew/Cellar/gcc/13.2.0/lib/gcc/current/gcc' ignored
<<<

I'm not sure what to suggest here [other than using Linux - and avoiding
these hassles with MacOS :( ]..

Satish

From mmolinos at us.es Mon Apr 1 11:06:36 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Mon, 1 Apr 2024 16:06:36 +0000
Subject: [petsc-users] Error loading data coming from a .hdf5 file into a DMSwarm
In-Reply-To: References: Message-ID: <8808CAC8-E2A5-4CC4-9F00-DAA8333B7C8E@us.es>

Dear Matthew,

I came up with a workaround for the problem.
I duplicate the vector and use the duplicated copy to read the information
from the hdf5 file. Then I swap both vectors and delete the copy. If I
invoke VecView outside of the function, the value has been modified
properly. However, this solution seems a little bit "ugly". I share it
just in case someone is facing a similar problem, or in case it helps to
understand what is going wrong with my previous implementation.

Best,
Miguel

  Vec stdv_q;
  PetscCall(DMSwarmCreateGlobalVectorFromField(Simulation->atomistic_data,
                                               "stdv-q", &stdv_q));

  Vec stdv_q_hdf5;
  PetscCall(VecDuplicate(stdv_q, &stdv_q_hdf5));
  PetscCall(PetscObjectSetName((PetscObject)stdv_q_hdf5,
                               "DMSwarmSharedField_stdv-q"));
  PetscCall(VecLoad(stdv_q_hdf5, viewer_hdf5));

  PetscCall(VecSwap(stdv_q, stdv_q_hdf5));
  PetscCall(VecDestroy(&stdv_q_hdf5));

  PetscCall(DMSwarmDestroyGlobalVectorFromField(Simulation->atomistic_data,
                                                "stdv-q", &stdv_q));

On 1 Apr 2024, at 17:13, Miguel Molinos wrote:

> Dear Matthew,
>
> Thank you for your suggestion. I tried to update the vector with the
> information coming from the hdf5 file inside the main function. Then I
> print the vector two times: the first time it has the correct data, but
> the second time it has the same values as if I had never updated it with
> VecLoad.
>
> Is there an alternative way of initialising a vector coming from a DMSwarm
> with previously stored information (keeping the parallel structure)?
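The PetscObjectSetName() call in the workaround above is what lets VecLoad() find the right data: the HDF5 dataset is looked up by the Vec's object name. The same name-matching behaviour can be illustrated in plain h5py; the "DMSwarmSharedField_" prefix below is copied from the workaround code, and whether your PETSc build writes exactly this name is an assumption to check against your own file.

```python
import h5py
import numpy as np

# Dataset name copied from the workaround above (assumed convention).
NAME = "DMSwarmSharedField_stdv-q"

def roundtrip(path, values):
    """Write `values` under NAME, then read them back by that same name."""
    with h5py.File(path, "w") as f:
        f.create_dataset(NAME, data=values)
    with h5py.File(path, "r") as f:
        # A mismatched name is how a load can silently miss its target
        if NAME not in f:
            raise KeyError(NAME)
        return f[NAME][...]

if __name__ == "__main__":
    print(roundtrip("name_demo.h5", np.array([1.0, 2.0, 3.0])))
```

If the name set on the reading Vec does not match the dataset name in the file, nothing is found, which is consistent with the "field keeps its default values" symptom reported earlier in the thread.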
From knepley at gmail.com Mon Apr 1 11:32:06 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 1 Apr 2024 12:32:06 -0400
Subject: [petsc-users] Error loading data coming from a .hdf5 file into a DMSwarm
In-Reply-To: References: Message-ID:

On Mon, Apr 1, 2024 at 11:13 AM MIGUEL MOLINOS PEREZ wrote:

> Dear Matthew,
>
> Thank you for your suggestion. I tried to update the vector with the
> information coming from the hdf5 file inside the main function. Then I
> print the vector two times (see the lines below): the first time it has
> the correct data. However, the second time, it has the same values as if
> I had never updated it with VecLoad.
>
> Is there an alternative way of initialising a vector coming from a DMSwarm
> with previously stored information (keeping the parallel structure)?

Oh, you cannot use the Swarm vectors for VecLoad(). They are just a view
into the particle data, and that view is destroyed on restore. Swarm data
is stored in a particle-like data structure, not in Vecs. If you want to
load this Vec, you have to duplicate it exactly as you did. This interface
is likely to change in the next year to make this problem go away.

  Thanks,

     Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
From mmolinos at us.es Mon Apr 1 11:38:52 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Mon, 1 Apr 2024 16:38:52 +0000
Subject: [petsc-users] Error loading data coming from a .hdf5 file into a DMSwarm
In-Reply-To: References: Message-ID: <88B6F710-7464-43EB-A125-3D571509467B@us.es>

I see. Thank you for the feedback Matthew!

Thanks,
Miguel

On 1 Apr 2024, at 18:32, Matthew Knepley wrote:

> Oh, you cannot use the Swarm vectors for VecLoad(). They are just a view
> into the particle data, and that view is destroyed on restore. Swarm data
> is stored in a particle-like data structure, not in Vecs. If you want to
> load this Vec, you have to duplicate it exactly as you did. This interface
> is likely to change in the next year to make this problem go away.
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From yangzongze at gmail.com Mon Apr 1 11:52:40 2024
From: yangzongze at gmail.com (Zongze Yang)
Date: Tue, 2 Apr 2024 00:52:40 +0800
Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64)
In-Reply-To: <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov>
References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov>
Message-ID:

An HTML attachment was scrubbed...

From balay at mcs.anl.gov Mon Apr 1 12:15:09 2024
From: balay at mcs.anl.gov (Satish Balay)
Date: Mon, 1 Apr 2024 12:15:09 -0500 (CDT)
Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64)
In-Reply-To: References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov>
Message-ID: <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov>

On Mon, 1 Apr 2024, Zongze Yang wrote:

> I noticed this in the config.log of OpenMPI:
> ```
> configure:30230: checking to see if mpifort compiler needs additional linker flags
> configure:30247: gfortran -o conftest -fPIC -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -g -O0 -fallow-argument-mismatch -Wl,-flat_namespace -Wl,-commons,use_dylibs conftest.f90 >&5
> ld: warning: -commons use_dylibs is no longer supported, using error
treatment instead
> configure:30247: $? = 0
> configure:30299: result: -Wl,-commons,use_dylibs
> ```
> So, I find it odd that this flag isn't picked up on your platform, as it
> only checks the exit value.

I get:

configure:30247: gfortran -o conftest -fPIC -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -g -O0 -fallow-argument-mismatch -Wl,-flat_namespace -Wl,-commons,use_dylibs conftest.f90 >&5
ld: unknown options: -commons
collect2: error: ld returned 1 exit status
configure:30247: $? = 1
configure: failed program was:
| program test
| integer :: i
| end program
configure:30299: result: none

Note, I have an older xcode-15/CLT version:

petsc at npro ~ % clang --version
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Satish

From yangzongze at gmail.com Mon Apr 1 12:43:16 2024
From: yangzongze at gmail.com (Zongze Yang)
Date: Tue, 2 Apr 2024 01:43:16 +0800
Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64)
In-Reply-To: <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov>
References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov> <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov>
Message-ID: <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com>

Thank you for your update.

I found some links that suggest this issue is related to the Apple linker,
which is causing problems with Fortran linking.

1. https://github.com/open-mpi/ompi/issues/12427
2.
https://x.com/science_dot/status/1768667417553547635?s=46

Best wishes,
Zongze

On 2 Apr 2024, at 01:15, Satish Balay wrote:

> I get:
>
> configure:30247: gfortran -o conftest -fPIC -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -g -O0 -fallow-argument-mismatch -Wl,-flat_namespace -Wl,-commons,use_dylibs conftest.f90 >&5
> ld: unknown options: -commons
> collect2: error: ld returned 1 exit status
> configure:30247: $? = 1
> configure:30299: result: none
>
> Note, I have an older xcode-15/CLT version:
>
> petsc at npro ~ % clang --version
> Apple clang version 15.0.0 (clang-1500.1.0.2.5)
> Target: arm64-apple-darwin23.3.0
>
> Satish

From sawsan.shatanawi at wsu.edu Mon Apr 1 12:57:22 2024
From: sawsan.shatanawi at wsu.edu (Shatanawi, Sawsan Muhammad)
Date: Mon, 1 Apr 2024 17:57:22 +0000
Subject: [petsc-users] PETSC Matrix debugging
Message-ID:

Hello everyone,

I hope this email finds you well.
Is there a way to check what the matrix looks like after setting it? I
have tried debugging with gdb (breakpoints and print statements), but it
only gave me one value instead of a matrix.

Thank you in advance for your time and assistance.

Best regards,
Sawsan

From knepley at gmail.com Mon Apr 1 13:18:33 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 1 Apr 2024 14:18:33 -0400
Subject: [petsc-users] PETSC Matrix debugging
In-Reply-To: References: Message-ID:

On Mon, Apr 1, 2024 at 1:57 PM Shatanawi, Sawsan Muhammad via petsc-users
<petsc-users at mcs.anl.gov> wrote:

> Is there a way to check what the matrix looks like after setting it? I
> have tried debugging with gdb (breakpoints and print statements), but it
> only gave me one value instead of a matrix.
>
> Thank you in advance for your time and assistance.

I usually use MatView(), which can print to the screen. Is that what you
want?

  Thanks,

     Matt

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
From bsmith at petsc.dev Mon Apr 1 15:14:05 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 1 Apr 2024 16:14:05 -0400
Subject: [petsc-users] PETSC Matrix debugging
In-Reply-To: References: Message-ID: <32F759ED-DA01-413A-9605-5134757D2E6E@petsc.dev>

Note, you can also run with the option -mat_view and it will print each
matrix that gets assembled. Also, in the debugger you can do

  call MatView(mat,0)

> On Apr 1, 2024, at 2:18 PM, Matthew Knepley wrote:
>
> I usually use MatView(), which can print to the screen. Is that what you
> want?
URL: From balay at mcs.anl.gov Mon Apr 1 15:14:56 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 1 Apr 2024 15:14:56 -0500 (CDT) Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64) In-Reply-To: <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com> References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov> <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov> <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com> Message-ID: On Mon, 1 Apr 2024, Zongze Yang wrote: > Thank you for your update. > > I found some links that suggest this issue is related to the Apple linker, which is causing problems with Fortran linking. > > 1. https://urldefense.us/v3/__https://github.com/open-mpi/ompi/issues/12427__;!!G_uCfscf7eWS!bHY4uqpTfwl0jKopATQs3gw--TZSBmDp0Lb1gzDBeEu4ZB_zTph-jfw49yIr3jvPx0YEhQbk1PjYbGYVpjjms6Wb$ > 2. https://urldefense.us/v3/__https://x.com/science_dot/status/1768667417553547635?s=46__;!!G_uCfscf7eWS!bHY4uqpTfwl0jKopATQs3gw--TZSBmDp0Lb1gzDBeEu4ZB_zTph-jfw49yIr3jvPx0YEhQbk1PjYbGYVptASYXS2$ https://urldefense.us/v3/__https://github.com/Homebrew/homebrew-core/issues/162714__;!!G_uCfscf7eWS!chekWa3R3JhHB1MVv33Oqj9fPFhbnx9sm7cm7Lk5-n7PHicsVkY0n7XoSkWmk259VLNjTzEus6xBpRL20MmLo1g$ recommends "downgrade CLT (or xcode?) to 15.1" Satish From sawsan.shatanawi at wsu.edu Mon Apr 1 15:40:01 2024 From: sawsan.shatanawi at wsu.edu (Shatanawi, Sawsan Muhammad) Date: Mon, 1 Apr 2024 20:40:01 +0000 Subject: [petsc-users] PETSC Matrix debugging In-Reply-To: <32F759ED-DA01-413A-9605-5134757D2E6E@petsc.dev> References: <32F759ED-DA01-413A-9605-5134757D2E6E@petsc.dev> Message-ID: Hello Barry and Matthew, Yes MatView is what I was looking for. Thank you for your help. 
Bests, Sawsan ________________________________ From: Barry Smith Sent: Monday, April 1, 2024 1:14 PM To: Matthew Knepley Cc: Shatanawi, Sawsan Muhammad ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PETSC Matrix debugging [EXTERNAL EMAIL] Note, you can also run with the option -mat_view and it will print each matrix that gets assembled. Also in the debugger you can do call MatView(mat,0) On Apr 1, 2024, at 2:18?PM, Matthew Knepley wrote: This Message Is From an External Sender This message came from outside your organization. On Mon, Apr 1, 2024 at 1:57?PM Shatanawi, Sawsan Muhammad via petsc-users > wrote: This Message Is From an External Sender This message came from outside your organization. Hello everyone, I hope this email finds you well. Is there a way we can check how the matrix looks like after setting it. I have tried debugging it with gdb- break points- and print statements, but it only gave me one value instead of a matrix. Thank you in advance for your time and assistance. I usually use MatView(), which can print to the screen. Is that what you want? Thanks, Matt Best regards, Sawsan -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dzUer79uIA6hcNT7qaG9AYDGyzDSYNgriB1CbMtHqSvf4Yo-IpKL7NBCWNtAUNcE4xbHLqQPsngdWk6kYcVImGviELhswASejA$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yangzongze at gmail.com Tue Apr 2 01:10:19 2024 From: yangzongze at gmail.com (Zongze Yang) Date: Tue, 2 Apr 2024 14:10:19 +0800 Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64) In-Reply-To: References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov> <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov> <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com> Message-ID: <28F897B4-BA64-4895-B4EF-485AFC8D6C10@gmail.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure-and-make.tar.gz Type: application/x-gzip Size: 715890 bytes Desc: not available URL: -------------- next part -------------- > On 2 Apr 2024, at 04:14, Satish Balay wrote: > > On Mon, 1 Apr 2024, Zongze Yang wrote: > >> Thank you for your update. >> >> I found some links that suggest this issue is related to the Apple linker, which is causing problems with Fortran linking. >> >> 1. https://urldefense.us/v3/__https://github.com/open-mpi/ompi/issues/12427__;!!G_uCfscf7eWS!bHY4uqpTfwl0jKopATQs3gw--TZSBmDp0Lb1gzDBeEu4ZB_zTph-jfw49yIr3jvPx0YEhQbk1PjYbGYVpjjms6Wb$ >> 2. https://urldefense.us/v3/__https://x.com/science_dot/status/1768667417553547635?s=46__;!!G_uCfscf7eWS!bHY4uqpTfwl0jKopATQs3gw--TZSBmDp0Lb1gzDBeEu4ZB_zTph-jfw49yIr3jvPx0YEhQbk1PjYbGYVptASYXS2$ > > https://urldefense.us/v3/__https://github.com/Homebrew/homebrew-core/issues/162714__;!!G_uCfscf7eWS!eGiVH2meEkLSEHvkY6Y-m7U1wPG4ZDxHod7lLZI3HTu6itzNEDm7n3cz4GNly925EEHvVRnyNQYn2aAt0Z1p53tZ$ recommends "downgrade CLT (or xcode?) to 15.1" > > Satish From dontbugthedevs at proton.me Tue Apr 2 03:42:28 2024 From: dontbugthedevs at proton.me (Noam T.) 
Date: Tue, 02 Apr 2024 08:42:28 +0000
Subject: [petsc-users] FE Tabulation values
In-Reply-To: References: Message-ID: 

Thank you for the clarification.

Are there references specifically for this tabulation method and its construction? I have seen some references about the "FIAT" algorithm, but from a quick look I could not find all the details.

---

On a related note, I stated the values of Nq, Nc and Nb, as they can be checked. But to be sure, for the given 2D example:

- Nc = 2 refers to the two components, as in x/y in 2D
- Nb = 3 * 2, i.e. 3 shape functions (or nodes) times 2 components

Testing with a 3D mesh (e.g. a 4-node linear tetrahedron), Nc = 3 and Nb = 12, so the same math seems to work, but perhaps there is a different idea behind it.

Thanks.
Noam

On Tuesday, March 26th, 2024 at 11:17 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Mar 26, 2024 at 2:23 PM Noam T. via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
>> Hello,
>>
>> I am trying to understand the FE Tabulation data obtained from e.g.
>> PetscFEComputeTabulation. Using a 2D mesh with a single triangle, first
>> order, with vertices (0,0), (0,1), (1,0) (see msh file attached), and a
>> single quadrature point at (1/3, 1/3), one gets Nb = 6, Nc = 2, Nq = 1, and
>> the arrays for the basis and first derivatives are of sizes [Nq x Nb x Nc]
>> = 12 and [Nq x Nb x Nc x dim] = 24, respectively.
>
> The tabulations from PetscFE are recorded on the reference cell. For triangles, the reference cell is
> (-1, -1) -- (1, -1) -- (-1, 1).
The linear basis functions at these nodes are > > phi_0: -(x + y) / 2 > phi_1: (x + 1) / 2 > phi_2: (y + 1) / 2 > > and then you use the tensor product for Nc = 2. > > / phi_0 \ / 0 \ etc. > \ 0 / \ phi_0 / > >> The values of these two arrays are: >> basis (T->T[0]) >> [-1/3, 0, 0, -1/3, 2/3, 0, >> 0, 2/3, 2/3, 0, 0, 2/3] > > So these values are indeed the evaluations of those basis functions at (1/3, 1/3). The derivatives are similar. > > These are the evaluations you want if you are integrating in reference space, as we do for the finite element integrals, and also the only way we could use a single tabulation for the mesh. > > Thanks, > > Matt > >> deriv (T->T[1]) >> [-1/2, -1/2, 0, 0, 0, 0, >> -1/2, -1/2, 1/2, 0, 0, 0, >> 0, 0, 1/2, 0, 0, 1/2, >> 0, 0, 0, 0, 0, 1/2] >> >> How does one get these values? I can't quite find a way to relate them to evaluating the basis functions of a P1 triangle in the given quadrature point. >> >> Thanks, >> Noam > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > [https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/*(http:/*www.cse.buffalo.edu/*knepley/)__;fl0vfg!!G_uCfscf7eWS!d_p8b8_VRd0sM4TjKw7CDB2HbOn9JfMTaOrTqoazk2UvhfJwWRwR8i6S7cmZrXu_arDmY1F-Oyq1J5v30G8ZxjPmwqVxUmKq$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Apr 2 07:11:15 2024 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 Apr 2024 08:11:15 -0400 Subject: [petsc-users] FE Tabulation values In-Reply-To: References: Message-ID: On Tue, Apr 2, 2024 at 4:42?AM Noam T. wrote: > Thank you for the clarification. > > Are there references specifically for this tabulation method and its > construction? I have seen some references about the "FIAT" algorithm, but > from a quick look I could not find all details. 
> The basis for tabulation is this algorithm: https://urldefense.us/v3/__https://dl.acm.org/doi/10.1145/1039813.1039820__;!!G_uCfscf7eWS!f6jEiH1CuazpOliQttQfGv2Idl6TtDFefOQi3LMsOc1x75ElNLSY-l0BBvHSJhRf_B0HkkN0nvj0yEAhY_eJ$

> ---
>
> On a related note, I stated the values of Nq, Nc and Nb, as they can be
> checked. But to be sure, for the given 2D example:
>
> - Nc = 2 refers to the two components, as in x/y in 2D

It means that you have a vector quantity here.

> - Nb = 3 * 2, i.e. 3 shape functions (or nodes) times 2 components

Yes. I am explicitly representing the tensor product structure, since sometimes you do not have a tensor product and I wanted to be general.

> Testing with a 3D mesh (e.g. a 4-node linear tetrahedron), Nc = 3 and Nb =
> 12, so the same math seems to work, but perhaps there is a different idea
> behind it.

Yes, that is right.

  Thanks,

     Matt

> Thanks.
> Noam
> On Tuesday, March 26th, 2024 at 11:17 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 2:23 PM Noam T. via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
>> Hello,
>>
>> I am trying to understand the FE Tabulation data obtained from e.g.
>> PetscFEComputeTabulation.
Using a 2D mesh with a single triangle, first >> order, with vertices (0,0), (0,1), (1,0) (see msh file attached), and a >> single quadrature point at (1/3, 1/3), one gets Nb = 6, Nc = 2, Nq = 1, and >> the arrays for the basis and first derivatives are of sizes [Nq x Nb x Nc] >> = 12 and[Nq x Nb x Nc x dim] = 24, respectively >> > > The tabulations from PetscFE are recorded on the reference cell. For > triangles, the reference cell is > (-1, -1) -- (1, -1) -- (-1, 1). The linear basis functions at these nodes > are > > phi_0: -(x + y) / 2 > phi_1: (x + 1) / 2 > phi_2: (y + 1) / 2 > > and then you use the tensor product for Nc = 2. > > / phi_0 \ / 0 \ etc. > \ 0 / \ phi_0 / > > The values of these two arrays are: >> basis (T->T[0]) >> [-1/3, 0, 0, -1/3, 2/3, 0, >> 0, 2/3, 2/3, 0, 0, 2/3] >> > > So these values are indeed the evaluations of those basis functions at > (1/3, 1/3). The derivatives are similar. > > These are the evaluations you want if you are integrating in reference > space, as we do for the finite element integrals, and also the only way we > could use a single tabulation for the mesh. > > Thanks, > > Matt > >> deriv (T->T[1]) >> [-1/2, -1/2, 0, 0, 0, 0, >> -1/2, -1/2, 1/2, 0, 0, 0, >> 0, 0, 1/2, 0, 0, 1/2, >> 0, 0, 0, 0, 0, 1/2] >> >> How does one get these values? I can't quite find a way to relate them to >> evaluating the basis functions of a P1 triangle in the given quadrature >> point. >> >> Thanks, >> Noam >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f6jEiH1CuazpOliQttQfGv2Idl6TtDFefOQi3LMsOc1x75ElNLSY-l0BBvHSJhRf_B0HkkN0nvj0yC6i1AIh$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f6jEiH1CuazpOliQttQfGv2Idl6TtDFefOQi3LMsOc1x75ElNLSY-l0BBvHSJhRf_B0HkkN0nvj0yC6i1AIh$ 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From FERRANJ2 at my.erau.edu  Wed Apr  3 13:17:05 2024
From: FERRANJ2 at my.erau.edu (Ferrand, Jesus A.)
Date: Wed, 3 Apr 2024 18:17:05 +0000
Subject: [petsc-users] Correct way to set/track global numberings in DMPlex?
Message-ID: 

Dear PETSc team:

(Hoping to get a hold of Jed and/or Matt for this one.)
(Also, sorry for the mouthful below.)

I'm developing routines that will read/write CGNS files to DMPlex and vice versa. One of the recurring challenges is the bookkeeping of global numbering for vertices and cells. Currently, I am restricting my support to single-Zone CGNS files, in which the file provides global numbers for vertices and cells.

I used PetscHSetI, as exemplified in DMPlexBuildFromCellListParallel(), to obtain local DAG numbers from the global numbers provided by the file. I also used PetscSFCreateByMatchingIndices() to establish a basic DAG point distribution over the MPI processes. I use this PointSF to manually assemble a global PetscSection.

For owned DAG points (per the PointSF), I call PetscSectionSetOffset(section, point, file_offset);
For ghost DAG points (per the PointSF), I call PetscSectionSetOffset(section, point, -(file_offset + 1));

All of what I have just described happens in my CGNS version of DMPlexTopologyLoad(). My intention is to retain those numbers in the DMPlex and reuse them in my CGNS analogues of DMPlexCoordinatesLoad(), DMPlexLabelsLoad(), and DMPlexGlobalVectorLoad(). Anyhow, is this a good way to track global numbers?

Also, I need (for other applications) to eventually call DMPlexInterpolate() and DMPlexDistribute(); will the global PetscSection offsets be preserved after calling those two?

Sincerely:

J.A.
Ferrand Embry-Riddle Aeronautical University - Daytona Beach - FL Ph.D. Candidate, Aerospace Engineering M.Sc. Aerospace Engineering B.Sc. Aerospace Engineering B.Sc. Computational Mathematics Phone: (386)-843-1829 Email(s): ferranj2 at my.erau.edu jesus.ferrand at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 3 21:55:11 2024 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 Apr 2024 22:55:11 -0400 Subject: [petsc-users] Correct way to set/track global numberings in DMPlex? In-Reply-To: References: Message-ID: On Wed, Apr 3, 2024 at 2:28?PM Ferrand, Jesus A. wrote: > Dear PETSc team: (Hoping to get a hold of Jed and/or Matt for this one) > (Also, sorry for the mouthful below) I'm developing routines that will > read/write CGNS files to DMPlex and vice versa. One of the recurring > challenges is the bookkeeping > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > Dear PETSc team: > > (Hoping to get a hold of Jed and/or Matt for this one) > (Also, sorry for the mouthful below) > > I'm developing routines that will read/write CGNS files to DMPlex and vice > versa. > One of the recurring challenges is the bookkeeping of global numbering for > vertices and cells. > Currently, I am restricting my support to single Zone CGNS files, in which > the file provides global numbers for vertices and cells. > I thought Jed had put in parallel CGNS loading. If so, maybe you can transition to that. If not, we should get your stuff integrated. > I used PetscHSetI as exemplified in DMPlexBuildFromCellListParallel() to > obtain local DAG numbers from the global numbers provided by the file. > I also used PetscSFCreateByMatchingIndices() to establish a basic DAG > point distribution over the MPI processes. > I use this PointSF to manually assemble a global PetscSection. 
> For owned DAG points (per the PointSF) , I call > PetscSectionSetOffset(section, point, file_offset); > For ghost DAG points (per the PointSF) I call > PetscSectionSetOffset(section, point, -(file_offset + 1)); > This sounds alright to me, although I admit to not understanding exactly what is being done. All of what I have just described happens in my CGNS version of > DMPlexTopologyLoad(). > My intention is to retain those numbers into the DMPlex, and reuse them in > my CGNS analogues of DMPlexCoordinatesLoad(), DMPlexLabelsLoad(), and > DMPlexGlobalVectorLoad(). > Anyhow, is this a good wait to track global numbers? > The way I think about it, for example in DMPlexBuildFromCellListParallel(), you load all parallel things in blocks (cell adjacency, vertex coordinates, etc). Then if you have to redistribute afterwards, you make a PetscSF to do it. I first make one mapping points to points. With a PetscSection, you can easily convert this into dofs to dofs. For example, we load sets of vertices, but we want vertices distributed as they are attached to cells. So we create a PetscSF mapping uniform blocks to the division attached to cells. Then we use the PetscSection for coordinates to make a new PetscSF and redistribute coordinates. Thanks, Matt > Also, I need (for other applications) to eventually call > DMPlexInterpolate() and DMPlexDistribute(), will the global PetscSection > offsets be preserved after calling those two? > > > Sincerely: > > *J.A. Ferrand* > > Embry-Riddle Aeronautical University - Daytona Beach - FL > Ph.D. Candidate, Aerospace Engineering > > M.Sc. Aerospace Engineering > > B.Sc. Aerospace Engineering > > B.Sc. Computational Mathematics > > > *Phone:* (386)-843-1829 > > *Email(s):* ferranj2 at my.erau.edu > > jesus.ferrand at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Z30NkeNY4hgcjCs5RtVH3AgiBI0E4BBkGDPYdLNB10LWOF050wW1AXJDMcOtZ0G3u9nPiKrc0MX9YIYzdUyK$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jed.Brown at Colorado.EDU Wed Apr 3 23:12:37 2024 From: Jed.Brown at Colorado.EDU (Jed Brown) Date: Wed, 03 Apr 2024 22:12:37 -0600 Subject: [petsc-users] Correct way to set/track global numberings in DMPlex? In-Reply-To: References: Message-ID: <8734s1n4ui.fsf@jedbrown.org> An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Apr 3 23:44:12 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 3 Apr 2024 23:44:12 -0500 (CDT) Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64) In-Reply-To: <28F897B4-BA64-4895-B4EF-485AFC8D6C10@gmail.com> References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov> <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov> <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com> <28F897B4-BA64-4895-B4EF-485AFC8D6C10@gmail.com> Message-ID: <5e8a0a57-ac39-f6bf-0fb6-bccfbdf18bd1@mcs.anl.gov> With xcode-15.3 and branch "barry/2024-04-03/fix-chaco-modern-c/release" from https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7433__;!!G_uCfscf7eWS!YJPSyG4qeGbCKYRp9y16HJgjw7AOrQ0mL0QWb_XcKYZ17UwK2GtURGMpkyi4TctAY-8XqSvQUFmyCQFNnKy75fI$ [and a patched openmpi tarball to remove -Wl,-commons,use_dylibs] the following works for me. 
Satish ---- petsc at mpro petsc.x % ./configure --download-bison --download-chaco --download-ctetgen --download-eigen --download-fftw --download-hdf5 --download-hpddm --download-hwloc --download-hypre --download-libpng --download-metis --download-mmg --download-mumps --download-netcdf --download-openblas --download-openblas-make-options="'USE_THREAD=0 USE_LOCKING=1 USE_OPENMP=0'" --download-p4est --download-parmmg --download-pnetcdf --download-pragmatic --download-ptscotch --download-scalapack --download-slepc --download-suitesparse --download-superlu_dist --download-tetgen --download-triangle --with-c2html=0 --with-debugging=1 --with-fortran-bindings=0 --with-shared-libraries=1 --with-x=0 --with-zlib --download-openmpi=https://urldefense.us/v3/__https://web.cels.anl.gov/projects/petsc/download/externalpackages/openmpi-5.0.2-xcode15.tar.gz__;!!G_uCfscf7eWS!YJPSyG4qeGbCKYRp9y16HJgjw7AOrQ0mL0QWb_XcKYZ17UwK2GtURGMpkyi4TctAY-8XqSvQUFmyCQFNvoG1gVM$ --download-pastix && make && make check CC arch-darwin-c-debug/obj/src/lme/interface/lmesolve.o CLINKER arch-darwin-c-debug/lib/libslepc.3.21.0.dylib DSYMUTIL arch-darwin-c-debug/lib/libslepc.3.21.0.dylib Now to install the library do: make SLEPC_DIR=/Users/petsc/petsc.x/arch-darwin-c-debug/externalpackages/git.slepc PETSC_DIR=/Users/petsc/petsc.x install ========================================= *** Installing SLEPc *** *** Installing SLEPc at prefix location: /Users/petsc/petsc.x/arch-darwin-c-debug *** ==================================== Install complete. 
Now to check if the libraries are working do (in current directory): make SLEPC_DIR=/Users/petsc/petsc.x/arch-darwin-c-debug PETSC_DIR=/Users/petsc/petsc.x PETSC_ARCH=arch-darwin-c-debug check ==================================== /usr/bin/make --no-print-directory -f makefile PETSC_ARCH=arch-darwin-c-debug PETSC_DIR=/Users/petsc/petsc.x SLEPC_DIR=/Users/petsc/petsc.x/arch-darwin-c-debug/externalpackages/git.slepc install-builtafterslepc /usr/bin/make --no-print-directory -f makefile PETSC_ARCH=arch-darwin-c-debug PETSC_DIR=/Users/petsc/petsc.x SLEPC_DIR=/Users/petsc/petsc.x/arch-darwin-c-debug/externalpackages/git.slepc slepc4py-install make[6]: Nothing to be done for `slepc4py-install'. *** Building and installing HPDDM *** ========================================= Now to check if the libraries are working do: make PETSC_DIR=/Users/petsc/petsc.x PETSC_ARCH=arch-darwin-c-debug check ========================================= Running PETSc check examples to verify correct installation Using PETSC_DIR=/Users/petsc/petsc.x and PETSC_ARCH=arch-darwin-c-debug C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes C/C++ example src/snes/tutorials/ex19 run successfully with HYPRE C/C++ example src/snes/tutorials/ex19 run successfully with MUMPS C/C++ example src/snes/tutorials/ex19 run successfully with SuiteSparse C/C++ example src/snes/tutorials/ex19 run successfully with SuperLU_DIST C/C++ example src/vec/vec/tests/ex47 run successfully with HDF5 Running SLEPc check examples to verify correct installation Using SLEPC_DIR=/Users/petsc/petsc.x/arch-darwin-c-debug/externalpackages/git.slepc, PETSC_DIR=/Users/petsc/petsc.x, and PETSC_ARCH=arch-darwin-c-debug C/C++ example src/eps/tests/test10 run successfully with 1 MPI process C/C++ example src/eps/tests/test10 run successfully with 2 MPI processes Completed SLEPc check examples Completed PETSc check examples petsc at mpro 
petsc.x % clang --version Apple clang version 15.0.0 (clang-1500.3.9.4) Target: arm64-apple-darwin23.4.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin petsc at mpro petsc.x % On Tue, 2 Apr 2024, Zongze Yang wrote: > Thank you for the suggestion. > > I'd like to share some test results using the current Xcode. When I added the flag `LDFLAGS=-Wl,-ld_classic` and configured PETSc with OpenMPI, the tests with the latest Xcode seemed okay, except for some link warnings. The configure > command is > ``` > ./configure \ > PETSC_ARCH=arch-darwin-c-debug-openmpi \ > LDFLAGS=-Wl,-ld_classic \ > --download-openmpi=https://urldefense.us/v3/__https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.3rc1.tar.bz2__;!!G_uCfscf7eWS!eGiVH2meEkLSEHvkY6Y-m7U1wPG4ZDxHod7lLZI3HTu6itzNEDm7n3cz4GNly925EEHvVRnyNQYn2aAt0ewiXz99$ > \ > --download-mumps --download-scalapack \ > --with-clean \ > && make && make check > ``` From bramkamp at nsc.liu.se Thu Apr 4 08:37:42 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Thu, 4 Apr 2024 15:37:42 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC Message-ID: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> Dear PETSC Team, I found the following problem: I compile petsc 3.20.5 with Nvidia compiler 23.7. I use a pretty standard configuration, including --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0 I exclude cuda, since I was not sure if the problem was cuda related. The problem is now, if I have s simple fortran program where I link the petsc library, but I actually do not use petsc in that program (Just for testing). I want to use OpenACC directives in my program, e.g. !$acc parallel loop . The problem is now, as soon I link with the petsc library, the openacc commands do not work anymore. 
It seems that openacc is not initialised and hence it cannot find a GPU.

The problem seems to be that you link with -lnvc. In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc". If I take this out, then openacc works. With "-lnvc" something gets messed up.

The problem is also discussed here:
https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$

My understanding is that libnvc is more of a runtime library that does not need to be included by the linker. I am not sure if there is a specific reason to include libnvc (I am not so familiar with what this library does).

If I take out -lnvc from "petscvariables", then my program with openacc works as expected. I did not try any more realistic program that includes petsc.

2)
When compiling petsc with cuda support, I also found that the petsc library depends on libnvJitLink.so.12, which is not found. On my system this library is in $CUDA_ROOT/lib64. I am not sure where this library is on your system.

Thanks a lot, Frank Bramkamp
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov  Thu Apr  4 08:56:17 2024
From: balay at mcs.anl.gov (Satish Balay)
Date: Thu, 4 Apr 2024 08:56:17 -0500 (CDT)
Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC
In-Reply-To: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se>
References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se>
Message-ID: <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov>

On Thu, 4 Apr 2024, Frank Bramkamp wrote:
> > > I use a pretty standard configuration, including > > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0 > > I exclude cuda, since I was not sure if the problem was cuda related. Can you try using (to exclude cuda): --with-cudac=0 > > > The problem is now, if I have s simple fortran program where I link the petsc library, but I actually do not use petsc in that program > (Just for testing). I want to use OpenACC directives in my program, e.g. !$acc parallel loop . > The problem is now, as soon I link with the petsc library, the openacc commands do not work anymore. > It seems that openacc is not initialised and hence it cannot find a GPU. > > The problem seems that you link with -lnvc. > In ?petscvariables? => PETSC_WITH_EXTERNAL_LIB you include ?-lnvc?. > If I take this out, then openacc works. With ?-lnvc? something gets messed up. > > The problem is also discussed here: > https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$ > > My understanding is that libnvc is more a runtime library that does not need to be included by the linker. > Not sure if there is a specific reason to include libnvc (I am not so familiar what this library does). > > If I take out -lnvc from ?petscvariables?, then my program with openacc works as expected. I did not try any more realistic program that includes petsc. > > > > 2) > When compiling petsc with cuda support, I also found that in the petsc library the library libnvJitLink.so.12 > Is not found. On my system this library is in $CUDA_ROOT/lib64 > I am not sure where this library is on your system ?! Hm - good if you can send configure.log for this. 
configure attempts '$CC -v' to determine the link libraries to get c/c++/fortran compatibility libraries. But it can grab other libraries that the compilers are using internally here. To avoid this - you can explicitly list these libraries to configure. For ex: for gcc/g++/gfortran ./configure CC=gcc CXX=g++ FC=gfortran LIBS="-lgfortran -lstdc++" Satish > > > Thanks a lot, Frank Bramkamp > > > > > > > > > > > > From bramkamp at nsc.liu.se Thu Apr 4 09:33:00 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Thu, 4 Apr 2024 16:33:00 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov> Message-ID: <12756271-2D9F-4C32-9E71-D2EE7D2960DD@nsc.liu.se> Thanks for the reply, Do you know if you actively include the libnvc library ?! Or is this somehow automatically included ?! Greetings, Frank > On 4 Apr 2024, at 15:56, Satish Balay wrote: > > > On Thu, 4 Apr 2024, Frank Bramkamp wrote: > >> Dear PETSC Team, >> >> I found the following problem: >> I compile petsc 3.20.5 with Nvidia compiler 23.7. >> >> >> I use a pretty standard configuration, including >> >> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0 >> >> I exclude cuda, since I was not sure if the problem was cuda related. > > Can you try using (to exclude cuda): --with-cudac=0 > >> >> >> The problem is now, if I have s simple fortran program where I link the petsc library, but I actually do not use petsc in that program >> (Just for testing). I want to use OpenACC directives in my program, e.g. !$acc parallel loop . >> The problem is now, as soon I link with the petsc library, the openacc commands do not work anymore. 
>> It seems that openacc is not initialised and hence it cannot find a GPU. >> >> The problem seems that you link with -lnvc. >> In ?petscvariables? => PETSC_WITH_EXTERNAL_LIB you include ?-lnvc?. >> If I take this out, then openacc works. With ?-lnvc? something gets messed up. >> >> The problem is also discussed here: >> https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$ > >> >> My understanding is that libnvc is more a runtime library that does not need to be included by the linker. >> Not sure if there is a specific reason to include libnvc (I am not so familiar what this library does). >> >> If I take out -lnvc from ?petscvariables?, then my program with openacc works as expected. I did not try any more realistic program that includes petsc. >> >> >> >> 2) >> When compiling petsc with cuda support, I also found that in the petsc library the library libnvJitLink.so.12 >> Is not found. On my system this library is in $CUDA_ROOT/lib64 >> I am not sure where this library is on your system ?! > > Hm - good if you can send configure.log for this. configure attempts '$CC -v' to determine the link libraries to get c/c++/fortran compatibility libraries. But it can grab other libraries that the compilers are using internally here. > > To avoid this - you can explicitly list these libraries to configure. For ex: for gcc/g++/gfortran > > ./configure CC=gcc CXX=g++ FC=gfortran LIBS="-lgfortran -lstdc++" > > Satish > >> >> >> Thanks a lot, Frank Bramkamp -------------- next part -------------- An HTML attachment was scrubbed... 
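Until configure stops picking up the compiler's internal runtime libraries, Frank's manual edit of petscvariables can be scripted. A hedged sketch — the link line below is made up for illustration; the real file typically lives at $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/petscvariables (back it up before rewriting it in place):

```python
def strip_lnvc(link_line):
    """Drop stray '-lnvc' tokens from a link line.

    Token-wise filtering (whitespace is normalized), so '-lnvcc' or
    '-lnvc' embedded in a path are left untouched.
    """
    return " ".join(tok for tok in link_line.split() if tok != "-lnvc")

# Made-up example of an affected line from petscvariables:
line = "PETSC_WITH_EXTERNAL_LIB = -L/opt/nvhpc/lib -lnvc -lm"
print(strip_lnvc(line))  # the -lnvc token is removed
```

Applied to the real file, one would read each line, pass it through strip_lnvc, and write the result back (keeping a .bak copy), rather than editing by hand.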
URL: From bsmith at petsc.dev Thu Apr 4 09:46:58 2024 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 4 Apr 2024 10:46:58 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <12756271-2D9F-4C32-9E71-D2EE7D2960DD@nsc.liu.se> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov> <12756271-2D9F-4C32-9E71-D2EE7D2960DD@nsc.liu.se> Message-ID: Please send configure.log We do not explicitly include libnvc but as Satish noted it may get listed when configure is generating link lines. With configure.log we'll know where it is being included (and we may be able to provide a fix that removes it explicitly since it is apparently not needed according to the NVIDIA folks). Barry > On Apr 4, 2024, at 10:33 AM, Frank Bramkamp wrote: > > Thanks for the reply, > > Do you know if you actively include the libnvc library ?! > Or is this somehow automatically included ?! > > Greetings, Frank > > > > >> On 4 Apr 2024, at 15:56, Satish Balay wrote: >> >> >> On Thu, 4 Apr 2024, Frank Bramkamp wrote: >> >>> Dear PETSC Team, >>> >>> I found the following problem: >>> I compile petsc 3.20.5 with Nvidia compiler 23.7. >>> >>> >>> I use a pretty standard configuration, including >>> >>> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0 >>> >>> I exclude cuda, since I was not sure if the problem was cuda related. >> >> Can you try using (to exclude cuda): --with-cudac=0 >> >>> >>> >>> The problem is now, if I have a simple Fortran program where I link the petsc library, but I actually do not use petsc in that program >>> (Just for testing). I want to use OpenACC directives in my program, e.g. !$acc parallel loop .
>>> The problem is now, as soon I link with the petsc library, the openacc commands do not work anymore. >>> It seems that openacc is not initialised and hence it cannot find a GPU. >>> >>> The problem seems that you link with -lnvc. >>> In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc". >>> If I take this out, then openacc works. With "-lnvc" something gets messed up. >>> >>> The problem is also discussed here: >>> https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!dlXNyKBzSbximQ13OXxwO506OF71yRM_H5KEnarqXE75D6Vg-ePZr2u6SJ5V3YpRETatvb9pMOUVmpyN0-19SFlbug$ >>> >>> My understanding is that libnvc is more a runtime library that does not need to be included by the linker. >>> Not sure if there is a specific reason to include libnvc (I am not so familiar what this library does). >>> >>> If I take out -lnvc from "petscvariables", then my program with openacc works as expected. I did not try any more realistic program that includes petsc. >>> >>> >>> >>> 2) >>> When compiling petsc with cuda support, I also found that in the petsc library the library libnvJitLink.so.12 >>> Is not found. On my system this library is in $CUDA_ROOT/lib64 >>> I am not sure where this library is on your system ?! >> >> Hm - good if you can send configure.log for this. configure attempts '$CC -v' to determine the link libraries to get c/c++/fortran compatibility libraries. But it can grab other libraries that the compilers are using internally here. >> >> To avoid this - you can explicitly list these libraries to configure. For ex: for gcc/g++/gfortran >> >> ./configure CC=gcc CXX=g++ FC=gfortran LIBS="-lgfortran -lstdc++" >> >> Satish >> >>> >>> >>> Thanks a lot, Frank Bramkamp -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bramkamp at nsc.liu.se Thu Apr 4 09:57:52 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Thu, 4 Apr 2024 16:57:52 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <2c5b1a00-8839-5629-adc5-f6320bca0928@mcs.anl.gov> Message-ID: An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Apr 4 11:09:13 2024 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 4 Apr 2024 12:09:13 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> Message-ID: <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> Frank, Please try the PETSc git branch barry/2024-04-04/rm-lnvc-link-line/release This will hopefully resolve the -lnvc issue. Please let us know and we can add the fix to our current release. Barry > On Apr 4, 2024, at 9:37 AM, Frank Bramkamp wrote: > > Dear PETSC Team, > > I found the following problem: > I compile petsc 3.20.5 with Nvidia compiler 23.7. > > > I use a pretty standard configuration, including > > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-log=1 --download-fblaslapack --with-cuda=0 > > I exclude cuda, since I was not sure if the problem was cuda related. > > > The problem is now, if I have a simple Fortran program where I link the petsc library, but I actually do not use petsc in that program > (Just for testing). I want to use OpenACC directives in my program, e.g. !$acc parallel loop . > The problem is now, as soon I link with the petsc library, the openacc commands do not work anymore. > It seems that openacc is not initialised and hence it cannot find a GPU.
> > The problem seems that you link with -lnvc. > In "petscvariables" => PETSC_WITH_EXTERNAL_LIB you include "-lnvc". > If I take this out, then openacc works. With "-lnvc" something gets messed up. > > The problem is also discussed here: > https://urldefense.us/v3/__https://forums.developer.nvidia.com/t/failed-cuda-device-detection-when-explicitly-linking-libnvc/203225/1__;!!G_uCfscf7eWS!Z2uhPVP0GUrttP3rh6nLk6BQsoI2EIfKfoLVXcwQFksSvtvvRILt4Yq0y-FFYmi3ugybPdn-te0Pw5mfExHSw7Y$ > > My understanding is that libnvc is more a runtime library that does not need to be included by the linker. > Not sure if there is a specific reason to include libnvc (I am not so familiar what this library does). > > If I take out -lnvc from "petscvariables", then my program with openacc works as expected. I did not try any more realistic program that includes petsc. > > > > 2) > When compiling petsc with cuda support, I also found that in the petsc library the library libnvJitLink.so.12 > Is not found. On my system this library is in $CUDA_ROOT/lib64 > I am not sure where this library is on your system ?! > > > Thanks a lot, Frank Bramkamp -------------- next part -------------- An HTML attachment was scrubbed... URL: From bramkamp at nsc.liu.se Thu Apr 4 11:16:10 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Thu, 4 Apr 2024 18:16:10 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> Message-ID: <7B75CDED-4F58-4E33-8BE5-FC3872A59FED@nsc.liu.se> An HTML attachment was scrubbed...
URL: From yangzongze at gmail.com Thu Apr 4 11:53:05 2024 From: yangzongze at gmail.com (Zongze Yang) Date: Fri, 5 Apr 2024 00:53:05 +0800 Subject: [petsc-users] ex19: Segmentation Violation when run with MUMPS on MacOS (arm64) In-Reply-To: <5e8a0a57-ac39-f6bf-0fb6-bccfbdf18bd1@mcs.anl.gov> References: <046e4463-9255-bf20-0a80-3ecd65ea8717@mcs.anl.gov> <01546D85-33FB-49AF-9F92-B87A544E4F88@petsc.dev> <5cbc5a18-4305-22fc-c253-fb48a01100a7@mcs.anl.gov> <82abd3f7-3d6b-e226-355e-b7ef1e46f897@mcs.anl.gov> <52A2980F-F6AA-4666-B67F-0CDA4C8D52B1@gmail.com> <1769af69-4289-e824-d977-75f31be017a5@mcs.anl.gov> <4b225a01-e47f-a0f9-9681-8d0b981cf44e@mcs.anl.gov> <22EA03F7-81B1-4E72-87DB-C5E3DD10DF22@gmail.com> <28F897B4-BA64-4895-B4EF-485AFC8D6C10@gmail.com> <5e8a0a57-ac39-f6bf-0fb6-bccfbdf18bd1@mcs.anl.gov> Message-ID: <3A12FB77-26DD-4899-A6C9-75203DD2F7D3@gmail.com> An HTML attachment was scrubbed... URL: From bramkamp at nsc.liu.se Fri Apr 5 05:30:44 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 12:30:44 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 2039994 bytes Desc: not available URL: From balay at mcs.anl.gov Fri Apr 5 07:57:02 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 5 Apr 2024 07:57:02 -0500 (CDT) Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> Message-ID: <4f4e2881-7db7-1f51-8199-b110eb4a4116@mcs.anl.gov> >>> Executing: mpifort -o /tmp/petsc-nopi85m9/config.compilers/conftest -v -KPIC -O2 -g /tmp/petsc-nopi85m9/config.compilers/conftest.o stdout: Export NVCOMPILER=/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7 Export PGI=/software/sse2/tetralith_el9/manual/nvhpc/23.7 /software/sse2/generic/manual/ssetools/v1.9.5/wrappers/ld /usr/lib64/crt1.o /usr/lib64/crti.o /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/11//crtbegin.o /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -T /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/nvhpc.ld -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -L/softwa re/sse2/ tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib 
-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/11/ /tmp/petsc-nopi85m9/config.compilers/conftest.o -rpath /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -rpath /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -o /tmp/petsc-nopi85m9/config.compilers/conftest -L/usr/lib/gcc/x86_64-redhat-linux/11//../../../../lib64 -lnvf -lnvomp -ldl --as-needed -lnvhpcatm -latomic --no-as-needed -lpthread -lnvcpumath -lnsnvc -lnvc -lrt -lpthread -lgcc -lc -lgcc_s -lm /usr/ lib/gcc/ x86_64-redhat-linux/11//crtend.o /usr/lib64/crtn.o compilers: Libraries needed to link Fortran code with the C linker: ['-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib', '-L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64', '-Wl,-rpath,/softw are/sse2 /tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib', 
'-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64', '-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11', '-L/usr/lib/gcc/x86_64-redhat-linux/11', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', '-lmpi_usempif08', '-lmpi_usempi_ignore_tkr', '-lmpi_mpifh', '-lmpi', '-Wl,-rpath,/so ftware/s se2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib', '-lnvf', '-lnvomp', '-ldl', '-lnvhpcatm', '-latomic', '-lpthread', '-lnvcpumath', '-lnsnvc', '-lnvc', '-lrt', '-lgcc_s', '-lm'] PETSC_WITH_EXTERNAL_LIB = -Wl,-rpath,/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_no_cuda/lib -L/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_no_cuda/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib 
-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -L/software/sse2/tetral ith_el9/ manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11 -L/usr/lib/gcc/x86_64-redhat-linux/11 -lpetsc -lflapack -lfblas -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lnvf -lnvomp -ldl -lnvhpcatm -latomic -lpth read -ln vcpumath -lnsnvc -lnvc -lrt -lgcc_s -lm -lstdc++ -lquadmath <<< You'll probably want to skip lot more than just -lnvc Try the following and see if it works --with-cudac=0 LIBS="-lmpi_mpifh -lnvf -lstdc++" [or specify the correct list of libraries for the fortran MPI/Compiler libraries - with dependencies - as needed] Satish On Fri, 5 Apr 2024, Frank Bramkamp wrote: > Dear Barry, I tried your fix for -lnvc. Unfortunately it did not work so far. Here I send you the configure.?log file again. One can see that you try to skip something, but later it still always includes -lnvc for the linker. In the > file petscvariables > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > ? 
> ZjQcmQRYFpfptBannerEnd > > Dear Barry, > > I tried your fix for -lnvc. Unfortunately it did not work so far. > Here I send you the configure.log file again. > > One can see that you try to skip something, but later it still always includes -lnvc for the linker. > In the file petscvariables it also appears as before. > > As I see it, it lists the linker options including -lnvc also before you try to skip it. > Maybe it is already in the linker options before the skipping. > > > Greetings, Frank > > > > From balay at mcs.anl.gov Fri Apr 5 08:23:01 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 5 Apr 2024 08:23:01 -0500 (CDT) Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <4f4e2881-7db7-1f51-8199-b110eb4a4116@mcs.anl.gov> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <4f4e2881-7db7-1f51-8199-b110eb4a4116@mcs.anl.gov> Message-ID: <204e5034-35a1-e8df-9c2a-17f0f4f3451b@mcs.anl.gov> Or you can skip fortran - if you are not using PETSc from it [or any external package that requires it], but you would need cxx for cuda --with-fc=0 --download-f2cblaslapack --with-cxx=0 --with-cudac=0 or --with-fc=0 --download-f2cblaslapack --with-cudac=nvcc LIBS=-lstdc++ Satish On Fri, 5 Apr 2024, Satish Balay wrote: > >>> > Executing: mpifort -o /tmp/petsc-nopi85m9/config.compilers/conftest -v -KPIC -O2 -g /tmp/petsc-nopi85m9/config.compilers/conftest.o > stdout: > Export NVCOMPILER=/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7 > Export PGI=/software/sse2/tetralith_el9/manual/nvhpc/23.7 > /software/sse2/generic/manual/ssetools/v1.9.5/wrappers/ld /usr/lib64/crt1.o /usr/lib64/crti.o /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/11//crtbegin.o /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker 
/lib64/ld-linux-x86-64.so.2 -T /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib/nvhpc.ld -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -L/soft ware/sse 2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/11/ /tmp/petsc-nopi85m9/config.compilers/conftest.o -rpath /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -rpath /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -o /tmp/petsc-nopi85m9/config.compilers/conftest -L/usr/lib/gcc/x86_64-redhat-linux/11//../../../../lib64 -lnvf -lnvomp -ldl --as-needed -lnvhpcatm -latomic --no-as-needed -lpthread -lnvcpumath -lnsnvc -lnvc -lrt -lpthread -lgcc -lc -lgcc_s -lm /us r/lib/gc c/x86_64-redhat-linux/11//crtend.o /usr/lib64/crtn.o > > compilers: Libraries needed to link Fortran code with the C linker: ['-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', 
'-Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib', '-L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64', '-Wl,-rpath,/sof tware/ss e2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64', '-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64', '-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11', '-L/usr/lib/gcc/x86_64-redhat-linux/11', '-Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib', '-lmpi_usempif08', '-lmpi_usempi_ignore_tkr', '-lmpi_mpifh', '-lmpi', '-Wl,-rpath,/ software /sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23..7/compilers/lib', '-lnvf', '-lnvomp', '-ldl', '-lnvhpcatm', '-latomic', '-lpthread', '-lnvcpumath', '-lnsnvc', '-lnvc', '-lrt', '-lgcc_s', '-lm'] > > > PETSC_WITH_EXTERNAL_LIB = 
-Wl,-rpath,/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_no_cuda/lib -L/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_no_cuda/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -L/software/sse2/tetr alith_el 9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/extras/CUPTI/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/lib64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11 -L/usr/lib/gcc/x86_64-redhat-linux/11 -lpetsc -lflapack -lfblas -lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lnvf -lnvomp -ldl -lnvhpcatm -latomic -lp thread - lnvcpumath -lnsnvc -lnvc 
-lrt -lgcc_s -lm -lstdc++ -lquadmath > <<< > > You'll probably want to skip lot more than just -lnvc > > Try the following and see if it works > > --with-cudac=0 LIBS="-lmpi_mpifh -lnvf -lstdc++" > > [or specify the correct list of libraries for the fortran MPI/Compiler libraries - with dependencies - as needed] > > Satish > > On Fri, 5 Apr 2024, Frank Bramkamp wrote: > > > Dear Barry, I tried your fix for -lnvc. Unfortunately it did not work so far. Here I send you the configure.?log file again. One can see that you try to skip something, but later it still always includes -lnvc for the linker. In the > > file petscvariables > > ZjQcmQRYFpfptBannerStart > > This Message Is From an External Sender > > This message came from outside your organization. > > ? > > ZjQcmQRYFpfptBannerEnd > > > > Dear Barry, > > > > I tried your fix for -lnvc. Unfortunately it did not work so far. > > Here I send you the configure.log file again. > > > > One can see that you try to skip something, but later it still always includes -lnvc for the linker. > > In the file petscvariables it also appears as before. > > > > As I see it, it lists the linker options including -lnvc also before you try to skip it. > > Maybe it is already in the linker options before the skipping. > > > > > > Greetings, Frank > > > > > > > > > From bramkamp at nsc.liu.se Fri Apr 5 08:27:01 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 15:27:01 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <204e5034-35a1-e8df-9c2a-17f0f4f3451b@mcs.anl.gov> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <4f4e2881-7db7-1f51-8199-b110eb4a4116@mcs.anl.gov> <204e5034-35a1-e8df-9c2a-17f0f4f3451b@mcs.anl.gov> Message-ID: <32B6AB12-CD0F-4B95-832E-BE6247D3B91B@nsc.liu.se> An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Apr 5 08:56:52 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 09:56:52 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> Message-ID: <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> An HTML attachment was scrubbed... URL: From bramkamp at nsc.liu.se Fri Apr 5 08:58:33 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 15:58:33 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> Message-ID: <8984AEE1-7414-40AB-9AA9-3F795FDF0654@nsc.liu.se> An HTML attachment was scrubbed... URL: From bramkamp at nsc.liu.se Fri Apr 5 09:44:21 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 16:44:21 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> Message-ID: An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Apr 5 11:49:15 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 12:49:15 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> Message-ID: An HTML attachment was scrubbed... 
URL: From bramkamp at nsc.liu.se Fri Apr 5 11:58:49 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 18:58:49 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> Message-ID: <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2054159 bytes Desc: not available URL: From bsmith at petsc.dev Fri Apr 5 12:47:59 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 13:47:59 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> Message-ID: <04BF8631-2C2E-4003-96C9-8432454181A9@petsc.dev> An HTML attachment was scrubbed... URL: From bramkamp at nsc.liu.se Fri Apr 5 13:14:50 2024 From: bramkamp at nsc.liu.se (Frank Bramkamp) Date: Fri, 5 Apr 2024 20:14:50 +0200 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: <04BF8631-2C2E-4003-96C9-8432454181A9@petsc.dev> References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> <04BF8631-2C2E-4003-96C9-8432454181A9@petsc.dev> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: application/octet-stream Size: 2148514 bytes Desc: not available URL: From bsmith at petsc.dev Fri Apr 5 13:43:28 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 14:43:28 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> <04BF8631-2C2E-4003-96C9-8432454181A9@petsc.dev> Message-ID: <371A2F0B-D73F-4BC5-9364-80B3B99DEB42@petsc.dev> I see what you are talking about in the blas checks. However those checks "don't really matter" in that configure still succeeds. Do you have a problem later with libnvJitLink.so ? When you build PETSc (send make.log) or when you run tests, make check? Or when you try to run your code? Barry > On Apr 5, 2024, at 2:14 PM, Frank Bramkamp wrote: > > Hi Barry, > > Here I send you the configure.log file for the libnvJitLink problem. > > At the top of the configure.log file it seems to find libnvJitLink.so.12 > > But in the test for BlasLapack, it mentions > stdout: /tmp/petsc-h7tpd5_s/config.packages.BlasLapack/conftest: error while loading shared libraries: libnvJitLink.so.12: cannot open shared object file: No such file or directory > > > It seems that for the BlasLapack test you include the "stubs" directory. In "stubs", we only have libnvJitLink.so > but not libnvJitLink.so.12. The libnvJitLink.so.12 we have in a different directory (libs64), where it was also found before. > But maybe in the BlasLapack test, you only search for the libnvJitLink.so.12 in the stubs directory. > > Here I am not sure, if there should be a libnvJitLink.so.12 in stubs as well or not. > That means, it is not so clear if you should check in different directories, or if we should add another link from libnvJitLink.so in stubs to libnvJitLink.so.12.
But if other people also do not have > libnvJitLink.so.12 in the stubs directory by default, that would still be a problem. But I also do not know if the stubs directory is the problem. > > > Thanks, Frank > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Apr 5 14:42:40 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 5 Apr 2024 19:42:40 +0000 Subject: [petsc-users] Compiling PETSc with strumpack in ORNL Frontier Message-ID: Hi all, we are trying to compile PETSc in Frontier using the structured matrix hierarchical solver strumpack, which uses GPU and might be a good candidate for our Poisson discretization. The list of libs I used for PETSc in this case is: $./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3 --offload-arch=gfx90a" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hip-arch=gfx908 --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${CRAY_XPMEM_POST_LINK_OPTS} -lxpmem ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-superlu_dist --download-strumpack --download-metis --download-slate --download-magma --download-parmetis --download-ptscotch --download-zfp --download-butterflypack --with-openmp-dir=/opt/cray/pe/gcc/12.2.0/snos --download-scalapack --download-cmake --force I'm getting an error at configure time: ...
Trying to download https://urldefense.us/v3/__https://github.com/liuyangzhuan/ButterflyPACK__;!!G_uCfscf7eWS!cW5KuKKMbmDa8n59SJGArXdSxVT_-V0qH3vt1-LE-CAr4wShO2pTXN3GvI0bVCwUh6RWH6z2URqBczHnVEyXXKAJ2LN7JnSj$ for BUTTERFLYPACK ============================================================================================= ============================================================================================= Configuring BUTTERFLYPACK with CMake; this may take several minutes ============================================================================================= ============================================================================================= Compiling and installing BUTTERFLYPACK; this may take several minutes ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/pghysels/STRUMPACK__;!!G_uCfscf7eWS!cW5KuKKMbmDa8n59SJGArXdSxVT_-V0qH3vt1-LE-CAr4wShO2pTXN3GvI0bVCwUh6RWH6z2URqBczHnVEyXXKAJ2FeUr7dA$ for STRUMPACK ============================================================================================= ============================================================================================= Configuring STRUMPACK with CMake; this may take several minutes ============================================================================================= ============================================================================================= Compiling and installing STRUMPACK; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- 
Error running make on STRUMPACK ********************************************************************************************* Looking in the configure.log file I see error like this related to strumpack compilation: /opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dstrumpack_EXPORTS -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build -isystem /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/include -isystem /opt/rocm-5.4.0/include -isystem /opt/rocm-5.4.0/hip/include -isystem /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/.. -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -fPIC -Wall -Wno-overloaded-virtual -fopenmp -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -MF CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o.d -o CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -c /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src/clustering/NeighborSearch.cpp gmake[2]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' gmake[1]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' stdout: g++: error: unrecognized command-line option '--offload-arch=gfx900' g++: error: unrecognized command-line option '--offload-arch=gfx900' g++: error: unrecognized command-line option '--offload-arch=gfx900' It seems the configure is picking up CC (g++) instead of hipcc 
as the compiler for strumpack. I wonder if anyone has come across this issue or has any suggestions? Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Apr 5 14:46:46 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 15:46:46 -0400 Subject: [petsc-users] Compiling PETSc with strumpack in ORNL Frontier In-Reply-To: References: Message-ID: <3DDA8B98-C229-422B-AB02-AF4D10619BD1@petsc.dev> Please send the entire configure.log > On Apr 5, 2024, at 3:42?PM, Vanella, Marcos (Fed) via petsc-users wrote: > > This Message Is From an External Sender > This message came from outside your organization. > Hi all, we are trying to compile PETSc in Frontier using the structured matrix hierarchical solver strumpack, which uses GPU and might be a good candidate for our Poisson discretization. > The list of libs I used for PETSc in this case is: > > $./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3 --offload-arch=gfx90a" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hip-arch=gfx908 --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${CRAY_XPMEM_POST_LINK_OPTS} -lxpmem ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-superlu_dist --download-strumpack --download-metis --download-slate --download-magma --download-parmetis --download-ptscotch --download-zfp --download-butterflypack --with-openmp-dir=/opt/cray/pe/gcc/12.2.0/snos --download-scalapack --download-cmake --force > > I'm getting an error at configure time: > > ... 
> Trying to download https://urldefense.us/v3/__https://github.com/liuyangzhuan/ButterflyPACK__;!!G_uCfscf7eWS!Zixr16YdQu3fiyHhdpuVPSpY2C6CE_eyJBpOizV54Ljkkw_4u9KcWP5QRT1Ukap5cNKYJ7t3If6OkGXrUyG8E-A$ for BUTTERFLYPACK > ============================================================================================= > ============================================================================================= > Configuring BUTTERFLYPACK with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing BUTTERFLYPACK; this may take several minutes > ============================================================================================= > ============================================================================================= > Trying to download https://urldefense.us/v3/__https://github.com/pghysels/STRUMPACK__;!!G_uCfscf7eWS!Zixr16YdQu3fiyHhdpuVPSpY2C6CE_eyJBpOizV54Ljkkw_4u9KcWP5QRT1Ukap5cNKYJ7t3If6OkGXrA0zDldI$ for STRUMPACK > ============================================================================================= > ============================================================================================= > Configuring STRUMPACK with CMake; this may take several minutes > ============================================================================================= > ============================================================================================= > Compiling and installing STRUMPACK; this may take several minutes > ============================================================================================= > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > 
--------------------------------------------------------------------------------------------- > Error running make on STRUMPACK > ********************************************************************************************* > > Looking in the configure.log file I see error like this related to strumpack compilation: > > /opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dstrumpack_EXPORTS -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build -isystem /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/include -isystem /opt/rocm-5.4.0/include -isystem /opt/rocm-5.4.0/hip/include -isystem /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/.. -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -fPIC -Wall -Wno-overloaded-virtual -fopenmp -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -MF CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o.d -o CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -c /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src/clustering/NeighborSearch.cpp > gmake[2]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' > gmake[1]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' > stdout: > g++: error: unrecognized command-line option '--offload-arch=gfx900' > g++: error: unrecognized command-line option '--offload-arch=gfx900' > g++: error: 
unrecognized command-line option '--offload-arch=gfx900' > > It seems the configure is picking up CC (g++) instead of hipcc as the compiler for strumpack. > > I wonder if anyone has come across this issue or has any suggestions? > Thanks! > Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Apr 5 14:53:55 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 5 Apr 2024 19:53:55 +0000 Subject: [petsc-users] Compiling PETSc with strumpack in ORNL Frontier In-Reply-To: <3DDA8B98-C229-422B-AB02-AF4D10619BD1@petsc.dev> References: <3DDA8B98-C229-422B-AB02-AF4D10619BD1@petsc.dev> Message-ID: Hi Barry, thank you. Attached is the compressed config.log. I'm using the following modules (gnu): module load PrgEnv-gnu module load cray-mpich module load amd-mixed/5.4.0 export MPICH_GPU_SUPPORT_ENABLED=1 Plus PETSC_DIR, PETSC_ARCH defined. Thank you for your time, Marcos ________________________________ From: Barry Smith Sent: Friday, April 5, 2024 3:46 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Reuben D. Budiardja Subject: Re: [petsc-users] Compiling PETSc with strumpack in ORNL Frontier Please send the entire configure.log On Apr 5, 2024, at 3:42?PM, Vanella, Marcos (Fed) via petsc-users wrote: This Message Is From an External Sender This message came from outside your organization. Hi all, we are trying to compile PETSc in Frontier using the structured matrix hierarchical solver strumpack, which uses GPU and might be a good candidate for our Poisson discretization. 
The list of libs I used for PETSc in this case is: $./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3 --offload-arch=gfx90a" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hip-arch=gfx908 --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${CRAY_XPMEM_POST_LINK_OPTS} -lxpmem ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-superlu_dist --download-strumpack --download-metis --download-slate --download-magma --download-parmetis --download-ptscotch --download-zfp --download-butterflypack --with-openmp-dir=/opt/cray/pe/gcc/12.2.0/snos --download-scalapack --download-cmake --force I'm getting an error at configure time: ... Trying to download https://urldefense.us/v3/__https://github.com/liuyangzhuan/ButterflyPACK__;!!G_uCfscf7eWS!ZsbcxsTeA5Gnj5SGJGmQ5agJcly9NSQOGKVARBNIlbC_FhaDeJ8zsJzk9Jt9fGJphrID6po6I8VKO0CIj2utu6d2Z7k7ky1W$ for BUTTERFLYPACK ============================================================================================= ============================================================================================= Configuring BUTTERFLYPACK with CMake; this may take several minutes ============================================================================================= ============================================================================================= Compiling and installing BUTTERFLYPACK; this may take several minutes ============================================================================================= ============================================================================================= Trying to download https://urldefense.us/v3/__https://github.com/pghysels/STRUMPACK__;!!G_uCfscf7eWS!ZsbcxsTeA5Gnj5SGJGmQ5agJcly9NSQOGKVARBNIlbC_FhaDeJ8zsJzk9Jt9fGJphrID6po6I8VKO0CIj2utu6d2Z7Uw2Qob$ for STRUMPACK 
============================================================================================= ============================================================================================= Configuring STRUMPACK with CMake; this may take several minutes ============================================================================================= ============================================================================================= Compiling and installing STRUMPACK; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running make on STRUMPACK ********************************************************************************************* Looking in the configure.log file I see error like this related to strumpack compilation: /opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dstrumpack_EXPORTS -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build -isystem /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/include -isystem /opt/rocm-5.4.0/include -isystem /opt/rocm-5.4.0/hip/include -isystem /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/.. 
-Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -fPIC -Wall -Wno-overloaded-virtual -fopenmp -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -MF CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o.d -o CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -c /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src/clustering/NeighborSearch.cpp gmake[2]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' gmake[1]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build' stdout: g++: error: unrecognized command-line option '--offload-arch=gfx900' g++: error: unrecognized command-line option '--offload-arch=gfx900' g++: error: unrecognized command-line option '--offload-arch=gfx900' It seems the configure is picking up CC (g++) instead of hipcc as the compiler for strumpack. I wonder if anyone has come across this issue or has any suggestions? Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log.gz Type: application/x-gzip Size: 1141199 bytes Desc: configure.log.gz URL: From bsmith at petsc.dev Fri Apr 5 15:14:59 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 5 Apr 2024 16:14:59 -0400 Subject: [petsc-users] Problem with NVIDIA compiler and OpenACC In-Reply-To: References: <7E605389-2ADE-471E-B50F-384438764B9A@nsc.liu.se> <747C73E0-F85B-41B7-8202-A828AC95D3A9@petsc.dev> <68DC7E40-178C-4F22-82C5-63379ED2E516@petsc.dev> <8BFBC499-AD13-4340-B6AE-0392B37508E3@nsc.liu.se> <04BF8631-2C2E-4003-96C9-8432454181A9@petsc.dev> Message-ID: <6CD2CED2-3BC9-422E-B784-F59D43202DE6@petsc.dev> > On Apr 5, 2024, at 4:04 PM, Frank Bramkamp wrote: > > Hi Barry, > > The problem is that libnvJitLink.so.12 also appears in the libraries in petsclib.so (when I check with ldd). > But there, libnvJitLink.so.12 is stated as not found. And then I cannot execute my program, since this > library is not found. We typically do not want to add LD_LIBRARY_PATH. Ok, it is a bit confusing.
Contents /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64: ['libnppif.so.12', 'libcudart.so.12', 'libnppitc.so.12.1.1.14', 'libcuinj64.so.12.2', 'libnppc.so.12.1.1.14', 'libcuinj64.so.12.2.60', 'libnppim.so.12.1.1.14', 'libnppig.so', 'libnvjpeg.so.12.1.1.14', 'libnppidei.so.12.1.1.14', 'libnppidei.so.12', 'libnppial.so.12.1.1.14', 'libnppisu_static.a', 'libcudadevrt.a', 'libnppim.so', 'libnvToolsExt.so.1.0.0', 'libnpps.so.12.1.1.14', 'libaccinj64.so.12.2', 'libnppidei_static.a', 'libnvJitLink_static.a', 'libnppif.so', 'libnppial.so', 'libnvrtc-builtins.so', 'libcufilt.a', 'libnppig.so.12.1.1.14', 'libnvjpeg.so', 'libnvjpeg.so.12', 'libnppial_static.a', 'libnppicc_static.a', 'libnppidei.so', 'libnppist.so.12', 'cmake', 'libOpenCL.so.1.0', 'libnppisu.so', 'libnvrtc-builtins.so.12.2.91', 'libnpps.so', 'libnvrtc.so.12', 'libnppim_static.a', 'libcudart_static.a', 'libcuinj64.so', 'libnppicc.so.12.1.1.14', 'libOpenCL.so', 'libnvToolsExt.so', 'libnppig_static.a', 'libOpenCL.so.1.0.0', 'libnppc.so', 'libnppif_static.a', 'libaccinj64.so', 'libnvrtc_static.a', 'stubs', 'libnvJitLink.so.12.2.91', 'libnvrtc.so', 'libnvrtc-builtins_static.a', 'libnppig.so.12', 'libnppist.so', 'libnppif.so.12.1.1.14', 'libnppicc.so.12', 'libnppitc_static.a', 'libnppial.so.12', 'libnppim.so.12', 'libnppitc.so', 'libnpps.so.12', 'libnppist_static.a', 'libnpps_static.a', 'libnppist.so.12.1.1.14', 'libnppc.so.12', 'libcudart.so.12.2.53', 'libnvrtc-builtins.so.12.2', 'libnppisu.so.12.1.1.14', 'libculibos.a', 'libnvJitLink.so', 'libnvptxcompiler_static.a', 'libaccinj64.so.12.2.60', 'libnppitc.so.12', 'libnppisu.so.12', 'libOpenCL.so.1', 'libnvToolsExt.so.1', 'libnvrtc.so.12.2.91', 'libnppicc.so', 'libnppc_static.a', 'libcudart.so', 'libnvJitLink.so.12', 'libnvjpeg_static.a'] Executing: mpicc -o /tmp/petsc-h7tpd5_s/config.packages.BlasLapack/conftest -KPIC -O2 -g /tmp/petsc-h7tpd5_s/config.packages.BlasLapack/conftest.o 
-Wl,-rpath,/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_with_cuda_fixed_module/lib -L/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_with_cuda_fixed_module/lib -lflapack -Wl,-rpath,/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_with_cuda_fixed_module/lib -L/proj/nsc/users/bramkamp/petsc_install/petsc_barry_fix_nvclib_with_cuda_fixed_module/lib -lfblas -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64/stubs -lcuda -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -L/software/sse2/tetralith_el9/manual/FFTW/3.3.10/nv23.7/hpc1/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nvshmem/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/nccl/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 
-L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/math_libs/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/extras/qd/lib -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/extras/CUPTI/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/extras/CUPTI/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/extras/CUPTI/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 -L/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/11 -L/usr/lib/gcc/x86_64-redhat-linux/11 -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/comm_libs/mpi/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/compilers/lib -lnvf -lnvomp -ldl -lnvhpcatm -latomic -lpthread -lnvcpumath -lnsnvc -lrt -lgcc_s -lm -lquadmath We see that libnvJitLink.so.12 is in /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 Then whenit links the executable (above) it passes -Wl,-rpath,/software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64 to the linker so that at run time, it should be able to 
find all the libraries in /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64. Is /software/sse2/tetralith_el9/manual/nvhpc/23.7/Linux_x86_64/23.7/cuda/12.2/lib64/libnvJitLink.so.12 a link to something that actually exists? > > At the moment I am not sure where it tries to find that library. Therefore I thought that maybe the problem > is that BlasLapack could put some path in the library, which does not exist. At the beginning of configure.log > it mentions libnvJitLink.so.12, but then it seems to get lost somewhere. > > I have to see again if there is already a problem when I make petsc check, or if it is just in my program later. > Not quite sure anymore. > > > I will write back next week, Frank > > > > > >> On 5 Apr 2024, at 19:47, Barry Smith > wrote: >> >> >> Thanks for the configure.log. Send the configure.log for the failed nvJitLink problem. >> >> >>> On Apr 5, 2024, at 12:58 PM, Frank Bramkamp > wrote: >>> >>> Hi Barry, >>> >>> Here comes the latest configure.log file >>> >>> My cuda nvJitLink problem unfortunately still exists. >>> I will try it on a different cluster to see if this is a specific problem of the actual nvhpc installation. >>> >>> >>> Have a nice weekend, Frank >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carljohanthore at gmail.com Tue Apr 9 06:38:51 2024 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Tue, 9 Apr 2024 13:38:51 +0200 Subject: [petsc-users] MatCreateTranspose Message-ID: Hi, I have a matrix A with transpose A' and would like to solve the linear system A'*x = b using the pcredistribute preconditioner.
It seemed like a good idea to use MatCreateTranspose, but this leads to [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: No method getrow for Mat of type transpose [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!bVYrcPfbXSXCcthSH37OClTgnf7RxkizULEW1ZZJ5yYG3-50366x_OFSPmOgGWEcOeFyOmAMSFewAyCnxIUg7K8W_neRPw$ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.21.0, unknown [0]PETSC ERROR: Configure options COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-fortran-bindings=0 FOPTFLAGS="-O3 -march=native" CUDAOPTFLAGS=-O3 --with-cuda --with-cusp --with-debugging=0 --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai [0]PETSC ERROR: #1 MatGetRow() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/mat/interface/matrix.c:573 [0]PETSC ERROR: #2 PCSetUp_Redistribute() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/impls/redistribute/redistribute.c:111 [0]PETSC ERROR: #3 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/interface/precon.c:1079 [0]PETSC ERROR: #4 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/ksp/interface/itfunc.c:415 MatTranspose is a working alternative, but MatCreateTranspose would be preferable. In principle the solution seems straightforward -- just add a getrow method -- but is it, and is it a good idea (performancewise etc)? Kind regards, Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Tue Apr 9 08:59:49 2024 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 9 Apr 2024 13:59:49 +0000 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: Carl-Johan, You can use MatSolveTranspose() to solve A'*x = b. 
See petsc/src/ksp/ksp/tutorials/ex79.c `MatCreateTranspose()` is used if you only need a matrix that behaves like the transpose, but don't need the storage to be changed, i.e., A and A' share same matrix storage, thus MatGetRow() needs to get columns of A, which is not supported. Hong ________________________________ From: petsc-users on behalf of Carl-Johan Thore Sent: Tuesday, April 9, 2024 6:38 AM To: petsc-users Subject: [petsc-users] MatCreateTranspose Hi, I have a matrix A with transpose A' and would like to solve the linear system A'*x = b using the pcredistribute preconditioner. It seemed like a good idea to use MatCreateTranspose, but this leads to [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: No method getrow for Mat of type transpose [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ZjJecuElTk9FpHCmWARWcvLbSCiWXeEshhB3Z61WIK2V0KX6gOTrfRK8nMlqoZ8Q7Q1y7I3VGWH_MP7gYz7pNt0l$ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.21.0, unknown [0]PETSC ERROR: Configure options COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-fortran-bindings=0 FOPTFLAGS="-O3 -march=native" CUDAOPTFLAGS=-O3 --with-cuda --with-cusp --with-debugging=0 --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai [0]PETSC ERROR: #1 MatGetRow() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/mat/interface/matrix.c:573 [0]PETSC ERROR: #2 PCSetUp_Redistribute() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/impls/redistribute/redistribute.c:111 [0]PETSC ERROR: #3 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/interface/precon.c:1079 [0]PETSC ERROR: #4 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/ksp/interface/itfunc.c:415 MatTranspose is a working alternative, but MatCreateTranspose would be preferable. In principle the solution seems straightforward -- just add a getrow method -- but is it, and is it a good idea (performancewise etc)? Kind regards, Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From carljohanthore at gmail.com Tue Apr 9 09:19:40 2024 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Tue, 9 Apr 2024 16:19:40 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: Thanks for the suggestion. I don't have a factored matrix (and can't really use direct linear solvers) so MatSolveTranspose doesn't seem to be an option. I should have mentioned that I've also tried KSPSolveTranspose but that doesn't work with pcredistribute /Carl-Johan On Tue, Apr 9, 2024 at 3:59?PM Zhang, Hong wrote: > Carl-Johan, > You can use MatSolveTranspose() to solve A'*x = b. 
See > petsc/src/ksp/ksp/tutorials/ex79.c > > `MatCreateTranspose()` is used if you only need a matrix that behaves like > the transpose, but don't need the storage to be changed, i.e., A and A' > share same matrix storage, thus MatGetRow() needs to get columns of A, > which is not supported. > > Hong > ------------------------------ > *From:* petsc-users on behalf of > Carl-Johan Thore > *Sent:* Tuesday, April 9, 2024 6:38 AM > *To:* petsc-users > *Subject:* [petsc-users] MatCreateTranspose > > Hi, I have a matrix A with transpose A' and would like to solve the linear > system A'*x = b using the pcredistribute preconditioner. It seemed like a > good idea to use MatCreateTranspose, but this leads to [0]PETSC ERROR: > --------------------- > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > Hi, > > I have a matrix A with transpose A' and would like to solve the linear > system A'*x = b using the pcredistribute preconditioner. It seemed like a > good idea to use MatCreateTranspose, but this leads to > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: No method getrow for Mat of type transpose > [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!ep5IoJ3UZoQ62Wb4ElaMmi9G3Svqdr3ldXVnCRd-47InZQBv34SgL7WDdEFLYJDtFYzCdxXGf4WaQ9U5JwPTNHtq_gyDKw$ > > for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.21.0, unknown > [0]PETSC ERROR: Configure options COPTFLAGS="-O3 -march=native" > CXXOPTFLAGS="-O3 -march=native" --with-fortran-bindings=0 FOPTFLAGS="-O3 > -march=native" CUDAOPTFLAGS=-O3 --with-cuda --with-cusp --with-debugging=0 > --download-scalapack --download-hdf5 --download-zlib --download-mumps > --download-parmetis --download-metis --download-ptscotch --download-hypre > --download-spai > [0]PETSC ERROR: #1 MatGetRow() at > /mnt/c/mathware/petsc/petsc-v3-21-0/src/mat/interface/matrix.c:573 > [0]PETSC ERROR: #2 PCSetUp_Redistribute() at > /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/impls/redistribute/redistribute.c:111 > [0]PETSC ERROR: #3 PCSetUp() at > /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/interface/precon.c:1079 > [0]PETSC ERROR: #4 KSPSetUp() at > /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/ksp/interface/itfunc.c:415 > > MatTranspose is a working alternative, but MatCreateTranspose would be > preferable. In principle the solution seems straightforward -- just add a > getrow method -- but is it, and is it a good idea (performancewise etc)? > > Kind regards, > Carl-Johan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Tue Apr 9 10:30:54 2024 From: pierre at joliv.et (Pierre Jolivet) Date: Tue, 9 Apr 2024 17:30:54 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: > On 9 Apr 2024, at 4:19?PM, Carl-Johan Thore wrote: > > This Message Is From an External Sender > This message came from outside your organization. > Thanks for the suggestion. I don't have a factored matrix (and can't really use direct linear solvers) so MatSolveTranspose doesn't seem to be an option. 
> I should have mentioned that I've also tried KSPSolveTranspose but that doesn't work with pcredistribute

I'm not a frequent PCREDISTRIBUTE user, but it looks like https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html#line332 could be copy/pasted into PCApplyTranspose_Redistribute() by just changing a MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). Would you be willing to contribute (and test) this?
Then, KSPSolveTranspose() (which should be the function you call) will work.

Thanks,
Pierre

> /Carl-Johan
>
> On Tue, Apr 9, 2024 at 3:59 PM Zhang, Hong wrote:
>> Carl-Johan,
>> You can use MatSolveTranspose() to solve A'*x = b. See petsc/src/ksp/ksp/tutorials/ex79.c
>>
>> `MatCreateTranspose()` is used if you only need a matrix that behaves like the transpose, but don't need the storage to be changed, i.e., A and A' share the same matrix storage; thus MatGetRow() needs to get columns of A, which is not supported.
>>
>> Hong
>> From: petsc-users on behalf of Carl-Johan Thore
>> Sent: Tuesday, April 9, 2024 6:38 AM
>> To: petsc-users
>> Subject: [petsc-users] MatCreateTranspose
>>
>> Hi,
>>
>> I have a matrix A with transpose A' and would like to solve the linear system A'*x = b using the pcredistribute preconditioner. 
It seemed like a good idea to use MatCreateTranspose, but this leads to >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: No support for this operation for this object type >> [0]PETSC ERROR: No method getrow for Mat of type transpose >> [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!bgGlN3dW5g5M98EkZPPaYivRkcXtawYE_jKsdMC0zK2bDy4u1Qy1KOMIcZ-_sBABRkHTzGlKHmAefb8Z27cesg$ for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.21.0, unknown >> [0]PETSC ERROR: Configure options COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-fortran-bindings=0 FOPTFLAGS="-O3 -march=native" CUDAOPTFLAGS=-O3 --with-cuda --with-cusp --with-debugging=0 --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai >> [0]PETSC ERROR: #1 MatGetRow() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/mat/interface/matrix.c:573 >> [0]PETSC ERROR: #2 PCSetUp_Redistribute() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/impls/redistribute/redistribute.c:111 >> [0]PETSC ERROR: #3 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/pc/interface/precon.c:1079 >> [0]PETSC ERROR: #4 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-21-0/src/ksp/ksp/interface/itfunc.c:415 >> >> MatTranspose is a working alternative, but MatCreateTranspose would be preferable. In principle the solution seems straightforward -- just add a getrow method -- but is it, and is it a good idea (performancewise etc)? >> >> Kind regards, >> Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From carljohanthore at gmail.com Wed Apr 10 01:24:12 2024
From: carljohanthore at gmail.com (Carl-Johan Thore)
Date: Wed, 10 Apr 2024 08:24:12 +0200
Subject: [petsc-users] MatCreateTranspose
In-Reply-To: References: Message-ID: 

On Tue, Apr 9, 2024 at 5:31 PM Pierre Jolivet wrote:
> On 9 Apr 2024, at 4:19 PM, Carl-Johan Thore wrote:
> Thanks for the suggestion. I don't have a factored matrix (and can't really use direct linear solvers) so MatSolveTranspose doesn't seem to be an option.
> I should have mentioned that I've also tried KSPSolveTranspose but that doesn't work with pcredistribute
>
> I'm not a frequent PCREDISTRIBUTE user, but it looks like https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html#line332 could be copy/pasted into PCApplyTranspose_Redistribute() by just changing a MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). Would you be willing to contribute (and test) this?
> Then, KSPSolveTranspose() (which should be the function you call) will work.
>
> Thanks,
> Pierre

Thanks, that sounds promising. Yes, I'll try to make a contribution
/Carl-Johan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From p.khurana22 at imperial.ac.uk Wed Apr 10 06:49:19 2024
From: p.khurana22 at imperial.ac.uk (Khurana, Parv)
Date: Wed, 10 Apr 2024 11:49:19 +0000
Subject: [petsc-users] Exposing further detail in -log_view for Hypre withe PETSc
Message-ID: 

Hello PETSc users community,

Thank you in advance for your help as always.

I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in my software (Nektar++). I am trying to understand the time profiling information that is printed using the -log_view option. 
I want to understand how much time is spent in the smoothening step vs the time to solve on the coarsest grid I reach. The output that I get from -log_view (pasted below) gives me information of the KSPSolve and MatMult, but I think I need more granular time information to see a further breakdown of time spent within my routines. I would like to hear if anyone has any recommendations on obtaining this information?

Best
Parv Khurana

PETSc database options used for solve:

-ksp_monitor # (source: file)
-ksp_type preonly # (source: file)
-log_view # (source: file)
-pc_hypre_boomeramg_coarsen_type hmis # (source: file)
-pc_hypre_boomeramg_grid_sweeps_all 2 # (source: file)
-pc_hypre_boomeramg_interp_type ext+i # (source: file)
-pc_hypre_boomeramg_max_iter 1 # (source: file)
-pc_hypre_boomeramg_P_max 2 # (source: file)
-pc_hypre_boomeramg_print_debug 1 # (source: file)
-pc_hypre_boomeramg_print_statistics 1 # (source: file)
-pc_hypre_boomeramg_relax_type_all sor/jacobi # (source: file)
-pc_hypre_boomeramg_strong_threshold 0.7 # (source: file)
-pc_hypre_boomeramg_truncfactor 0.3 # (source: file)
-pc_hypre_type boomeramg # (source: file)
-pc_type hypre # (source: file)

PETSc log_view output:

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 9.6900e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               14 1.0 1.6315e-01 1.0 1.65e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 86  0  0  0   0 86  0  0  0  1011
MatConvert             1 1.0 4.3092e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       3 1.0 3.1680e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 9.4178e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 1.1630e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSetPreallCOO        1 1.0 3.2132e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSetValuesCOO        1 1.0 2.9956e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm               28 1.0 1.3981e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 11  0  0  0   0 11  0  0  0  1499
VecSet                13 1.0 6.5185e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX               14 1.0 7.1511e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   733
VecAssemblyBegin      14 1.0 1.3998e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        14 1.0 4.2560e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       14 1.0 8.2761e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterEnd         14 1.0 4.4665e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             1 1.0 6.5993e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 7.9212e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFPack                14 1.0 5.8690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack              14 1.0 4.3370e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 2.4910e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              14 1.0 2.1922e+00 1.0 1.91e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 100  0  0  0  0 100  0  0  0    87
PCSetUp                1 1.0 1.3165e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply               14 1.0 1.9990e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed... 
URL: From p.khurana22 at imperial.ac.uk Wed Apr 10 06:49:20 2024 From: p.khurana22 at imperial.ac.uk (Khurana, Parv) Date: Wed, 10 Apr 2024 11:49:20 +0000 Subject: [petsc-users] Hypre BoomerAMG settings options database In-Reply-To: References: <30697B53-8E8E-47D3-B3B7-15CF9B9F0D57@petsc.dev> Message-ID: Hello everyone, Thank you Mark for your suggestion - pc_hypre_boomeramg_grid_sweeps_all 2 indeed worked! Apologies for the very delayed response. Best Parv From: petsc-users On Behalf Of Mark Adams Sent: Sunday, January 7, 2024 12:58 AM To: Pierre Jolivet Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Hypre BoomerAMG settings options database This email from mfadams at lbl.gov originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders list to disable email stamping for this address. I was thinking about interpreting the "BoomerAMG SOLVER PARAMETERS:" stuff (eg, what is the closest thing to SSOR that Parv wants). That does not look like our code, but maybe it is. On Sat, Jan 6, 2024 at 12:50?PM Pierre Jolivet > wrote: On 6 Jan 2024, at 3:15?PM, Mark Adams > wrote: Does this work for you? -pc_hypre_boomeramg_grid_sweeps_all 2 The comment in our code says SSOR is the default but it looks like it is really "hSGS" I thought it was an L1 Jacobi, but you would want to ask Hypre about this. HYPRE?s default settings are not the same as the ones we set in PETSc as default, so do not ask HYPRE people (about this particular issue). Thanks, Pierre Mark On Fri, Jan 5, 2024 at 10:21?AM Barry Smith > wrote: Yes, the handling of BoomerAMG options starts at line 365. If we don't support what you want but hypre has a function call that allows one to set the values then the option could easily be added to the PETSc options database here either by you (with a merge request) or us. So I would say check the hypre docs. 
Just let us know what BoomerAMG function is missing from the code. Barry On Jan 5, 2024, at 7:52?AM, Khurana, Parv > wrote: Hello PETSc users community, Happy new year! Thank you for the community support as always. I am using BoomerAMG for my research, and it is interfaced to my software via PETSc. I can only use options database keys as of now to tweak the settings I want for the AMG solve. I want to control the number of smoothener iterations at pre/post step for a given AMG cycle. I am looking for an options database key which helps me control this. I am not sure whether this is possible directly via the keys (Line 365: https://urldefense.us/v3/__https://www.mcs.anl.gov/petsc/petsc-3.5.4/src/ksp/pc/impls/hypre/hypre.c.html__;!!G_uCfscf7eWS!aTCg-75Y-gqRtqf04rTmBvnW_umETCNwzCol-A2kwOKVZfH42p-NgGyJh6rdI3VWczX1nGFLF-84NxCRe5kB4cziDcCyQl1c$ ). My comprehension of the current setup is that I have 1 smoothener iteration at every coarsening step. My aim is to do two pre and 2 post smoothening steps using the SSOR smoothener. BoomerAMG SOLVER PARAMETERS: Maximum number of cycles: 1 Stopping Tolerance: 0.000000e+00 Cycle type (1 = V, 2 = W, etc.): 1 Relaxation Parameters: Visiting Grid: down up coarse Number of sweeps: 1 1 1 Type 0=Jac, 3=hGS, 6=hSGS, 9=GE: 6 6 9 Point types, partial sweeps (1=C, -1=F): Pre-CG relaxation (down): 1 -1 Post-CG relaxation (up): -1 1 Coarsest grid: 0 PETSC settings I am using currently: -ksp_type preonly -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_coarsen_type hmis -pc_hypre_boomeramg_relax_type_all symmetric-sor/jacobi -pc_hypre_boomeramg_strong_threshold 0.7 -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_P_max 2 -pc_hypre_boomeramg_truncfactor 0.3 Thanks and Best Parv Khurana -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Wed Apr 10 08:01:06 2024 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 10 Apr 2024 09:01:06 -0400 Subject: [petsc-users] Exposing further detail in -log_view for Hypre withe PETSc In-Reply-To: References: Message-ID: I believe there is an option to get hypre to print its performance data. Run with -help and grep on "pc_hypre" and look for something that looks like a logging or view parameter. Mark On Wed, Apr 10, 2024 at 7:49?AM Khurana, Parv wrote: > Hello PETSc users community, Thank you in advance for your help as always. > I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in my > software (Nektar++). I am trying to understand the time profiling > information that is printed > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > Hello PETSc users community, > > > > Thank you in advance for your help as always. > > > > I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in > my software (Nektar++). I am trying to understand the time profiling > information that is printed using the -log_view option. > > > > I want to understand how much time is spent in the smoothening step vs the > time to solve on the coarsest grid I reach. The output that I get from > -log_view (pasted below) gives me information of the KSPSolve and MatMult, > but I think I need more granular time information to see a further > breakdown of time spent within my routines. I would like to hear if anyone > has any recommendations on obtaining this information? 
> > > > Best > > Parv Khurana > > > > > > PETSc database options used for solve: > > > > -ksp_monitor # (source: file) > > -ksp_type preonly # (source: file) > > -log_view # (source: file) > > -pc_hypre_boomeramg_coarsen_type hmis # (source: file) > > -pc_hypre_boomeramg_grid_sweeps_all 2 # (source: file) > > -pc_hypre_boomeramg_interp_type ext+i # (source: file) > > -pc_hypre_boomeramg_max_iter 1 # (source: file) > > -pc_hypre_boomeramg_P_max 2 # (source: file) > > -pc_hypre_boomeramg_print_debug 1 # (source: file) > > -pc_hypre_boomeramg_print_statistics 1 # (source: file) > > -pc_hypre_boomeramg_relax_type_all sor/jacobi # (source: file) > > -pc_hypre_boomeramg_strong_threshold 0.7 # (source: file) > > -pc_hypre_boomeramg_truncfactor 0.3 # (source: file) > > -pc_hypre_type boomeramg # (source: file) > > -pc_type hypre # (source: file) > > > > > > PETSc log_view output: > > > > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSided 1 1.0 9.6900e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 14 1.0 1.6315e-01 1.0 1.65e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 86 0 0 0 0 86 0 0 0 1011 > > MatConvert 1 1.0 4.3092e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 3 1.0 3.1680e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 3 1.0 9.4178e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetRowIJ 2 1.0 1.1630e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetPreallCOO 1 1.0 3.2132e-01 1.0 0.00e+00 
0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetValuesCOO 1 1.0 2.9956e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNorm 28 1.0 1.3981e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 0 1499 > > VecSet 13 1.0 6.5185e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAYPX 14 1.0 7.1511e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 733 > > VecAssemblyBegin 14 1.0 1.3998e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 14 1.0 4.2560e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 14 1.0 8.2761e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterEnd 14 1.0 4.4665e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetGraph 1 1.0 6.5993e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetUp 1 1.0 7.9212e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFPack 14 1.0 5.8690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFUnpack 14 1.0 4.3370e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetUp 1 1.0 2.4910e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 14 1.0 2.1922e+00 1.0 1.91e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 100 0 0 0 0 100 0 0 0 87 > > PCSetUp 1 1.0 1.3165e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 14 1.0 1.9990e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > ------------------------------------------------------------------------------------------------------------------------ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.khurana22 at imperial.ac.uk Wed Apr 10 08:09:53 2024 From: p.khurana22 at imperial.ac.uk (Khurana, Parv) Date: Wed, 10 Apr 2024 13:09:53 +0000 Subject: [petsc-users] Exposing further detail in -log_view for Hypre withe PETSc In-Reply-To: References: Message-ID: Hi Mark, I have had a look at these, and I could only find these ones that looked relevant: -pc_hypre_boomeramg_print_statistics: Print statistics (None) -pc_hypre_boomeramg_print_debug: Print debug information (None) -pc_hypre_boomeramg_print_statistics prints information about the V-cycle (number of levels, Dofs per level etc.) It also gave this information (which is something I can potentially work with?) Proc = 0 Level = 0 Coarsen Time = 0.022089 Proc = 0 Level = 0 Build Interp Time = 0.138844 Proc = 0 Level = 0 Build Coarse Operator Time = 0.354488 Proc = 0 Level = 1 Coarsen Time = 0.009334 Proc = 0 Level = 1 Build Interp Time = 0.074119 Proc = 0 Level = 1 Build Coarse Operator Time = 0.097702 Proc = 0 Level = 2 Coarsen Time = 0.004301 Proc = 0 Level = 2 Build Interp Time = 0.035835 Proc = 0 Level = 2 Build Coarse Operator Time = 0.030501 Proc = 0 Level = 3 Coarsen Time = 0.001876 Proc = 0 Level = 3 Build Interp Time = 0.014711 Proc = 0 Level = 3 Build Coarse Operator Time = 0.008681 Proc = 0 Level = 4 Coarsen Time = 0.000557 Proc = 0 Level = 4 Build Interp Time = 0.004307 Proc = 0 Level = 4 Build Coarse Operator Time = 0.002373 Proc = 0 Level = 5 Coarsen Time = 0.000268 Proc = 0 Level = 5 Build Interp Time = 0.001061 Proc = 0 Level = 5 Build Coarse Operator Time = 0.000589 Proc = 0 Level = 6 Coarsen Time = 0.000149 Proc = 0 Level = 6 Build Interp Time = 0.000339 Proc = 0 Level = 6 Build Coarse Operator Time = 0.000206 Proc = 0 Level = 7 Coarsen Time = 0.000090 Proc = 0 Level = 7 Build Interp Time = 0.000148 Proc = 0 Level = 7 Build Coarse Operator Time = 0.000085 Proc = 0 Level = 8 Coarsen Time = 0.000054 Proc = 0 Level = 8 Build Interp Time = 0.000100 Proc = 0 Level = 8 Build Coarse 
Operator Time = 0.000053 I have not tried -pc_hypre_boomeramg_print_debug yet. Think I can get the total coarsen time by summing up the time from all the levels here. I am still trying to understand how to get the time spent to solve the problem at the coarsest level. Best Parv From: Mark Adams Sent: Wednesday, April 10, 2024 2:01 PM To: Khurana, Parv Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Exposing further detail in -log_view for Hypre withe PETSc This email from mfadams at lbl.gov originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders list to disable email stamping for this address. I believe there is an option to get hypre to print its performance data. Run with -help and grep on "pc_hypre" and look for something that looks like a logging or view parameter. Mark On Wed, Apr 10, 2024 at 7:49?AM Khurana, Parv > wrote: Hello PETSc users community, Thank you in advance for your help as always. I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in my software (Nektar++). I am trying to understand the time profiling information that is printed ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd Hello PETSc users community, Thank you in advance for your help as always. I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in my software (Nektar++). I am trying to understand the time profiling information that is printed using the -log_view option. I want to understand how much time is spent in the smoothening step vs the time to solve on the coarsest grid I reach. The output that I get from -log_view (pasted below) gives me information of the KSPSolve and MatMult, but I think I need more granular time information to see a further breakdown of time spent within my routines. 
I would like to hear if anyone has any recommendations on obtaining this information? Best Parv Khurana PETSc database options used for solve: -ksp_monitor # (source: file) -ksp_type preonly # (source: file) -log_view # (source: file) -pc_hypre_boomeramg_coarsen_type hmis # (source: file) -pc_hypre_boomeramg_grid_sweeps_all 2 # (source: file) -pc_hypre_boomeramg_interp_type ext+i # (source: file) -pc_hypre_boomeramg_max_iter 1 # (source: file) -pc_hypre_boomeramg_P_max 2 # (source: file) -pc_hypre_boomeramg_print_debug 1 # (source: file) -pc_hypre_boomeramg_print_statistics 1 # (source: file) -pc_hypre_boomeramg_relax_type_all sor/jacobi # (source: file) -pc_hypre_boomeramg_strong_threshold 0.7 # (source: file) -pc_hypre_boomeramg_truncfactor 0.3 # (source: file) -pc_hypre_type boomeramg # (source: file) -pc_type hypre # (source: file) PETSc log_view output: ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage BuildTwoSided 1 1.0 9.6900e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult 14 1.0 1.6315e-01 1.0 1.65e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 86 0 0 0 0 86 0 0 0 1011 MatConvert 1 1.0 4.3092e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 3 1.0 3.1680e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 3 1.0 9.4178e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 1.0 1.1630e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatSetPreallCOO 1 1.0 3.2132e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatSetValuesCOO 1 1.0 2.9956e-02 1.0 0.00e+00 
0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNorm 28 1.0 1.3981e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 1499 VecSet 13 1.0 6.5185e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAYPX 14 1.0 7.1511e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 733 VecAssemblyBegin 14 1.0 1.3998e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 14 1.0 4.2560e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 14 1.0 8.2761e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterEnd 14 1.0 4.4665e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetGraph 1 1.0 6.5993e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetUp 1 1.0 7.9212e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFPack 14 1.0 5.8690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFUnpack 14 1.0 4.3370e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSetUp 1 1.0 2.4910e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 14 1.0 2.1922e+00 1.0 1.91e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 100 0 0 0 0 100 0 0 0 87 PCSetUp 1 1.0 1.3165e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 PCApply 14 1.0 1.9990e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Apr 10 08:21:21 2024 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 10 Apr 2024 09:21:21 -0400 Subject: [petsc-users] Exposing further detail in -log_view for Hypre withe PETSc In-Reply-To: References: Message-ID: Ask hypre how to do this and we can figure out how to get it in PETSc. 
Mark On Wed, Apr 10, 2024 at 9:09?AM Khurana, Parv wrote: > Hi Mark, > > > > I have had a look at these, and I could only find these ones that looked > relevant: > > > > -pc_hypre_boomeramg_print_statistics: Print statistics (None) > > -pc_hypre_boomeramg_print_debug: Print debug information (None) > > > > -pc_hypre_boomeramg_print_statistics prints information about the V-cycle > (number of levels, Dofs per level etc.) It also gave this information > (which is something I can potentially work with?) > > > > Proc = 0 Level = 0 Coarsen Time = 0.022089 > > Proc = 0 Level = 0 Build Interp Time = 0.138844 > > Proc = 0 Level = 0 Build Coarse Operator Time = 0.354488 > > Proc = 0 Level = 1 Coarsen Time = 0.009334 > > Proc = 0 Level = 1 Build Interp Time = 0.074119 > > Proc = 0 Level = 1 Build Coarse Operator Time = 0.097702 > > Proc = 0 Level = 2 Coarsen Time = 0.004301 > > Proc = 0 Level = 2 Build Interp Time = 0.035835 > > Proc = 0 Level = 2 Build Coarse Operator Time = 0.030501 > > Proc = 0 Level = 3 Coarsen Time = 0.001876 > > Proc = 0 Level = 3 Build Interp Time = 0.014711 > > Proc = 0 Level = 3 Build Coarse Operator Time = 0.008681 > > Proc = 0 Level = 4 Coarsen Time = 0.000557 > > Proc = 0 Level = 4 Build Interp Time = 0.004307 > > Proc = 0 Level = 4 Build Coarse Operator Time = 0.002373 > > Proc = 0 Level = 5 Coarsen Time = 0.000268 > > Proc = 0 Level = 5 Build Interp Time = 0.001061 > > Proc = 0 Level = 5 Build Coarse Operator Time = 0.000589 > > Proc = 0 Level = 6 Coarsen Time = 0.000149 > > Proc = 0 Level = 6 Build Interp Time = 0.000339 > > Proc = 0 Level = 6 Build Coarse Operator Time = 0.000206 > > Proc = 0 Level = 7 Coarsen Time = 0.000090 > > Proc = 0 Level = 7 Build Interp Time = 0.000148 > > Proc = 0 Level = 7 Build Coarse Operator Time = 0.000085 > > Proc = 0 Level = 8 Coarsen Time = 0.000054 > > Proc = 0 Level = 8 Build Interp Time = 0.000100 > > Proc = 0 Level = 8 Build Coarse Operator Time = 0.000053 > > > > I have not tried 
-pc_hypre_boomeramg_print_debug yet. > > Think I can get the total coarsen time by summing up the time from all the > levels here. I am still trying to understand how to get the time spent to > solve the problem at the coarsest level. > > > > Best > > Parv > > > > *From:* Mark Adams > *Sent:* Wednesday, April 10, 2024 2:01 PM > *To:* Khurana, Parv > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Exposing further detail in -log_view for > Hypre withe PETSc > > > > This email from mfadams at lbl.gov originates from outside Imperial. Do not > click on links and attachments unless you recognise the sender. If you > trust the sender, add them to your safe senders list > to disable email > stamping for this address. > > > > I believe there is an option to get hypre to print its performance data. > > Run with -help and grep on "pc_hypre" and look for something that looks > like a logging or view parameter. > > > > Mark > > > > On Wed, Apr 10, 2024 at 7:49?AM Khurana, Parv > wrote: > > Hello PETSc users community, Thank you in advance for your help as always. > I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in my > software (Nektar++). I am trying to understand the time profiling > information that is printed > > ZjQcmQRYFpfptBannerStart > > *This Message Is From an External Sender * > > This message came from outside your organization. > > > > ZjQcmQRYFpfptBannerEnd > > Hello PETSc users community, > > > > Thank you in advance for your help as always. > > > > I am using BoomerAMG from Hypre via PETSc as a part of preconditioner in > my software (Nektar++). I am trying to understand the time profiling > information that is printed using the -log_view option. > > > > I want to understand how much time is spent in the smoothening step vs the > time to solve on the coarsest grid I reach. 
The output that I get from > -log_view (pasted below) gives me information of the KSPSolve and MatMult, > but I think I need more granular time information to see a further > breakdown of time spent within my routines. I would like to hear if anyone > has any recommendations on obtaining this information? > > > > Best > > Parv Khurana > > > > > > PETSc database options used for solve: > > > > -ksp_monitor # (source: file) > > -ksp_type preonly # (source: file) > > -log_view # (source: file) > > -pc_hypre_boomeramg_coarsen_type hmis # (source: file) > > -pc_hypre_boomeramg_grid_sweeps_all 2 # (source: file) > > -pc_hypre_boomeramg_interp_type ext+i # (source: file) > > -pc_hypre_boomeramg_max_iter 1 # (source: file) > > -pc_hypre_boomeramg_P_max 2 # (source: file) > > -pc_hypre_boomeramg_print_debug 1 # (source: file) > > -pc_hypre_boomeramg_print_statistics 1 # (source: file) > > -pc_hypre_boomeramg_relax_type_all sor/jacobi # (source: file) > > -pc_hypre_boomeramg_strong_threshold 0.7 # (source: file) > > -pc_hypre_boomeramg_truncfactor 0.3 # (source: file) > > -pc_hypre_type boomeramg # (source: file) > > -pc_type hypre # (source: file) > > > > > > PETSc log_view output: > > > > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSided 1 1.0 9.6900e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 14 1.0 1.6315e-01 1.0 1.65e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 86 0 0 0 0 86 0 0 0 1011 > > MatConvert 1 1.0 4.3092e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 3 1.0 3.1680e-06 1.0 
0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 3 1.0 9.4178e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetRowIJ 2 1.0 1.1630e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetPreallCOO 1 1.0 3.2132e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetValuesCOO 1 1.0 2.9956e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNorm 28 1.0 1.3981e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 0 1499 > > VecSet 13 1.0 6.5185e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAYPX 14 1.0 7.1511e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 733 > > VecAssemblyBegin 14 1.0 1.3998e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 14 1.0 4.2560e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 14 1.0 8.2761e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterEnd 14 1.0 4.4665e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetGraph 1 1.0 6.5993e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetUp 1 1.0 7.9212e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFPack 14 1.0 5.8690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFUnpack 14 1.0 4.3370e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetUp 1 1.0 2.4910e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 14 1.0 2.1922e+00 1.0 1.91e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 100 0 0 0 0 100 0 0 0 87 > > PCSetUp 1 1.0 1.3165e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 14 1.0 1.9990e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > 
------------------------------------------------------------------------------------------------------------------------ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 10 09:29:23 2024 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 10 Apr 2024 10:29:23 -0400 Subject: [petsc-users] Exposing further detail in -log_view for Hypre with PETSc In-Reply-To: References: Message-ID: On Wed, Apr 10, 2024 at 7:49 AM Khurana, Parv wrote: > Hello PETSc users community, > > Thank you in advance for your help as always. > > I am using BoomerAMG from Hypre via PETSc as part of the preconditioner in > my software (Nektar++). I am trying to understand the time profiling > information that is printed using the -log_view option. > I believe that our Hypre interface plugs into the PETSc MG callbacks. Thus you should be able to use -pc_mg_log to split the various levels into different logging stages. This still will not directly log Hypre time, but it should be better. Thanks, Matt > I want to understand how much time is spent in the smoothing step vs the > time to solve on the coarsest grid I reach. The output that I get from > -log_view (pasted below) gives me information on the KSPSolve and MatMult, > but I think I need more granular time information to see a further > breakdown of time spent within my routines. I would like to hear if anyone > has any recommendations on obtaining this information? 
> > > > Best > > Parv Khurana > > > > > > PETSc database options used for solve: > > > > -ksp_monitor # (source: file) > > -ksp_type preonly # (source: file) > > -log_view # (source: file) > > -pc_hypre_boomeramg_coarsen_type hmis # (source: file) > > -pc_hypre_boomeramg_grid_sweeps_all 2 # (source: file) > > -pc_hypre_boomeramg_interp_type ext+i # (source: file) > > -pc_hypre_boomeramg_max_iter 1 # (source: file) > > -pc_hypre_boomeramg_P_max 2 # (source: file) > > -pc_hypre_boomeramg_print_debug 1 # (source: file) > > -pc_hypre_boomeramg_print_statistics 1 # (source: file) > > -pc_hypre_boomeramg_relax_type_all sor/jacobi # (source: file) > > -pc_hypre_boomeramg_strong_threshold 0.7 # (source: file) > > -pc_hypre_boomeramg_truncfactor 0.3 # (source: file) > > -pc_hypre_type boomeramg # (source: file) > > -pc_type hypre # (source: file) > > > > > > PETSc log_view output: > > > > > ------------------------------------------------------------------------------------------------------------------------ > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total > > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > BuildTwoSided 1 1.0 9.6900e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatMult 14 1.0 1.6315e-01 1.0 1.65e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 86 0 0 0 0 86 0 0 0 1011 > > MatConvert 1 1.0 4.3092e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 3 1.0 3.1680e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 3 1.0 9.4178e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetRowIJ 2 1.0 1.1630e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetPreallCOO 1 1.0 3.2132e-01 1.0 0.00e+00 
0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatSetValuesCOO 1 1.0 2.9956e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNorm 28 1.0 1.3981e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 0 1499 > > VecSet 13 1.0 6.5185e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAYPX 14 1.0 7.1511e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 733 > > VecAssemblyBegin 14 1.0 1.3998e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAssemblyEnd 14 1.0 4.2560e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterBegin 14 1.0 8.2761e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecScatterEnd 14 1.0 4.4665e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetGraph 1 1.0 6.5993e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFSetUp 1 1.0 7.9212e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFPack 14 1.0 5.8690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > SFUnpack 14 1.0 4.3370e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSetUp 1 1.0 2.4910e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 14 1.0 2.1922e+00 1.0 1.91e+08 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 100 0 0 0 0 100 0 0 0 87 > > PCSetUp 1 1.0 1.3165e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 14 1.0 1.9990e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > ------------------------------------------------------------------------------------------------------------------------ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dcXXAGiIt9SePznRQ1Iqc5whnzG7XtxblsB2LTCG5oT4ueID0jjr9B111lWItGe3riTGdRgPQsKbWr4aesSN$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From carljohanthore at gmail.com Fri Apr 12 04:10:28 2024 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Fri, 12 Apr 2024 11:10:28 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: Pierre, I see that you've already done a merge request for this. Thanks! I have tested this and it works nicely in my application /Carl-Johan On Wed, Apr 10, 2024 at 8:24 AM Carl-Johan Thore wrote: > > > On Tue, Apr 9, 2024 at 5:31 PM Pierre Jolivet wrote: >> >> On 9 Apr 2024, at 4:19 PM, Carl-Johan Thore >> wrote: >> >> Thanks for the suggestion. I don't have a factored matrix (and can't >> really use direct linear solvers) so MatSolveTranspose doesn't seem to >> be an option. >> I should have mentioned that I've also tried KSPSolveTranspose but that >> doesn't work with pcredistribute >> >> >> I'm not a frequent PCREDISTRIBUTE user, but it looks like >> https://urldefense.us/v3/__https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html*line332__;Iw!!G_uCfscf7eWS!d3xUyKqZovzqGxg2_Ek4FStdTXsvnabp6mvY0kSRCCRZXZ6y6pd8MUnEi9sZR8QjNGiwWLxXy3N08YLxKwIcWBQ9Jbc-Qg$ could >> be copy/paste'd into PCApplyTranspose_Redistribute() by just changing a >> MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). >> Would you be willing to contribute (and test) this? >> Then, KSPSolveTranspose(), which should be the function you call, will >> work. >> >> Thanks, >> Pierre >> > > Thanks, that sounds promising. Yes, I'll try to make a contribution > /Carl-Johan > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre at joliv.et Fri Apr 12 04:16:18 2024 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 12 Apr 2024 11:16:18 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: > On 12 Apr 2024, at 11:10 AM, Carl-Johan Thore wrote: > > Pierre, I see that you've already done a merge request for this. Thanks! > I have tested this and it works nicely in my application I guess your matrix is symmetric in pattern? Because otherwise, I don't think this should work. But if it's OK for your use case, I could simply add a PetscCheck() that the input Mat is symmetric and then get this integrated (better to have something partially working than nothing at all, I guess). Please let me know. Thanks, Pierre > /Carl-Johan > > On Wed, Apr 10, 2024 at 8:24 AM Carl-Johan Thore > wrote: >> >> >> On Tue, Apr 9, 2024 at 5:31 PM Pierre Jolivet > wrote: >>> >>>> On 9 Apr 2024, at 4:19 PM, Carl-Johan Thore > wrote: >>>> >>>> Thanks for the suggestion. I don't have a factored matrix (and can't really use direct linear solvers) so MatSolveTranspose doesn't seem to be an option. >>>> I should have mentioned that I've also tried KSPSolveTranspose but that doesn't work with pcredistribute >>> >>> I'm not a frequent PCREDISTRIBUTE user, but it looks like https://urldefense.us/v3/__https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html*line332__;Iw!!G_uCfscf7eWS!dzQtfplyy0liDIzLvwZPEQ15gmxeoXxZBqfgkYyHOUdkUhP-wvoKWG58yEPisaaRzqydtExDOol1d4MSyysHYQ$ could be copy/paste'd into PCApplyTranspose_Redistribute() by just changing a MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). >>> Would you be willing to contribute (and test) this? >>> Then, KSPSolveTranspose(), which should be the function you call, will work. >>> >>> Thanks, >>> Pierre >> >> Thanks, that sounds promising. 
Yes, I'll try to make a contribution >> /Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From carljohanthore at gmail.com Fri Apr 12 04:51:53 2024 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Fri, 12 Apr 2024 11:51:53 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: On Fri, Apr 12, 2024 at 11:16 AM Pierre Jolivet wrote: > > > On 12 Apr 2024, at 11:10 AM, Carl-Johan Thore > wrote: > > Pierre, I see that you've already done a merge request for this. Thanks! > I have tested this and it works nicely in my application > > > I guess your matrix is symmetric in pattern? > Because otherwise, I don't think this should work. > But if it's OK for your use case, I could simply add a PetscCheck() that > the input Mat is symmetric and then get this integrated (better to have > something partially working than nothing at all, I guess). > Please let me know. > > Thanks, > Pierre > Yes, unless I'm mistaken it's structurally symmetric. Would be great to have this integrated. (I don't know what you have in mind for the check, but MatIsStructurallySymmetric did not work for me) /Carl-Johan > /Carl-Johan > > On Wed, Apr 10, 2024 at 8:24 AM Carl-Johan Thore > wrote: > >> >> >> On Tue, Apr 9, 2024 at 5:31 PM Pierre Jolivet wrote: >> >>> >>> On 9 Apr 2024, at 4:19 PM, Carl-Johan Thore >>> wrote: >>> >>> Thanks for the suggestion. I don't have a factored matrix (and can't >>> really use direct linear solvers) so MatSolveTranspose doesn't seem to >>> be an option. 
>>> I should have mentioned that I've also tried KSPSolveTranspose but that >>> doesn't work with pcredistribute >>> >>> >>> I'm not a frequent PCREDISTRIBUTE user, but it looks like >>> https://urldefense.us/v3/__https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html*line332__;Iw!!G_uCfscf7eWS!eKgr2PYF7SG39RXX3djw6sMcERzrkuIDlriMyaTiUBJLndZZjvPWDIExlBIgGGHuniBRgDKoQ2YiNQFa0Rh356QrCz6dDw$ could >>> be copy/paste'd into PCApplyTranspose_Redistribute() by just changing a >>> MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). >>> Would you be willing to contribute (and test) this? >>> Then, KSPSolveTranspose(), which should be the function you call, will >>> work. >>> >>> Thanks, >>> Pierre >>> >> >> Thanks, that sounds promising. Yes, I'll try to make a contribution >> /Carl-Johan >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at joliv.et Fri Apr 12 05:22:29 2024 From: pierre at joliv.et (Pierre Jolivet) Date: Fri, 12 Apr 2024 12:22:29 +0200 Subject: [petsc-users] MatCreateTranspose In-Reply-To: References: Message-ID: <25468593-FFE2-4927-9A1D-D8044DF7270F@joliv.et> > On 12 Apr 2024, at 11:51 AM, Carl-Johan Thore wrote: > > On Fri, Apr 12, 2024 at 11:16 AM Pierre Jolivet > wrote: >> >> >>> On 12 Apr 2024, at 11:10 AM, Carl-Johan Thore > wrote: >>> >>> Pierre, I see that you've already done a merge request for this. Thanks! >>> I have tested this and it works nicely in my application >> >> I guess your matrix is symmetric in pattern? >> Because otherwise, I don't think this should work. >> But if it's OK for your use case, I could simply add a PetscCheck() that the input Mat is symmetric and then get this integrated (better to have something partially working than nothing at all, I guess). >> Please let me know. >> >> Thanks, >> Pierre > > Yes, unless I'm mistaken it's structurally symmetric. Would be great to have this integrated. OK, I'll work on finalizing this. 
> (I don't know what you have in mind for the check, but MatIsStructurallySymmetric did not work for me) You'll have to call MatSetOption(A, MAT_STRUCTURALLY_SYMMETRIC, PETSC_TRUE) before KSPSolveTranspose(). Thanks, Pierre > /Carl-Johan > > > >>> /Carl-Johan >>> >>> On Wed, Apr 10, 2024 at 8:24 AM Carl-Johan Thore > wrote: >>>> >>>> >>>> On Tue, Apr 9, 2024 at 5:31 PM Pierre Jolivet > wrote: >>>>> >>>>>> On 9 Apr 2024, at 4:19 PM, Carl-Johan Thore > wrote: >>>>>> >>>>>> Thanks for the suggestion. I don't have a factored matrix (and can't really use direct linear solvers) so MatSolveTranspose doesn't seem to be an option. >>>>>> I should have mentioned that I've also tried KSPSolveTranspose but that doesn't work with pcredistribute >>>>> >>>>> I'm not a frequent PCREDISTRIBUTE user, but it looks like https://urldefense.us/v3/__https://petsc.org/release/src/ksp/pc/impls/redistribute/redistribute.c.html*line332__;Iw!!G_uCfscf7eWS!Z3r5Vel5sbPcmhUVW14FZ6PsYuoOjHTQALmGl5fdDTHAI1ylX6YQfiOTutFQCeLQewR7-dZQ2k-7sSa95-LFDg$ could be copy/paste'd into PCApplyTranspose_Redistribute() by just changing a MatMult() to MatMultTranspose() and KSPSolve() to KSPSolveTranspose(). >>>>> Would you be willing to contribute (and test) this? >>>>> Then, KSPSolveTranspose(), which should be the function you call, will work. >>>>> >>>>> Thanks, >>>>> Pierre >>>> >>>> Thanks, that sounds promising. Yes, I'll try to make a contribution >>>> /Carl-Johan >> -------------- next part -------------- An HTML attachment was scrubbed... 
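
[The change discussed in this thread can be sketched roughly as follows. This is only an illustration, not the actual merge request: the PC_Redistribute context field names (red->ksp, red->b, red->x) and the elided gather/scatter steps are assumptions modeled on the PCApply_Redistribute() source linked above, and the real implementation may differ.]

```c
/* Hypothetical sketch of PCApplyTranspose_Redistribute(), modeled on
 * PCApply_Redistribute() as suggested in the thread: the gather/scatter
 * logic stays the same, while MatMult() becomes MatMultTranspose() and
 * KSPSolve() becomes KSPSolveTranspose(). The context field names used
 * here (red->ksp, red->b, red->x) are illustrative assumptions. */
static PetscErrorCode PCApplyTranspose_Redistribute(PC pc, Vec b, Vec x)
{
  PC_Redistribute *red = (PC_Redistribute *)pc->data;

  PetscFunctionBegin;
  /* ... gather the reduced right-hand side into red->b exactly as
   * PCApply_Redistribute() does, but with MatMult() replaced by
   * MatMultTranspose() where the removed rows are eliminated ... */
  PetscCall(KSPSolveTranspose(red->ksp, red->b, red->x)); /* was KSPSolve() */
  /* ... scatter red->x back into the full vector x ... */
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

As noted above, the caller would also mark the operator as structurally symmetric, e.g. MatSetOption(A, MAT_STRUCTURALLY_SYMMETRIC, PETSC_TRUE), before calling KSPSolveTranspose().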
URL: From cho at slac.stanford.edu Fri Apr 12 12:48:38 2024 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Fri, 12 Apr 2024 17:48:38 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> Message-ID: I performed comparison tests using KSP with and without the cuda backend on NERSC's Perlmutter. For a finite element solve with 800k degrees of freedom, the best times obtained using MPI and MPI+GPU were o MPI - 128 MPI tasks, 27 s o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s Is that the performance one would expect using the hybrid mode of computation? The attached image shows the scaling on a single node. Thanks, Cho ________________________________ From: Ng, Cho-Kuen Sent: Saturday, August 12, 2023 8:08 AM To: Jacob Faibussowitsch Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend Thanks Jacob. ________________________________ From: Jacob Faibussowitsch Sent: Saturday, August 12, 2023 5:02 AM To: Ng, Cho-Kuen Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? 
If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ() with MatSetSizes() before MatSetFromOptions(). >>>> >>>> Thanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> -log_view >>>> -mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types, then these options do nothing, but because Mark had you use -options_left, the program will tell you at the end that it did not use the option, so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sound right? >>>>> >>>>> PETSc will read command line arguments automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ >> >> >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sound right? >>>>> >>>>> PETSc will read command line arguments automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ayzvIKJwKmRG8pwu08_ikMDkk-2RTSFLjetpNY5u1zyOv8c0CVVizWOIcNzX27RfVhPixM8dbsF7cAlbrNTNyxdZ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
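[Editorial note] The constructor sequence that Barry and Matt describe above can be sketched as a short fragment. This is an illustration only, not code from the application: mlocal, m, n, d_nz, and o_nz are the variables quoted in the thread, the preallocation calls are one common way to realize Barry's MatXXXSetPreallocation() step, and the fragment assumes a configured PETSc build.

```c
/* Options-database-friendly construction, replacing MatCreateAIJ(),
 * so -mat_type aijcusparse / -vec_type cuda take effect at runtime.
 * Error-checking style follows the code quoted in the thread. */
#include <petscmat.h>

ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
ierr = MatSetSizes(A,mlocal,mlocal,m,n);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);               /* honors -mat_type */
ierr = MatSeqAIJSetPreallocation(A,d_nz,NULL);CHKERRQ(ierr);
ierr = MatMPIAIJSetPreallocation(A,d_nz,NULL,o_nz,NULL);CHKERRQ(ierr);

ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
ierr = VecSetSizes(x,mlocal,m);CHKERRQ(ierr);
ierr = VecSetFromOptions(x);CHKERRQ(ierr);               /* honors -vec_type */
```

With this ordering, -options_left should report no unused options; codes that create their objects through a DM would instead use -dm_mat_type aijcusparse -dm_vec_type cuda, as Barry notes.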
Name: Untitled.png Type: image/png Size: 37922 bytes Desc: Untitled.png URL: From bsmith at petsc.dev Fri Apr 12 14:53:45 2024 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 12 Apr 2024 15:53:45 -0400 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> Message-ID: <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> 800k is a pretty small problem for GPUs. We would need to see the runs with output from -ksp_view -log_view to see if the timing results are reasonable. > On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen wrote: > > I performed comparison tests using KSP with and without the cuda backend on NERSC's Perlmutter. For a finite element solve with 800k degrees of freedom, the best times obtained using MPI and MPI+GPU were > > o MPI - 128 MPI tasks, 27 s > > o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s > > Is that the performance one would expect using the hybrid mode of computation? Attached image shows the scaling on a single node. > > Thanks, > Cho > From: Ng, Cho-Kuen > > Sent: Saturday, August 12, 2023 8:08 AM > To: Jacob Faibussowitsch > > Cc: Barry Smith >; petsc-users > > Subject: Re: [petsc-users] Using PETSc GPU backend > > Thanks Jacob. > From: Jacob Faibussowitsch > > Sent: Saturday, August 12, 2023 5:02 AM > To: Ng, Cho-Kuen > > Cc: Barry Smith >; petsc-users > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > Can petsc show the number of GPUs used? > > -device_view > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users > wrote: > > > > Barry, > > > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. 
The petsc output shows the number of MPI tasks: > > > > KSP Object: 32 MPI processes > > > > Can petsc show the number of GPUs used? > > > > Thanks, > > Cho > > > > From: Barry Smith > > > Sent: Wednesday, August 9, 2023 4:09 PM > > To: Ng, Cho-Kuen > > > Cc: petsc-users at mcs.anl.gov > > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > > > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen > wrote: > >> > >> Barry and Matt, > >> > >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? > >> > >> Best, > >> Cho > >> > >> From: Barry Smith > > >> Sent: Monday, July 17, 2023 6:58 AM > >> To: Ng, Cho-Kuen > > >> Cc: petsc-users at mcs.anl.gov > > >> Subject: Re: [petsc-users] Using PETSc GPU backend > >> > >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > >> > >> > >> > >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: > >>> > >>> Barry, > >>> > >>> Thank you so much for the clarification. > >>> > >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? > >>> > >>> Cho > >>> From: Barry Smith > > >>> Sent: Saturday, July 15, 2023 8:36 AM > >>> To: Ng, Cho-Kuen > > >>> Cc: petsc-users at mcs.anl.gov > > >>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>> > >>> Cho, > >>> > >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. > >>> > >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. 
> >>> > >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays() etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda > >>> > >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda > >>> > >>> Sorry for the confusion. > >>> > >>> Barry > [...] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cho at slac.stanford.edu Fri Apr 12 16:50:57 2024 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Fri, 12 Apr 2024 21:50:57 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> Message-ID: Barry, Attached is the petsc output. Thanks, Cho ________________________________ From: Barry Smith Sent: Friday, April 12, 2024 12:53 PM To: Ng, Cho-Kuen Cc: Jacob Faibussowitsch ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend 800k is a pretty small problem for GPUs. We would need to see the runs with output from -ksp_view -log_view to see if the timing results are reasonable. On Apr 12, 2024, at 1:48?PM, Ng, Cho-Kuen wrote: I performed tests on comparison using KSP with and without cuda backend on NERSC's Perlmutter. For a finite element solve with 800k degrees of freedom, the best times obtained using MPI and MPI+GPU were o MPI - 128 MPI tasks, 27 s o MPI+GPU - 4 MPI tasks, 4 GPU's, 32 s Is that the performance one would expect using the hybrid mode of computation. Attached image shows the scaling on a single node. Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Saturday, August 12, 2023 8:08 AM To: Jacob Faibussowitsch > Cc: Barry Smith >; petsc-users > Subject: Re: [petsc-users] Using PETSc GPU backend Thanks Jacob. ________________________________ From: Jacob Faibussowitsch > Sent: Saturday, August 12, 2023 5:02 AM To: Ng, Cho-Kuen > Cc: Barry Smith >; petsc-users > Subject: Re: [petsc-users] Using PETSc GPU backend > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users > wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. 
Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen > wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. 
>>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. >>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. 
>>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith >; Mark Adams > >>>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams > >>>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ >> >> >> >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
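Barry's DM route can be sketched as follows. This is a minimal hypothetical example (the 2D grid and its sizes are arbitrary placeholders, not from the thread): when a DM creates the matrices and vectors, -dm_mat_type aijcusparse -dm_vec_type cuda select the GPU types at runtime.

```c
/* Sketch: let a DMDA create the Mat and Vec so that
   -dm_mat_type aijcusparse -dm_vec_type cuda take effect. */
DM  da;
Mat A;
Vec x;

ierr = DMDACreate2d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                    DMDA_STENCIL_STAR,64,64,PETSC_DECIDE,PETSC_DECIDE,
                    1,1,NULL,NULL,&da);CHKERRQ(ierr);
ierr = DMSetFromOptions(da);CHKERRQ(ierr);   /* reads -dm_mat_type / -dm_vec_type */
ierr = DMSetUp(da);CHKERRQ(ierr);
ierr = DMCreateMatrix(da,&A);CHKERRQ(ierr);        /* aijcusparse if requested */
ierr = DMCreateGlobalVector(da,&x);CHKERRQ(ierr);  /* cuda if requested */
```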
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith >; Mark Adams > >>>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams > >>>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YE7hkMHodNudwc4WIx6fGQwDLfOhkfw6oF0OR3A4Heb0mDpFmmblUrBJ4hx0GgbaR_dumzjj6dtzlAU9wIhLfoVE$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: petsc.out Type: application/octet-stream Size: 232702 bytes Desc: petsc.out URL: From mfadams at lbl.gov Fri Apr 12 18:52:25 2024 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 12 Apr 2024 19:52:25 -0400 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> Message-ID: As Barry said, this is a bit small but the performance looks reasonable. The solver does very badly, mathematically. I would try hypre to get another data point. You could also try 'cg' to check that the pipelined version is not a problem. Mark On Fri, Apr 12, 2024 at 3:54?PM Barry Smith wrote: > 800k is a pretty small problem for GPUs. We would need to see the runs > with output from -ksp_view -log_view to see if the timing results are > reasonable. On Apr 12, 2024, at 1: 48 PM, Ng, Cho-Kuen stanford. edu> wrote: I performed > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > > 800k is a pretty small problem for GPUs. > > We would need to see the runs with output from -ksp_view -log_view to > see if the timing results are reasonable. > > On Apr 12, 2024, at 1:48?PM, Ng, Cho-Kuen wrote: > > I performed tests on comparison using KSP with and without cuda backend on > NERSC's Perlmutter. For a finite element solve with 800k degrees of > freedom, the best times obtained using MPI and MPI+GPU were > > o MPI - 128 MPI tasks, 27 s > > o MPI+GPU - 4 MPI tasks, 4 GPU's, 32 s > > Is that the performance one would expect using the hybrid mode of > computation. Attached image shows the scaling on a single node. 
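A hypothetical invocation for collecting the data Barry asks for (the executable name, task counts, and launcher flags are placeholders, not taken from the thread):

```shell
# 4 MPI tasks, 1 GPU per task, with the GPU backend types and full profiling.
srun -n 4 --gpus-per-task=1 ./app \
  -mat_type aijcusparse -vec_type cuda \
  -ksp_view -log_view -options_left
```

In the resulting -log_view table, check the percent-flops-on-GPU column that Mark mentions and the host-device transfer counts; copies between CPU and GPU inside the solve loop are a common reason a GPU run ends up slower than the CPU run.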
> > Thanks, > Cho > ------------------------------ > *From:* Ng, Cho-Kuen > *Sent:* Saturday, August 12, 2023 8:08 AM > *To:* Jacob Faibussowitsch > *Cc:* Barry Smith ; petsc-users > > *Subject:* Re: [petsc-users] Using PETSc GPU backend > > Thanks Jacob. > ------------------------------ > *From:* Jacob Faibussowitsch > *Sent:* Saturday, August 12, 2023 5:02 AM > *To:* Ng, Cho-Kuen > *Cc:* Barry Smith ; petsc-users > > *Subject:* Re: [petsc-users] Using PETSc GPU backend > > > Can petsc show the number of GPUs used? > > -device_view > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Barry, > > > > I tried again today on Perlmutter and running on multiple GPU nodes > worked. Likely, I had messed up something the other day. Also, I was able > to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output > shows the number of MPI tasks: > > > > KSP Object: 32 MPI processes > > > > Can petsc show the number of GPUs used? > > > > Thanks, > > Cho > > > > From: Barry Smith > > Sent: Wednesday, August 9, 2023 4:09 PM > > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > > We would need more information about "hanging". Do PETSc examples and > tiny problems "hang" on multiple nodes? If you run with -info what are the > last messages printed? Can you run with a debugger to see where it is > "hanging"? > > > > > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: > >> > >> Barry and Matt, > >> > >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 > node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple > nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I > fix this? 
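The two diagnostics Barry suggests for the multi-node hang can be sketched as commands (executable name and launcher flags are placeholders):

```shell
# 1. Verbose logging: the last -info lines printed before the hang
#    indicate where in KSPSolve the run stops.
srun -N 2 -n 8 --gpus-per-task=1 ./app -info

# 2. Attach a debugger to every rank (opens one xterm+debugger per
#    process, so it needs X forwarding).
./app -start_in_debugger
```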
> >> > >> Best, > >> Cho > >> > >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM > >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend > >> > >> The examples that use DM, in particular DMDA all trivially support > using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > >> > >> > >> > >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: > >>> > >>> Barry, > >>> > >>> Thank you so much for the clarification. > >>> > >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are > there other tutorials available? > >>> > >>> Cho > >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM > >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>> > >>> Cho, > >>> > >>> We currently have a crappy API for turning on GPU support, and our > documentation is misleading in places. > >>> > >>> People constantly say "to use GPU's with PETSc you only need to > use -mat_type aijcusparse (for example)" This is incorrect. > >>> > >>> This does not work with code that uses the convenience Mat > constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only > works if you use the constructor approach of MatCreate(), MatSetSizes(), > MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to > use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda > >>> > >>> If you use DM to create the matrices and vectors then you can use > -dm_mat_type aijcusparse -dm_vec_type cuda > >>> > >>> Sorry for the confusion. 
> >>> > >>> Barry > >>> > >>> > >>> > >>> > >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: > >>>> > >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: > >>>> Matt, > >>>> > >>>> After inserting 2 lines in the code: > >>>> > >>>> ierr = > MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); > >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> > d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> "There are no unused options." However, there is no improvement on > the GPU performance. > >>>> > >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat > you created in steps 1 and 2. This is detailed in the manual. > >>>> > >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before > MatSetFromOptions(). > >>>> > >>>> THanks, > >>>> > >>>> Matt > >>>> Thanks, > >>>> Cho > >>>> > >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM > >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith ; Mark Adams ; > petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: > >>>> I managed to pass the following options to PETSc using a GPU node on > Perlmutter. > >>>> > >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>> > >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. > >>>> > >>>> o #PETSc Option Table entries: > >>>> -log_view > >>>> -mat_type aijcusparse > >>>> -options_left > >>>> -vec_type cuda > >>>> #End of PETSc Option Table entries > >>>> WARNING! There are options you set that were not used! > >>>> WARNING! could be spelling mistake, etc! > >>>> There is one unused database option. It is: > >>>> Option left: name:-mat_type value: aijcusparse > >>>> > >>>> The -mat_type option has not been used. 
In the application code, we > use > >>>> > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> > >>>> If you create the Mat this way, then you need MatSetFromOptions() in > order to set the type from the command line. > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> o The percent flops on the GPU for KSPSolve is 17%. > >>>> > >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an > order of magnitude slower. How can I improve the GPU performance? > >>>> > >>>> Thanks, > >>>> Cho > >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM > >>>> To: Barry Smith ; Mark Adams > >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> Barry, Mark and Matt, > >>>> > >>>> Thank you all for the suggestions. I will modify the code so we can > pass runtime options. > >>>> > >>>> Cho > >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM > >>>> To: Mark Adams > >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen < > cho at slac.stanford.edu>; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> > >>>> Note that options like -mat_type aijcusparse -vec_type cuda only > work if the program is set up to allow runtime swapping of matrix and > vector types. If you have a call to MatCreateMPIAIJ() or other specific > types then then these options do nothing but because Mark had you use > -options_left the program will tell you at the end that it did not use the > option so you will know. 
> >>>> > >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: > >>>>> > >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the > args and you run: > >>>>> > >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>>> > >>>>> Mark > >>>>> > >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: > >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> Mark, > >>>>> > >>>>> The application code reads in parameters from an input file, where > we can put the PETSc runtime options. Then we pass the options to > PetscInitialize(...). Does that sounds right? > >>>>> > >>>>> PETSc will read command line argument automatically in > PetscInitialize() unless you shut it off. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Matt > >>>>> Cho > >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM > >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Mark, > >>>>> > >>>>> Thanks for the information. How do I put the runtime options for the > executable, say, a.out, which does not have the provision to append > arguments? Do I need to change the C++ main to read in the options? > >>>>> > >>>>> Cho > >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM > >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view > -options_left > >>>>> The last column of the performance data (from -log_view) will be the > percent flops on the GPU. Check that that is > 0. > >>>>> > >>>>> The end of the output will list the options that were used and > options that were _not_ used (if any). Check that there are no options left. 
> >>>>> > >>>>> Mark > >>>>> > >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> I installed PETSc on Perlmutter using "spack install > petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I > compiled the application code (purely CPU code) linking to the petsc > package, hoping that I can get performance improvement using the petsc GPU > backend. However, the timing was the same using the same number of MPI > tasks with and without GPU accelerators. Have I missed something in the > process, for example, setting up PETSc options at runtime to use the GPU > backend? > >>>>> > >>>>> Thanks, > >>>>> Cho > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>>> -- Norbert Wiener > >>>>> > >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YNQ2XFZlhyUlipu4sdOKV4traakP3Yhdqce8KX11c4MZKm3JThQeNGFVD4i2Zk6R3nw_eSkR6JMwKAAuEYglCXg$ > > >>>> > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>> -- Norbert Wiener > >>>> > >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YNQ2XFZlhyUlipu4sdOKV4traakP3Yhdqce8KX11c4MZKm3JThQeNGFVD4i2Zk6R3nw_eSkR6JMwKAAuEYglCXg$ > > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> >>>> -- Norbert Wiener > >>>> > >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YNQ2XFZlhyUlipu4sdOKV4traakP3Yhdqce8KX11c4MZKm3JThQeNGFVD4i2Zk6R3nw_eSkR6JMwKAAuEYglCXg$ > > >> > >> > >> > >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM > >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend > >> > >> The examples that use DM, in particular DMDA all trivially support > using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > >> > >> > >> > >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: > >>> > >>> Barry, > >>> > >>> Thank you so much for the clarification. > >>> > >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are > there other tutorials available? > >>> > >>> Cho > >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM > >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>> > >>> Cho, > >>> > >>> We currently have a crappy API for turning on GPU support, and our > documentation is misleading in places. > >>> > >>> People constantly say "to use GPU's with PETSc you only need to > use -mat_type aijcusparse (for example)" This is incorrect. > >>> > >>> This does not work with code that uses the convenience Mat > constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only > works if you use the constructor approach of MatCreate(), MatSetSizes(), > MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to > use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda > >>> > >>> If you use DM to create the matrices and vectors then you can use > -dm_mat_type aijcusparse -dm_vec_type cuda > >>> > >>> Sorry for the confusion. 
> >>> > >>> Barry > >>> > >>> > >>> > >>> > >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: > >>>> > >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: > >>>> Matt, > >>>> > >>>> After inserting 2 lines in the code: > >>>> > >>>> ierr = > MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); > >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> > d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> "There are no unused options." However, there is no improvement on > the GPU performance. > >>>> > >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat > you created in steps 1 and 2. This is detailed in the manual. > >>>> > >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before > MatSetFromOptions(). > >>>> > >>>> THanks, > >>>> > >>>> Matt > >>>> Thanks, > >>>> Cho > >>>> > >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM > >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith ; Mark Adams ; > petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: > >>>> I managed to pass the following options to PETSc using a GPU node on > Perlmutter. > >>>> > >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>> > >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. > >>>> > >>>> o #PETSc Option Table entries: > >>>> -log_view > >>>> -mat_type aijcusparse > >>>> -options_left > >>>> -vec_type cuda > >>>> #End of PETSc Option Table entries > >>>> WARNING! There are options you set that were not used! > >>>> WARNING! could be spelling mistake, etc! > >>>> There is one unused database option. It is: > >>>> Option left: name:-mat_type value: aijcusparse > >>>> > >>>> The -mat_type option has not been used. 
In the application code, we > use > >>>> > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> > >>>> If you create the Mat this way, then you need MatSetFromOptions() in > order to set the type from the command line. > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> o The percent flops on the GPU for KSPSolve is 17%. > >>>> > >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an > order of magnitude slower. How can I improve the GPU performance? > >>>> > >>>> Thanks, > >>>> Cho > >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM > >>>> To: Barry Smith ; Mark Adams > >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> Barry, Mark and Matt, > >>>> > >>>> Thank you all for the suggestions. I will modify the code so we can > pass runtime options. > >>>> > >>>> Cho > >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM > >>>> To: Mark Adams > >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen < > cho at slac.stanford.edu>; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> > >>>> Note that options like -mat_type aijcusparse -vec_type cuda only > work if the program is set up to allow runtime swapping of matrix and > vector types. If you have a call to MatCreateMPIAIJ() or other specific > types then then these options do nothing but because Mark had you use > -options_left the program will tell you at the end that it did not use the > option so you will know. 
> >>>> > >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: > >>>>> > >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the > args and you run: > >>>>> > >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>>> > >>>>> Mark > >>>>> > >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: > >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> Mark, > >>>>> > >>>>> The application code reads in parameters from an input file, where > we can put the PETSc runtime options. Then we pass the options to > PetscInitialize(...). Does that sounds right? > >>>>> > >>>>> PETSc will read command line argument automatically in > PetscInitialize() unless you shut it off. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Matt > >>>>> Cho > >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM > >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Mark, > >>>>> > >>>>> Thanks for the information. How do I put the runtime options for the > executable, say, a.out, which does not have the provision to append > arguments? Do I need to change the C++ main to read in the options? > >>>>> > >>>>> Cho > >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM > >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view > -options_left > >>>>> The last column of the performance data (from -log_view) will be the > percent flops on the GPU. Check that that is > 0. > >>>>> > >>>>> The end of the output will list the options that were used and > options that were _not_ used (if any). Check that there are no options left. 
> >>>>> > >>>>> Mark > >>>>> > >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> I installed PETSc on Perlmutter using "spack install > petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I > compiled the application code (purely CPU code) linking to the petsc > package, hoping that I can get performance improvement using the petsc GPU > backend. However, the timing was the same using the same number of MPI > tasks with and without GPU accelerators. Have I missed something in the > process, for example, setting up PETSc options at runtime to use the GPU > backend? > >>>>> > >>>>> Thanks, > >>>>> Cho > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>>> -- Norbert Wiener > >>>>> > >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YNQ2XFZlhyUlipu4sdOKV4traakP3Yhdqce8KX11c4MZKm3JThQeNGFVD4i2Zk6R3nw_eSkR6JMwKAAuEYglCXg$ > > >>>> > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>> -- Norbert Wiener > >>>> > >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YNQ2XFZlhyUlipu4sdOKV4traakP3Yhdqce8KX11c4MZKm3JThQeNGFVD4i2Zk6R3nw_eSkR6JMwKAAuEYglCXg$ > > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> >>>> -- Norbert Wiener > >>>> > >>>> https://www.cse.buffalo.edu/~knepley/
From cho at slac.stanford.edu Fri Apr 12 19:18:28 2024
From: cho at slac.stanford.edu (Ng, Cho-Kuen)
Date: Sat, 13 Apr 2024 00:18:28 +0000
Subject: [petsc-users] Using PETSc GPU backend
In-Reply-To: 
References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev>
Message-ID: 

Mark. The FEM problem is of high-frequency Helmholtz type for the Maxwell equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this small problem and the results agreed well with those obtained from the direct solver MUMPS. However, this ksp-pc combination failed to converge for much larger problem sizes. What would be a good combination for solving this kind of large-scale problem, where direct solvers are not able to? Thanks, Cho ________________________________ From: Mark Adams Sent: Friday, April 12, 2024 4:52 PM To: Barry Smith Cc: Ng, Cho-Kuen ; petsc-users ; Jacob Faibussowitsch Subject: Re: [petsc-users] Using PETSc GPU backend As Barry said, this is a bit small but the performance looks reasonable. The solver does very badly, mathematically. I would try hypre to get another data point. You could also try 'cg' to check that the pipelined version is not a problem. Mark On Fri, Apr 12, 2024 at 3:54 PM Barry Smith wrote: 800k is a pretty small problem for GPUs. We would need to see the runs with output from -ksp_view -log_view to see if the timing results are reasonable. On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen wrote: I performed tests on comparison using KSP with and without the cuda backend on NERSC's Perlmutter. For a finite element solve with 800k degrees of freedom, the best times obtained using MPI and MPI+GPU were o MPI - 128 MPI tasks, 27 s o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s Is that the performance one would expect using the hybrid mode of computation? The attached image shows the scaling on a single node. Thanks, Cho ________________________________ From: Ng, Cho-Kuen Sent: Saturday, August 12, 2023 8:08 AM To: Jacob Faibussowitsch Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend Thanks Jacob. ________________________________ From: Jacob Faibussowitsch Sent: Saturday, August 12, 2023 5:02 AM To: Ng, Cho-Kuen Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info, what are the last messages printed?
Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen > wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement in the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ() with MatSetSizes() before MatSetFromOptions(). >>>> >>>> Thanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> -log_view >>>> -mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used.
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types, then these options do nothing; but because Mark had you use -options_left, the program will tell you at the end that it did not use the option, so you will know.
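[Editor's note] The options-respecting construction sequence that Barry and Matt describe above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the sizes `m`, `n` and the preallocation counts `d_nz`, `o_nz` are illustrative placeholders, and the modern `PetscCall()` macro is used in place of the older `ierr`/`CHKERRQ` style shown in the thread.

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x;
  PetscInt m = 100, n = 100;   /* global sizes: placeholders */
  PetscInt d_nz = 5, o_nz = 2; /* per-row preallocation guesses */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Create the Mat with the "empty shell" constructor so that
     -mat_type aijcusparse (for example) can change the type at runtime. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, m, n));
  PetscCall(MatSetFromOptions(A));
  /* Preallocation calls for a non-matching type are ignored, so setting
     both the Seq and MPI variants after MatSetFromOptions() is safe. */
  PetscCall(MatSeqAIJSetPreallocation(A, d_nz + o_nz, NULL));
  PetscCall(MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL));

  /* Same pattern for vectors, so -vec_type cuda is honored. */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, m));
  PetscCall(VecSetFromOptions(x));

  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}
```

Replacing a single MatCreateAIJ() call with this sequence is what makes the -mat_type/-vec_type runtime options discussed above take effect.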
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sound right? >>>>> >>>>> PETSc will read command line arguments automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf".
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed...
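[Editor's note] Pulling together the runtime flags suggested in the thread above (Mark's -log_view/-options_left, Jacob's -device_view, and the cusparse/cuda types), a typical launch might look like the sketch below. The binary name a.out comes from the thread; the srun invocation and GPU binding are placeholders to adapt to your scheduler, and the cusparse backend and -device_view both require a CUDA-enabled PETSc build.

```shell
# Select the GPU backend at runtime and turn on the diagnostics
# discussed in the thread:
#   -device_view   report the GPU devices PETSc initialized
#   -log_view      last column shows the percent of flops on the GPU
#   -options_left  warn about any option that went unused
srun -n 4 --gpus-per-task=1 ./a.out \
    -mat_type aijcusparse -vec_type cuda \
    -device_view -log_view -options_left
```

If -options_left reports that -mat_type or -vec_type was unused, the application is still calling a type-specific constructor such as MatCreateAIJ() instead of the MatCreate()/MatSetFromOptions() sequence.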
From knepley at gmail.com Sat Apr 13 14:04:05 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Sat, 13 Apr 2024 15:04:05 -0400
Subject: [petsc-users] Using PETSc GPU backend
In-Reply-To: 
References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev>
Message-ID: 

On Fri, Apr 12, 2024 at 8:19 PM Ng, Cho-Kuen via petsc-users < petsc-users at mcs.anl.gov> wrote: > Mark. > > The FEM problem is of high-frequency Helmholtz type for the Maxwell > equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this > small problem and the results agreed well with those obtained from the > direct solver MUMPS. However, this ksp-pc combination failed to converge > for much larger problem sizes. What would be a good combination for solving > this kind of large-scale problem, where direct solvers are not able to? > Solving high-frequency Helmholtz is a hard research problem. There are no easy answers. I know of no real solutions for C0 FEM; it seems to be a bad discretization for this from the start. I have not seen anything reliable that is not mostly direct, but this is not what I do for a living, so something might exist, although not in any publicly available software.
Thanks, Matt > Thanks, > Cho > ------------------------------ > *From:* Mark Adams > *Sent:* Friday, April 12, 2024 4:52 PM > *To:* Barry Smith > *Cc:* Ng, Cho-Kuen ; petsc-users < > petsc-users at mcs.anl.gov>; Jacob Faibussowitsch > *Subject:* Re: [petsc-users] Using PETSc GPU backend > > As Barry said, this is a bit small but the performance looks reasonable. > The solver does very badly, mathematically. > I would try hypre to get another data point. > You could also try 'cg' to check that the pipelined version is not a > problem. > Mark > > On Fri, Apr 12, 2024 at 3:54 PM Barry Smith wrote: > > 800k is a pretty small problem for GPUs. > > We would need to see the runs with output from -ksp_view -log_view to > see if the timing results are reasonable. > > On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen wrote: > > I performed tests on comparison using KSP with and without the cuda backend on > NERSC's Perlmutter. For a finite element solve with 800k degrees of > freedom, the best times obtained using MPI and MPI+GPU were > > o MPI - 128 MPI tasks, 27 s > > o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s > > Is that the performance one would expect using the hybrid mode of > computation? The attached image shows the scaling on a single node. > > Thanks, > Cho > ------------------------------ > *From:* Ng, Cho-Kuen > *Sent:* Saturday, August 12, 2023 8:08 AM > *To:* Jacob Faibussowitsch > *Cc:* Barry Smith ; petsc-users > > *Subject:* Re: [petsc-users] Using PETSc GPU backend > > Thanks Jacob.
> ------------------------------ > *From:* Jacob Faibussowitsch > *Sent:* Saturday, August 12, 2023 5:02 AM > *To:* Ng, Cho-Kuen > *Cc:* Barry Smith ; petsc-users > > *Subject:* Re: [petsc-users] Using PETSc GPU backend > > > Can petsc show the number of GPUs used? > > -device_view > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Barry, > > > > I tried again today on Perlmutter and running on multiple GPU nodes > worked. Likely, I had messed up something the other day. Also, I was able > to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output > shows the number of MPI tasks: > > > > KSP Object: 32 MPI processes > > > > Can petsc show the number of GPUs used? > > > > Thanks, > > Cho > > > > From: Barry Smith > > Sent: Wednesday, August 9, 2023 4:09 PM > > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > > We would need more information about "hanging". Do PETSc examples and > tiny problems "hang" on multiple nodes? If you run with -info what are the > last messages printed? Can you run with a debugger to see where it is > "hanging"? > > > > > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: > >> > >> Barry and Matt, > >> > >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 > node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple > nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I > fix this? 
> >> > >> Best, > >> Cho > >> > >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM > >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend > >> > >> The examples that use DM, in particular DMDA all trivially support > using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > >> > >> > >> > >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: > >>> > >>> Barry, > >>> > >>> Thank you so much for the clarification. > >>> > >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are > there other tutorials available? > >>> > >>> Cho > >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM > >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>> > >>> Cho, > >>> > >>> We currently have a crappy API for turning on GPU support, and our > documentation is misleading in places. > >>> > >>> People constantly say "to use GPU's with PETSc you only need to > use -mat_type aijcusparse (for example)" This is incorrect. > >>> > >>> This does not work with code that uses the convenience Mat > constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only > works if you use the constructor approach of MatCreate(), MatSetSizes(), > MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to > use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda > >>> > >>> If you use DM to create the matrices and vectors then you can use > -dm_mat_type aijcusparse -dm_vec_type cuda > >>> > >>> Sorry for the confusion. 
> >>> > >>> Barry > >>> > >>> > >>> > >>> > >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: > >>>> > >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen wrote: > >>>> Matt, > >>>> > >>>> After inserting 2 lines in the code: > >>>> > >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); > >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); > >>>> > >>>> "There are no unused options." However, there is no improvement in > the GPU performance. > >>>> > >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat > you created in steps 1 and 2. This is detailed in the manual. > >>>> > >>>> 2. You should replace MatCreateAIJ() with MatSetSizes() before > MatSetFromOptions(). > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> Thanks, > >>>> Cho > >>>> > >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM > >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith ; Mark Adams ; > petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen > wrote: > >>>> I managed to pass the following options to PETSc using a GPU node on > Perlmutter. > >>>> > >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>> > >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. > >>>> > >>>> o #PETSc Option Table entries: > >>>> -log_view > >>>> -mat_type aijcusparse > >>>> -options_left > >>>> -vec_type cuda > >>>> #End of PETSc Option Table entries > >>>> WARNING! There are options you set that were not used! > >>>> WARNING! could be spelling mistake, etc! > >>>> There is one unused database option. It is: > >>>> Option left: name:-mat_type value: aijcusparse > >>>> > >>>> The -mat_type option has not been used.
In the application code, we > use > >>>> > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr); > >>>> > >>>> > >>>> If you create the Mat this way, then you need MatSetFromOptions() in > order to set the type from the command line. > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> o The percent flops on the GPU for KSPSolve is 17%. > >>>> > >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an > order of magnitude slower. How can I improve the GPU performance? > >>>> > >>>> Thanks, > >>>> Cho > >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM > >>>> To: Barry Smith ; Mark Adams > >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> Barry, Mark and Matt, > >>>> > >>>> Thank you all for the suggestions. I will modify the code so we can > pass runtime options. > >>>> > >>>> Cho > >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM > >>>> To: Mark Adams > >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen < > cho at slac.stanford.edu>; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> > >>>> Note that options like -mat_type aijcusparse -vec_type cuda only > work if the program is set up to allow runtime swapping of matrix and > vector types. If you have a call to MatCreateMPIAIJ() or other specific > types, then these options do nothing; but because Mark had you use > -options_left, the program will tell you at the end that it did not use the > option, so you will know.
> >>>> > >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: > >>>>> > >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the > args and you run: > >>>>> > >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>>> > >>>>> Mark > >>>>> > >>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley > wrote: > >>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> Mark, > >>>>> > >>>>> The application code reads in parameters from an input file, where > we can put the PETSc runtime options. Then we pass the options to > PetscInitialize(...). Does that sound right? > >>>>> > >>>>> PETSc will read command line arguments automatically in > PetscInitialize() unless you shut it off. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Matt > >>>>> Cho > >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM > >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Mark, > >>>>> > >>>>> Thanks for the information. How do I put the runtime options for the > executable, say, a.out, which does not have the provision to append > arguments? Do I need to change the C++ main to read in the options? > >>>>> > >>>>> Cho > >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM > >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view > -options_left > >>>>> The last column of the performance data (from -log_view) will be the > percent flops on the GPU. Check that that is > 0. > >>>>> > >>>>> The end of the output will list the options that were used and > options that were _not_ used (if any). Check that there are no options left.
> >>>>> > >>>>> Mark > >>>>> > >>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>>>> I installed PETSc on Perlmutter using "spack install > petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I > compiled the application code (purely CPU code) linking to the petsc > package, hoping that I can get performance improvement using the petsc GPU > backend. However, the timing was the same using the same number of MPI > tasks with and without GPU accelerators. Have I missed something in the > process, for example, setting up PETSc options at runtime to use the GPU > backend? > >>>>> > >>>>> Thanks, > >>>>> Cho > >>>>> > >>>>> > >>>>> -- > >>>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>>> -- Norbert Wiener > >>>>> > >>>>> https://www.cse.buffalo.edu/~knepley/ > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>>> -- Norbert Wiener > >>>> > >>>> https://www.cse.buffalo.edu/~knepley/ > >>>> > >>>> > >>>> -- > >>>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead.
> >>>> -- Norbert Wiener > >>>> > >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eg-3VACvaflfdBfbxZoYCH53G1IX8r45qgpJlLNf-5DmCA76FwHJE8L8Wi4YY5HS--UbBRgFmMuKY7bvm-c0$ > > >> > >> > >> > >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM > >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend > >> > >> The examples that use DM, in particular DMDA all trivially support > using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > >> > >> > >> > >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: > >>> > >>> Barry, > >>> > >>> Thank you so much for the clarification. > >>> > >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are > there other tutorials available? > >>> > >>> Cho > >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM > >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>> > >>> Cho, > >>> > >>> We currently have a crappy API for turning on GPU support, and our > documentation is misleading in places. > >>> > >>> People constantly say "to use GPU's with PETSc you only need to > use -mat_type aijcusparse (for example)" This is incorrect. > >>> > >>> This does not work with code that uses the convenience Mat > constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only > works if you use the constructor approach of MatCreate(), MatSetSizes(), > MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to > use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda > >>> > >>> If you use DM to create the matrices and vectors then you can use > -dm_mat_type aijcusparse -dm_vec_type cuda > >>> > >>> Sorry for the confusion. 
> >>> > >>> Barry > >>> > >>> > >>> > >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: > >>>> > >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen > wrote: > >>>> Matt, > >>>> > >>>> After inserting 2 lines in the code: > >>>> > >>>> ierr = > MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); > >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> > d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> "There are no unused options." However, there is no improvement on > the GPU performance. > >>>> > >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat > you created in steps 1 and 2. This is detailed in the manual. > >>>> > >>>> 2. You should replace MatCreateAIJ() with MatSetSizes() before > MatSetFromOptions(). > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> Thanks, > >>>> Cho > >>>> > >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM > >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith ; Mark Adams ; > petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen > wrote: > >>>> I managed to pass the following options to PETSc using a GPU node on > Perlmutter. > >>>> > >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left > >>>> > >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. > >>>> > >>>> o #PETSc Option Table entries: > >>>> -log_view > >>>> -mat_type aijcusparse > >>>> -options_left > >>>> -vec_type cuda > >>>> #End of PETSc Option Table entries > >>>> WARNING! There are options you set that were not used! > >>>> WARNING! could be spelling mistake, etc! > >>>> There is one unused database option. It is: > >>>> Option left: name:-mat_type value: aijcusparse > >>>> > >>>> The -mat_type option has not been used.
In the application code, we > use > >>>> > >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, > >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); > >>>> > >>>> > >>>> If you create the Mat this way, then you need MatSetFromOptions() in > order to set the type from the command line. > >>>> > >>>> Thanks, > >>>> > >>>> Matt > >>>> o The percent flops on the GPU for KSPSolve is 17%. > >>>> > >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an > order of magnitude slower. How can I improve the GPU performance? > >>>> > >>>> Thanks, > >>>> Cho > >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM > >>>> To: Barry Smith ; Mark Adams > >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> Barry, Mark and Matt, > >>>> > >>>> Thank you all for the suggestions. I will modify the code so we can > pass runtime options. > >>>> > >>>> Cho > >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM > >>>> To: Mark Adams > >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen < > cho at slac.stanford.edu>; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend > >>>> > >>>> Note that options like -mat_type aijcusparse -vec_type cuda only > work if the program is set up to allow runtime swapping of matrix and > vector types. If you have a call to MatCreateMPIAIJ() or other specific > types then these options do nothing, but because Mark had you use > -options_left the program will tell you at the end that it did not use the > option so you will know.
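Barry's and Matt's advice above amounts to one constructor sequence. Here is a minimal standalone sketch of it (the global size of 100 and the nonzero estimates are placeholder values for illustration, not numbers from this thread):

```c
#include <petsc.h>

int main(int argc, char **argv)
{
  Mat A;
  Vec x;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Options-friendly matrix construction:
     MatCreate() -> MatSetSizes() -> MatSetFromOptions() -> preallocation.
     With this order, -mat_type aijcusparse on the command line takes effect;
     a convenience constructor like MatCreateAIJ() would instead fix the type. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100)); /* placeholder sizes */
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSeqAIJSetPreallocation(A, 5, NULL));          /* placeholder nnz/row  */
  PetscCall(MatMPIAIJSetPreallocation(A, 5, NULL, 2, NULL)); /* ignored if not MPIAIJ */

  /* Same pattern for vectors, so -vec_type cuda can swap the type at runtime. */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, 100));
  PetscCall(VecSetFromOptions(x));

  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}
```

Run without options this stays an ordinary AIJ matrix; run with -mat_type aijcusparse -vec_type cuda -options_left it switches to the GPU types, and -options_left should then report no unused options.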
> >>>> -- Norbert Wiener > >>>> > >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eg-3VACvaflfdBfbxZoYCH53G1IX8r45qgpJlLNf-5DmCA76FwHJE8L8Wi4YY5HS--UbBRgFmMuKY7bvm-c0$ > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eg-3VACvaflfdBfbxZoYCH53G1IX8r45qgpJlLNf-5DmCA76FwHJE8L8Wi4YY5HS--UbBRgFmMuKY7bvm-c0$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Apr 13 14:53:40 2024 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 13 Apr 2024 15:53:40 -0400 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> Message-ID: As Matt said, high-frequency Helmholtz is very hard; low-frequency is doable by using a larger coarse grid (you have a tiny coarse grid). On Sat, Apr 13, 2024 at 3:04 PM Matthew Knepley wrote: > On Fri, Apr 12, 2024 at 8:19 PM Ng, Cho-Kuen via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Mark. >> >> The FEM problem is of high-frequency Helmholtz type for the Maxwell >> equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this >> small problem and the results agreed well with those obtained from the >> direct solver MUMPS.
However, this ksp-pc combination failed to converge >> for much larger problem sizes. What would be a good combination to solve >> this kind of large-scale problem where direct solvers would not be able to >> solve? >> > > Solving high-frequency Helmholtz is a hard research problem. There are no > easy answers. I know of no > real solutions for C0 FEM. It seems to be a bad discretization for this > from the start. I have not seen anything reliable that is not mostly > direct, but this is not what I do for a living so something might exist, > although not in any publicly available software. > > Thanks, > > Matt > > >> Thanks, >> Cho >> ------------------------------ >> *From:* Mark Adams >> *Sent:* Friday, April 12, 2024 4:52 PM >> *To:* Barry Smith >> *Cc:* Ng, Cho-Kuen ; petsc-users < >> petsc-users at mcs.anl.gov>; Jacob Faibussowitsch >> *Subject:* Re: [petsc-users] Using PETSc GPU backend >> >> As Barry said, this is a bit small but the performance looks reasonable. >> The solver does very badly, mathematically. >> I would try hypre to get another data point. >> You could also try 'cg' to check that the pipelined version is not a >> problem. >> Mark >> >> On Fri, Apr 12, 2024 at 3:54 PM Barry Smith wrote: >> >> 800k is a pretty small problem for GPUs. >> >> We would need to see the runs with output from -ksp_view -log_view to >> see if the timing results are reasonable. >> >> On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen wrote: >> >> I performed comparison tests using KSP with and without the CUDA backend >> on NERSC's Perlmutter.
For a finite element solve with 800k degrees of >> freedom, the best times obtained using MPI and MPI+GPU were >> >> o MPI - 128 MPI tasks, 27 s >> >> o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s >> >> Is that the performance one would expect using the hybrid mode of >> computation? The attached image shows the scaling on a single node. >> >> Thanks, >> Cho >> ------------------------------ >> *From:* Ng, Cho-Kuen >> *Sent:* Saturday, August 12, 2023 8:08 AM >> *To:* Jacob Faibussowitsch >> *Cc:* Barry Smith ; petsc-users < >> petsc-users at mcs.anl.gov> >> *Subject:* Re: [petsc-users] Using PETSc GPU backend >> >> Thanks Jacob. >> ------------------------------ >> *From:* Jacob Faibussowitsch >> *Sent:* Saturday, August 12, 2023 5:02 AM >> *To:* Ng, Cho-Kuen >> *Cc:* Barry Smith ; petsc-users < >> petsc-users at mcs.anl.gov> >> *Subject:* Re: [petsc-users] Using PETSc GPU backend >> >> > Can petsc show the number of GPUs used? >> >> -device_view >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Barry, >> > >> > I tried again today on Perlmutter and running on multiple GPU nodes >> worked. Likely, I had messed up something the other day. Also, I was able >> to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output >> shows the number of MPI tasks: >> > >> > KSP Object: 32 MPI processes >> > >> > Can petsc show the number of GPUs used? >> > >> > Thanks, >> > Cho >> > >> > From: Barry Smith >> > Sent: Wednesday, August 9, 2023 4:09 PM >> > To: Ng, Cho-Kuen >> > Cc: petsc-users at mcs.anl.gov >> > Subject: Re: [petsc-users] Using PETSc GPU backend >> > >> > We would need more information about "hanging". Do PETSc examples and >> tiny problems "hang" on multiple nodes? If you run with -info what are the >> last messages printed? Can you run with a debugger to see where it is >> "hanging"?
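The diagnostic options scattered through this exchange can be combined in a single job step. A hypothetical Perlmutter launch, where the executable name and task count are placeholders:

```shell
# One possible diagnostic run: GPU-backed types plus reporting options.
# -ksp_view prints the solver configuration actually used,
# -log_view prints per-event timings and the fraction of flops run on the GPU,
# -options_left warns about options that were set but never used,
# -device_view reports the GPU devices PETSc attached to.
srun -n 4 ./a.out -mat_type aijcusparse -vec_type cuda \
     -ksp_view -log_view -options_left -device_view
```

Comparing this output against a CPU-only run with the same options (minus the cuda/aijcusparse types) is the apples-to-apples comparison Barry asks for below.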
>> > >> > >> > >> >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen >> wrote: >> >> >> >> Barry and Matt, >> >> >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 >> node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple >> nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I >> fix this? >> >> >> >> Best, >> >> Cho >> >> >> >> From: Barry Smith >> >> Sent: Monday, July 17, 2023 6:58 AM >> >> To: Ng, Cho-Kuen >> >> Cc: petsc-users at mcs.anl.gov >> >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> >> >> The examples that use DM, in particular DMDA all trivially support >> using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >> >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen >> wrote: >> >>> >> >>> Barry, >> >>> >> >>> Thank you so much for the clarification. >> >>> >> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are >> there other tutorials available? >> >>> >> >>> Cho >> >>> From: Barry Smith >> >>> Sent: Saturday, July 15, 2023 8:36 AM >> >>> To: Ng, Cho-Kuen >> >>> Cc: petsc-users at mcs.anl.gov >> >>> Subject: Re: [petsc-users] Using PETSc GPU backend >> >>> >> >>> Cho, >> >>> >> >>> We currently have a crappy API for turning on GPU support, and >> our documentation is misleading in places. >> >>> >> >>> People constantly say "to use GPU's with PETSc you only need to >> use -mat_type aijcusparse (for example)" This is incorrect. >> >>> >> >>> This does not work with code that uses the convenience Mat >> constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only >> works if you use the constructor approach of MatCreate(), MatSetSizes(), >> MatSetFromOptions(), MatXXXSetPreallocation(). ... 
Similarly you need to >> use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >> >>> >> >>> If you use DM to create the matrices and vectors then you can use >> -dm_mat_type aijcusparse -dm_vec_type cuda >> >>> >> >>> Sorry for the confusion. >> >>> >> >>> Barry >> >>> >> >>> >> >>> >> >>> >> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley >> wrote: >> >>>> >> >>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen >> wrote: >> >>>> Matt, >> >>>> >> >>>> After inserting 2 lines in the code: >> >>>> >> >>>> ierr = >> MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >> >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >> >>>> >> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >> >>>> >> >>>> "There are no unused options." However, there is no improvement on >> the GPU performance. >> >>>> >> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat >> you created in steps 1 and 2. This is detailed in the manual. >> >>>> >> >>>> 2. You should replace MatCreateAIJ() with MatSetSizes() before >> MatSetFromOptions(). >> >>>> >> >>>> Thanks, >> >>>> >> >>>> Matt >> >>>> Thanks, >> >>>> Cho >> >>>> >> >>>> From: Matthew Knepley >> >>>> Sent: Friday, July 14, 2023 5:57 PM >> >>>> To: Ng, Cho-Kuen >> >>>> Cc: Barry Smith ; Mark Adams ; >> petsc-users at mcs.anl.gov >> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >> >>>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen >> wrote: >> >>>> I managed to pass the following options to PETSc using a GPU node on >> Perlmutter. >> >>>> >> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >> >>>> >> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >> >>>> >> >>>> o #PETSc Option Table entries: >> >>>> -log_view >> >>>> -mat_type aijcusparse >> >>>> -options_left >> >>>> -vec_type cuda >> >>>> #End of PETSc Option Table entries >> >>>> WARNING! There are options you set that were not used!
>> >>>> WARNING! could be spelling mistake, etc! >> >>>> There is one unused database option. It is: >> >>>> Option left: name:-mat_type value: aijcusparse >> >>>> >> >>>> The -mat_type option has not been used. In the application code, we >> use >> >>>> >> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >> >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >> >>>> >> >>>> >> >>>> If you create the Mat this way, then you need MatSetFromOptions() in >> order to set the type from the command line. >> >>>> >> >>>> Thanks, >> >>>> >> >>>> Matt >> >>>> o The percent flops on the GPU for KSPSolve is 17%. >> >>>> >> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an >> order of magnitude slower. How can I improve the GPU performance? >> >>>> >> >>>> Thanks, >> >>>> Cho >> >>>> From: Ng, Cho-Kuen >> >>>> Sent: Friday, June 30, 2023 7:57 AM >> >>>> To: Barry Smith ; Mark Adams >> >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov < >> petsc-users at mcs.anl.gov> >> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >> >>>> Barry, Mark and Matt, >> >>>> >> >>>> Thank you all for the suggestions. I will modify the code so we can >> pass runtime options. >> >>>> >> >>>> Cho >> >>>> From: Barry Smith >> >>>> Sent: Friday, June 30, 2023 7:01 AM >> >>>> To: Mark Adams >> >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen < >> cho at slac.stanford.edu>; petsc-users at mcs.anl.gov >> >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >> >>>> >> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only >> work if the program is set up to allow runtime swapping of matrix and >> vector types. If you have a call to MatCreateMPIAIJ() or other specific >> types then these options do nothing, but because Mark had you use >> -options_left the program will tell you at the end that it did not use the >> option so you will know.
>> >>>> -- Norbert Wiener >> >>>> >> >>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cho at slac.stanford.edu Sat Apr 13 15:26:47 2024 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Sat, 13 Apr 2024 20:26:47 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> <06368184-7850-4C55-B04B-38F1FCE75443@petsc.dev> Message-ID: Thanks Mark and Matt for the comments. ________________________________ From: Mark Adams Sent: Saturday, April 13, 2024 12:53 PM To: Matthew Knepley Cc: Ng, Cho-Kuen ; Barry Smith ; petsc-users ; Jacob Faibussowitsch Subject: Re: [petsc-users] Using PETSc GPU backend As Matt said, high-frequency Helmholtz is very hard; low-frequency is doable by using a larger coarse grid (you have a tiny coarse grid). On Sat, Apr 13, 2024 at 3:04 PM Matthew Knepley > wrote: On Fri, Apr 12, 2024 at 8:19 PM Ng, Cho-Kuen via petsc-users > wrote: Mark.
The FEM problem is of high-frequency Helmholtz type for the Maxwell equation. The combination "-ksp_type groppcg -pc_type gamg" worked for this small problem and the results agreed well with those obtained from the direct solver MUMPS. However, this ksp-pc combination failed to converge for much larger problem sizes. What would be a good combination for solving this kind of large-scale problem, which direct solvers would not be able to handle? Solving high-frequency Helmholtz is a hard research problem. There are no easy answers. I know of no real solutions for C0 FEM. It seems to be a bad discretization for this from the start. I have not seen anything reliable that is not mostly direct, but this is not what I do for a living, so something might exist, although not in any publicly available software. Thanks, Matt Thanks, Cho ________________________________ From: Mark Adams > Sent: Friday, April 12, 2024 4:52 PM To: Barry Smith > Cc: Ng, Cho-Kuen >; petsc-users >; Jacob Faibussowitsch > Subject: Re: [petsc-users] Using PETSc GPU backend As Barry said, this is a bit small, but the performance looks reasonable. The solver does very badly, mathematically. I would try hypre to get another data point. You could also try 'cg' to check that the pipelined version is not a problem. Mark On Fri, Apr 12, 2024 at 3:54 PM Barry Smith > wrote: 800k is a pretty small problem for GPUs. We would need to see the runs with output from -ksp_view -log_view to see if the timing results are reasonable. On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen > wrote: I performed comparison tests using KSP with and without the cuda backend on NERSC's Perlmutter.
For a finite element solve with 800k degrees of freedom, the best times obtained using MPI and MPI+GPU were

o MPI - 128 MPI tasks, 27 s
o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s

Is that the performance one would expect using the hybrid mode of computation? Attached image shows the scaling on a single node. Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Saturday, August 12, 2023 8:08 AM To: Jacob Faibussowitsch > Cc: Barry Smith >; petsc-users > Subject: Re: [petsc-users] Using PETSc GPU backend Thanks Jacob. ________________________________ From: Jacob Faibussowitsch > Sent: Saturday, August 12, 2023 5:02 AM To: Ng, Cho-Kuen > Cc: Barry Smith >; petsc-users > Subject: Re: [petsc-users] Using PETSc GPU backend > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users > wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen > wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve.
How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith >; Mark Adams > >>>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams > >>>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
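Barry's and Matt's advice above amounts to a specific call sequence. The following is only a sketch of that options-friendly construction — it is not code from this thread, it needs a PETSc installation to compile, and it reuses the thread's placeholder sizes (mlocal, m, n, d_nz, o_nz):

```c
/* Sketch: create the Mat/Vec so that -mat_type aijcusparse and -vec_type cuda
 * can take effect at runtime. Requires PETSc headers; sizes and preallocation
 * counts are placeholders taken from the code excerpt in this thread. */
#include <petscmat.h>

PetscErrorCode BuildSystem(MPI_Comm comm, PetscInt mlocal, PetscInt m, PetscInt n,
                           PetscInt d_nz, PetscInt o_nz, Mat *A, Vec *b)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, mlocal, mlocal, m, n));
  PetscCall(MatSetFromOptions(*A));   /* picks up -mat_type from the command line */
  PetscCall(MatSeqAIJSetPreallocation(*A, d_nz, NULL));
  PetscCall(MatMPIAIJSetPreallocation(*A, d_nz, NULL, o_nz, NULL));

  PetscCall(VecCreate(comm, b));
  PetscCall(VecSetSizes(*b, mlocal, m));
  PetscCall(VecSetFromOptions(*b));   /* picks up -vec_type from the command line */
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

With this construction, running with the flags Mark listed (-mat_type aijcusparse -vec_type cuda -log_view -options_left) should leave no unused options, unlike the MatCreateAIJ() path.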
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ >> >> >> >> From: Barry Smith > >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith > >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen > >>> Cc: petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley > >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith >; Mark Adams > >>>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith > >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams > >>>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen > >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams > >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen > >>>>> Cc: petsc-users at mcs.anl.gov > >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f3SONncpU-GuOpLt3Phwi_lcwoqEozXCaWrTnBiVOFRXkpSp0Pt66h3EHtJTq_DyRgcEOnBZdXB_MOlhr4c1DnFX$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srvenkat at utexas.edu Tue Apr 16 20:35:07 2024 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Tue, 16 Apr 2024 20:35:07 -0500 Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov> Message-ID: I finally figured out a way to make it work. I had to build PETSc and my application using the (non GPU-aware) Intel MPI. Then, before running, I switch to the MVAPICH2-GDR. I'm not sure why that works, but it's the only way I've found to compile and run successfully without throwing any errors about not having a GPU-aware MPI. On Fri, Dec 8, 2023 at 5:30?PM Mark Adams wrote: > You may need to set some env variables. This can be system specific so you > might want to look at docs or ask TACC how to run with GPU-aware MPI. > > Mark > > On Fri, Dec 8, 2023 at 5:17?PM Sreeram R Venkat > wrote: > >> Actually, when I compile my program with this build of PETSc and run, I >> still get the error: >> >> PETSC ERROR: PETSc is configured with GPU support, but your MPI is not >> GPU-aware. For better performance, please use a GPU-aware MPI. >> >> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1. >> >> Is there anything else I need to do? >> >> Thanks, >> Sreeram >> >> On Fri, Dec 8, 2023 at 3:29?PM Sreeram R Venkat >> wrote: >> >>> Thank you, changing to CUDA 11.4 fixed the issue. 
The mvapich2-gdr >>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0 >>> >>> On Fri, Dec 8, 2023 at 1:15?PM Satish Balay wrote: >>> >>>> Executing: mpicc -show >>>> stdout: icc -I/opt/apps/cuda/11.4/include -I/opt/apps/cuda/11.4/include >>>> -lcuda -L/opt/apps/cuda/11.4/lib64/stubs -L/opt/apps/cuda/11.4/lib64 >>>> -lcudart -lrt -Wl,-rpath,/opt/apps/cuda/11.4/lib64 >>>> -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ >>>> -lm -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include >>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath >>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi >>>> >>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found >>>> >>>> Looks like you are trying to mix in 2 different cuda versions in this >>>> build. >>>> >>>> Perhaps you need to use cuda-11.4 - with this install of mvapich.. >>>> >>>> Satish >>>> >>>> On Fri, 8 Dec 2023, Matthew Knepley wrote: >>>> >>>> > On Fri, Dec 8, 2023 at 1:54?PM Sreeram R Venkat >>>> wrote: >>>> > >>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware >>>> MVAPICH2-GDR. 
>>>> > > >>>> > > Here is my configure command: >>>> > > >>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre >>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true >>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental >>>> --download-metis >>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx >>>> --with-fc=mpif90 >>>> > > >>>> > > which errors with: >>>> > > >>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>> for >>>> > > details): >>>> > > >>>> > > >>>> --------------------------------------------------------------------------------------------- >>>> > > CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14 >>>> > > -Xcompiler -fPIC >>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>> > > arch=compute_80,code=sm_80" >>>> > > generated from "--with-cuda-arch=80" >>>> > > >>>> > > >>>> > > >>>> > > The same configure command works when I use the Intel MPI and I can >>>> build >>>> > > with CUDA. The full config.log file is attached. Please let me know >>>> if you >>>> > > need any other information. I appreciate your help with this. >>>> > > >>>> > >>>> > The proximate error is >>>> > >>>> > Executing: nvcc -c -o >>>> /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o >>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers >>>> > -I/tmp/petsc-kn3f29gl/config.types >>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14 >>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>> > arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/ >>>> > conftest.cu >>>> > stdout: >>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than >>>> one >>>> > instance of overloaded function "__nv_associate_access_property_impl" >>>> has >>>> > "C" linkage >>>> > 1 error detected in the compilation of >>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu". 
>>>> > Possible ERROR while running compiler: exit code 1 >>>> > stderr: >>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one >>>> > instance of overloaded function "__nv_associate_access_property_impl" has >>>> > "C" linkage >>>> > >>>> > 1 error detected in the compilation of >>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda >>>> > >>>> > This looks like screwed up headers to me, but I will let someone that >>>> > understands CUDA compilation reply. >>>> > >>>> > Thanks, >>>> > >>>> > Matt >>>> > >>>> > Thanks, >>>> > > Sreeram >>>> > > >>>> > >>>> > >>>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Apr 16 22:40:55 2024 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 16 Apr 2024 22:40:55 -0500 Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov> Message-ID: Glad to hear you found a way. Did you use Frontera at TACC? If yes, I could have a try. --Junchao Zhang On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat wrote: > I finally figured out a way to make it work. I had to build PETSc and my > application using the (non GPU-aware) Intel MPI. Then, before running, I > switch to the MVAPICH2-GDR. > I'm not sure why that works, but it's the only way I've found to compile > and run successfully without throwing any errors about not having a > GPU-aware MPI. > > > > On Fri, Dec 8, 2023 at 5:30 PM Mark Adams wrote: >> You may need to set some env variables.
This can be system specific so >> you might want to look at docs or ask TACC how to run with GPU-aware MPI. >> >> Mark >> >> On Fri, Dec 8, 2023 at 5:17?PM Sreeram R Venkat >> wrote: >> >>> Actually, when I compile my program with this build of PETSc and run, I >>> still get the error: >>> >>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is not >>> GPU-aware. For better performance, please use a GPU-aware MPI. >>> >>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1. >>> >>> Is there anything else I need to do? >>> >>> Thanks, >>> Sreeram >>> >>> On Fri, Dec 8, 2023 at 3:29?PM Sreeram R Venkat >>> wrote: >>> >>>> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr >>>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0 >>>> >>>> On Fri, Dec 8, 2023 at 1:15?PM Satish Balay wrote: >>>> >>>>> Executing: mpicc -show >>>>> stdout: icc -I/opt/apps/cuda/11.4/include >>>>> -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs >>>>> -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt >>>>> -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder >>>>> -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm >>>>> -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include >>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath >>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi >>>>> >>>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found >>>>> >>>>> Looks like you are trying to mix in 2 different cuda versions in this >>>>> build. >>>>> >>>>> Perhaps you need to use cuda-11.4 - with this install of mvapich.. >>>>> >>>>> Satish >>>>> >>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote: >>>>> >>>>> > On Fri, Dec 8, 2023 at 1:54?PM Sreeram R Venkat >>>>> wrote: >>>>> > >>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware >>>>> MVAPICH2-GDR. 
>>>>> > > >>>>> > > Here is my configure command: >>>>> > > >>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre >>>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true >>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental >>>>> --download-metis >>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx >>>>> --with-fc=mpif90 >>>>> > > >>>>> > > which errors with: >>>>> > > >>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>>> configure.log for >>>>> > > details): >>>>> > > >>>>> > > >>>>> --------------------------------------------------------------------------------------------- >>>>> > > CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14 >>>>> > > -Xcompiler -fPIC >>>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>>> > > arch=compute_80,code=sm_80" >>>>> > > generated from "--with-cuda-arch=80" >>>>> > > >>>>> > > >>>>> > > >>>>> > > The same configure command works when I use the Intel MPI and I >>>>> can build >>>>> > > with CUDA. The full config.log file is attached. Please let me >>>>> know if you >>>>> > > need any other information. I appreciate your help with this. 
>>>>> > > >>>>> > >>>>> > The proximate error is >>>>> > >>>>> > Executing: nvcc -c -o >>>>> /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o >>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers >>>>> > -I/tmp/petsc-kn3f29gl/config.types >>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14 >>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>>> > arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/ >>>>> > conftest.cu >>>>> >>>>> > stdout: >>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than >>>>> one >>>>> > instance of overloaded function >>>>> "__nv_associate_access_property_impl" has >>>>> > "C" linkage >>>>> > 1 error detected in the compilation of >>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu >>>>> >>>>> ". >>>>> > Possible ERROR while running compiler: exit code 1 >>>>> > stderr: >>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than >>>>> one >>>>> > instance of overloaded function >>>>> "__nv_associate_access_property_impl" has >>>>> > "C" linkage >>>>> > >>>>> > 1 error detected in the compilation of >>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda >>>>> > >>>>> > This looks like screwed up headers to me, but I will let someone that >>>>> > understands CUDA compilation reply. >>>>> > >>>>> > Thanks, >>>>> > >>>>> > Matt >>>>> > >>>>> > Thanks, >>>>> > > Sreeram >>>>> > > >>>>> > >>>>> > >>>>> > >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Wed Apr 17 03:55:11 2024 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Wed, 17 Apr 2024 08:55:11 +0000 Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov> Message-ID: * Did you use Frontera at TACC? If yes, I could have a try. If you?re interested in access to other TACC machines that can be arranged. 
I once set up a project for PETSc access to TACC. I think that was for a github CI but we never actually set that up.

Victor.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From srvenkat at utexas.edu Wed Apr 17 07:50:54 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Wed, 17 Apr 2024 07:50:54 -0500
Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR
In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov>
Message-ID:

Do you know if there are plans for NCCL support in PETSc?

On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang wrote:

> Glad to hear you found a way. Did you use Frontera at TACC? If yes, I
> could have a try.
>
> --Junchao Zhang
>
> On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat wrote:
>
>> I finally figured out a way to make it work. I had to build PETSc and my
>> application using the (non GPU-aware) Intel MPI. Then, before running, I
>> switch to the MVAPICH2-GDR. I'm not sure why that works, but it's the
>> only way I've found to compile and run successfully without throwing any
>> errors about not having a GPU-aware MPI.
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From junchao.zhang at gmail.com Wed Apr 17 07:54:57 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Wed, 17 Apr 2024 07:54:57 -0500
Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR
In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov>
Message-ID:

Victor, through the SMART PETSc project, I do have access to Frontera and Lonestar6.

--Junchao Zhang

On Wed, Apr 17, 2024 at 3:55 AM Victor Eijkhout wrote:

> - Did you use Frontera at TACC? If yes, I could have a try.
>
> If you're interested in access to other TACC machines that can be
> arranged. I once set up a project for PETSc access to TACC. I think that
> was for a github CI but we never actually set that up.
>
> Victor.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From junchao.zhang at gmail.com Wed Apr 17 07:58:12 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Wed, 17 Apr 2024 07:58:12 -0500
Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR
In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov>
Message-ID:

On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat wrote:

> Do you know if there are plans for NCCL support in PETSc?

What is your need? Do you mean using NCCL for the MPI communication?
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From srvenkat at utexas.edu Wed Apr 17 08:26:47 2024
From: srvenkat at utexas.edu (Sreeram R Venkat)
Date: Wed, 17 Apr 2024 08:26:47 -0500
Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR
In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov>
Message-ID:

Yes, I saw this paper
https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!dHsBib5l9Muy07HNhTdWzjZYUUlhkMkHrO7blcUjZQbwvChOEe0pDb5zyW-3qjEF78R3JHlcfjthtoxJY5VolUpbQw$
that mentioned it, and I heard in Barry's talk at SIAM PP this year about
the need for stream-aware MPI, so I was wondering if NCCL would be used in
PETSc to do GPU-GPU communication.

On Wed, Apr 17, 2024, 7:58 AM Junchao Zhang wrote:

> On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat wrote:
>
>> Do you know if there are plans for NCCL support in PETSc?
>
> What is your need? Do you mean using NCCL for the MPI communication?
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From junchao.zhang at gmail.com Wed Apr 17 09:17:22 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Wed, 17 Apr 2024 09:17:22 -0500
Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR
In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov>
Message-ID:

I looked at it before and checked again, and still see
https://urldefense.us/v3/__https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html*inter-gpu-communication-with-cuda-aware-mpi__;Iw!!G_uCfscf7eWS!aoebw25LunX6PlUYyVIti_VNLojOyz8MwWS3U-t5NHMMb4GnFUFV-nGq1LbjnigF72oKfDjmLXk9vUwkcYZpFTq0PgLl$

> Using both MPI and NCCL to perform transfers between the same sets of
> CUDA devices concurrently is therefore not guaranteed to be safe.

I was scared by it. It means we have to replace all MPI device
communications (what if they are from a third-party library?) with NCCL.

--Junchao Zhang

On Wed, Apr 17, 2024 at 8:27 AM Sreeram R Venkat wrote:

> Yes, I saw this paper that mentioned it, and I heard in Barry's talk at
> SIAM PP this year about the need for stream-aware MPI, so I was wondering
> if NCCL would be used in PETSc to do GPU-GPU communication.
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mmolinos at us.es Thu Apr 18 07:22:51 2024
From: mmolinos at us.es (MIGUEL MOLINOS PEREZ)
Date: Thu, 18 Apr 2024 12:22:51 +0000
Subject: [petsc-users] Periodic boundary conditions using swarm
Message-ID: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es>

Dear all,

I am working on the implementation of periodic boundary conditions using a
particle discretisation (PIC-style). I am working with a test case which
consists of solving the advection of a set of particles inside a box (DMDA
mesh) with periodic boundary conditions on the x axis.

My implementation updates the position of each particle with a velocity
field; afterwards I check whether the particle is inside the supercell
(periodic box) or not. If not, I correct the position using the periodic
boundary conditions. Once this step is done, I call DMSwarmMigrate.

It works in serial, but crashes in parallel with MPI (see attached nohup
file). I have checked some of the DMSwarmMigrate examples, and they look
similar to my implementation. However, they do not use periodic boundary
conditions. Am I missing any step in addition to DMSwarmMigrate?

Best regards
Miguel

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: nohup.out Type: application/octet-stream Size: 7966 bytes Desc: nohup.out URL:

From mfadams at lbl.gov Thu Apr 18 15:30:13 2024
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 18 Apr 2024 16:30:13 -0400
Subject: [petsc-users] About recent changes in GAMG
In-Reply-To: References:
Message-ID:

Yikes, it looks like we have been off the list this whole time. I am not
the only PETSc developer nor the only person who knows about PETSc!

These folks are seeing some strange behavior with GAMG going from 1 to 2
cores, using lots of memory, but one question that they have, that I don't
understand either, is this:

>> Yea, my interpretation of these methods is also that
>> "PetscMemoryGetMaximumUsage" should be >= "PetscMallocGetMaximumUsage".
>> But you are seeing the opposite.
We are using PETSc main and have found a case where memory consumption explodes in parallel.
Also, we see a non-negligible difference between PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage().
Running in serial through /usr/bin/time, the max. resident set size matches the PetscMallocGetMaximumUsage() result.
I would have expected it to match PetscMemoryGetMaximumUsage() instead.

                   PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
Serial + Option 1  4.8 GB                      7.4 GB                      112 sec
2 core + Option 1  15.2 GB                     45.5 GB                     150 sec
Serial + Option 2  3.1 GB                      3.8 GB                      167 sec
2 core + Option 2  13.1 GB                     17.4 GB                     89 sec
Serial + Option 3  4.7 GB                      5.2 GB                      693 sec
2 core + Option 3  23.2 GB                     26.4 GB                     411 sec

On Thu, Apr 18, 2024 at 4:13 PM Mark Adams wrote:

> The next thing you might try is not using the null space argument.
> Hypre does not use it, but GAMG does.
> You could also run with -malloc_view to see some info on mallocs. It is
> probably in the Mat objects.
> You can also run with "-info" and grep on GAMG in the output and send that.
>
> Mark
>
> On Thu, Apr 18, 2024 at 12:03 PM Ashish Patel wrote:
>
>> Hi Mark,
>>
>> Thanks for your response and suggestion. With hypre both memory and time
>> look good; here is the data for that:
>>
>>                    PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
>> Serial + Option 4  5.55 GB                     5.17 GB                     15.7 sec
>> 2 core + Option 4  5.85 GB                     4.69 GB                     21.9 sec
>>
>> Option 4
>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type hypre
>> -pc_hypre_boomeramg_strong_threshold 0.9 -ksp_view -log_view
>> -log_view_memory -info :pc
>>
>> I am also attaching a standalone program to reproduce these options and
>> the link to the matrix, rhs and near null spaces (serial.tar 2.xz) if you
>> would like to try as well. Please let me know if you have trouble
>> accessing the link.
>>
>> Ashish
>> ------------------------------
>> *From:* Mark Adams
>> *Sent:* Wednesday, April 17, 2024 7:52 PM
>> *To:* Jeremy Theler (External)
>> *Cc:* Ashish Patel ; Scott McClennan <scott.mcclennan at ansys.com>
>> *Subject:* Re: About recent changes in GAMG
>>
>> *[External Sender]*
>>
>> On Wed, Apr 17, 2024 at 7:20 AM Jeremy Theler (External)
>> <jeremy.theler-ext at ansys.com> wrote:
>>
>> Hey Mark. Long time no see! How are things going over there?
>>
>> We are using PETSc main and have found a case where memory consumption
>> explodes in parallel.
>> Also, we see a non-negligible difference between
>> PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage().
>> Running in serial through /usr/bin/time, the max. resident set size
>> matches the PetscMallocGetMaximumUsage() result.
>> I would have expected it to match PetscMemoryGetMaximumUsage() instead.
>>
>> Yea, my interpretation of these methods is also that "Memory" should be
>> >= "Malloc". But you are seeing the opposite.
>>
>> I don't have any idea what is going on with your big memory penalty going
>> from 1 to 2 cores on this test, but the first thing to do is try other
>> solvers and see how they behave. Hypre in particular would be a good thing
>> to try because it is a similar algorithm.
>>
>> Mark
>>
>> The matrix size is around 1 million. We can share it with you if you
>> want, along with the RHS and the 6 near nullspace vectors and a modified
>> ex1.c which will read these files and show the following behavior.
>>
>> Observations using latest main for an elastic matrix with a block size of 3
>> (after removing bonded glue-like DOFs with direct elimination) and near
>> null space provided:
>>
>> - Big memory penalty going from serial to parallel (2 core)
>> - Big difference between PetscMemoryGetMaximumUsage and
>> PetscMallocGetMaximumUsage, why?
>> - The memory penalty decreases with -pc_gamg_aggressive_square_graph false
>> (option 2)
>> - The difference between PetscMemoryGetMaximumUsage and
>> PetscMallocGetMaximumUsage reduces when -pc_gamg_threshold is
>> increased from 0 to 0.01 (option 3), though the solve time increases a lot.
>>
>>                    PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
>> Serial + Option 1  4.8 GB                      7.4 GB                      112 sec
>> 2 core + Option 1  15.2 GB                     45.5 GB                     150 sec
>> Serial + Option 2  3.1 GB                      3.8 GB                      167 sec
>> 2 core + Option 2  13.1 GB                     17.4 GB                     89 sec
>> Serial + Option 3  4.7 GB                      5.2 GB                      693 sec
>> 2 core + Option 3  23.2 GB                     26.4 GB                     411 sec
>>
>> Option 1
>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold 0.0 -info :pc
>>
>> Option 2
>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>> -pc_gamg_aggressive_square_graph *false* -pc_gamg_threshold 0.0 -info :pc
>>
>> Option 3
>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold *0.01* -info :pc
>> ------------------------------
>> *From:* Mark Adams
>> *Sent:* Tuesday, November 14, 2023 11:28 AM
>> *To:* Jeremy Theler (External)
>> *Cc:* Ashish Patel
>> *Subject:* Re: About recent changes in GAMG
>>
>> *[External Sender]*
>> Sounds good,
>>
>> I think the not-square graph "aggressive" coarsening is the only issue that I
>> see, and you can fix this by using:
>>
>> -mat_coarsen_type mis
>>
>> Aside,
'-pc_gamg_aggressive_square_graph' should do it also, and you can
>> use both and they will be ignored in earlier versions.
>>
>> If you see a difference then the first thing to do is run with '-info
>> :pc' and send that to me (you can grep on 'GAMG' and send that if you like,
>> to reduce the data).
>>
>> Thanks,
>> Mark
>>
>> On Tue, Nov 14, 2023 at 8:49 AM Jeremy Theler (External)
>> <jeremy.theler-ext at ansys.com> wrote:
>>
>> Hi Mark.
>> Thanks for reaching out. For now, we are going to stick to 3.19 for our
>> production code because the changes in 3.20 impact our tests in
>> different ways (some of them perform better, some perform worse).
>> I have now switched to another task, investigating structural elements in
>> DMPlex.
>> I'll go back to analyzing the new changes in GAMG in a couple of weeks, so
>> we can then see if we upgrade to 3.20 or wait until 3.21.
>>
>> Thanks for your work and your kindness.
>> --
>> jeremy
>> ------------------------------
>> *From:* Mark Adams
>> *Sent:* Tuesday, November 14, 2023 9:35 AM
>> *To:* Jeremy Theler (External)
>> *Cc:* Ashish Patel
>> *Subject:* Re: About recent changes in GAMG
>>
>> *[External Sender]*
>> Hi Jeremy,
>>
>> Just following up.
>> I appreciate your digging into performance regressions in GAMG.
>> AMG is really a pain sometimes and we want GAMG to be solid, at least for
>> mainstream options, and your efforts are appreciated.
>> So feel free to start this discussion up.
>>
>> Thanks,
>> Mark
>>
>> On Wed, Oct 25, 2023 at 9:52 PM Jeremy Theler (External)
>> <jeremy.theler-ext at ansys.com> wrote:
>>
>> Dear Mark
>>
>> Thanks for the follow-up and sorry for the delay.
>> I'm taking some days off. I'll be back to full throttle next week so I can
>> continue the discussion about these changes in GAMG.
>>
>> Regards,
>> Jeremy
>>
>> ------------------------------
>> *From:* Mark Adams
>> *Sent:* Wednesday, October 18, 2023 9:15 AM
>> *To:* Jeremy Theler (External) ; PETSc users list
>> *Cc:* Ashish Patel
>> *Subject:* Re: About recent changes in GAMG
>>
>> *[External Sender]*
>> Hi Jeremy,
>>
>> I hope you don't mind putting this on the list (w/o data), but this is
>> documentation and you are the second user who has found regressions.
>> Sorry for the churn.
>>
>> There is a lot here, so we can iterate, but here is a pass at your
>> questions.
>>
>> *** Using MIS-2 instead of square graph was motivated by setup
>> cost/performance, but on GPUs, with some recent fixes in Kokkos (in a
>> branch), square graph seems OK.
>> My experience was that square graph is better in terms of quality, and we
>> have a power user, like you all, who found this also.
>> So I switched the default back to square graph.
>>
>> Interesting that you found that MIS-2 (the new method) could be faster, but
>> it might be because the two methods coarsen at different rates, and that can
>> make a big difference.
>> (The way to test would be to adjust parameters to get similar coarsening
>> rates, but I digress.)
>> It's hard to understand the differences between these two methods in
>> terms of aggregate quality, so we need to just experiment and have options.
>>
>> *** As for your thermal problem: there was a complaint that the eigen
>> estimates for the Chebyshev smoother were not recomputed for nonlinear
>> problems, and I added an option to do that and turned it on by default.
>> Use '-pc_gamg_recompute_esteig false' to get back to the original.
>> (I should have turned it off by default.)
>>
>> Now, if your problem is symmetric and you use CG to compute the eigen
>> estimates, there should be no difference.
>> If you use CG to compute the eigen estimates in GAMG (and have GAMG give
>> them to cheby, the default), note that when you recompute the eigen
>> estimates the cheby eigen estimator is used, and that will use GMRES by
>> default unless you set the SPD property on your matrix.
>> So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set
>> '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left).
>> CG is a much better estimator for SPD.
>>
>> And I found that the cheby eigen estimator uses an LAPACK *eigen* method
>> to compute the eigen bounds and GAMG uses a *singular value* method.
>> The two give very different results on the lid driven cavity test (ex19).
>> eigen is lower, which is safer but not optimal if it is too low.
>> I have a branch to have cheby use the singular value method, but I don't
>> plan on merging it (enough churn, and I don't understand these differences).
>>
>> *** '-pc_gamg_low_memory_threshold_filter false' recovers the old
>> filtering method.
>> This is the default now because there is a bug in the (new) low memory
>> filter.
>> This bug is very rare and catastrophic.
>> We are working on it and will turn it on by default when it's fixed.
>> This does not affect the semantics of the solver, just work and memory
>> complexity.
>>
>> *** As far as tet4 vs tet10, I would guess that tet4 wants more
>> aggressive coarsening.
>> The default is to do aggressive coarsening on one (1) level.
>> You might want more levels for tet4.
>> And the new MIS-k coarsening can use any k (default is 2) with
>> '-mat_coarsen_misk_distance k' (e.g., k=3).
>> I have not added hooks to have a more complex schedule to specify the
>> method on each level.
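For reference, the options Mark mentions above could be collected into a single options file like the following. This is a hypothetical combination for an SPD problem; the flag names are taken verbatim from this thread, but defaults and availability vary between PETSc versions, so verify with -help and -options_left:

```
# Eigen estimates for the Chebyshev smoother: use CG in both places (SPD problem)
-pc_gamg_esteig_ksp_type cg
-mg_levels_esteig_ksp_type cg
# Do not re-compute eigen estimates (the pre-change behavior)
-pc_gamg_recompute_esteig false
# Square-graph aggressive coarsening (the restored default) ...
-pc_gamg_aggressive_square_graph true
# ... or, alternatively, MIS-based coarsening with a chosen distance:
# -mat_coarsen_type mis
# -mat_coarsen_misk_distance 3
```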
>>
>> Thanks,
>> Mark
>>
>> On Tue, Oct 17, 2023 at 9:33 PM Jeremy Theler (External)
>> <jeremy.theler-ext at ansys.com> wrote:
>>
>> Hey Mark
>>
>> Regarding the changes in the coarsening algorithm in 3.20 with respect to
>> 3.19: in general we see that for some problems the MIS strategy gives an
>> overall performance which is slightly better, and for some others it is
>> slightly worse than the "baseline" from 3.19.
>> We also saw that current main has switched back to the old square
>> coarsening algorithm by default, which again, in some cases is better and
>> in others is worse than 3.19 without any extra command-line option.
>>
>> Now what seems weird to us is that we have a test case which is a heat
>> conduction problem with radiation boundary conditions (so it is nonlinear)
>> using tet10, and we see
>>
>> 1. that in parallel v3.20 is way worse than v3.19, although the
>> memory usage is similar
>> 2. that PETSc main (with no extra flags, just the defaults) recovers
>> the 3.19 performance but memory usage is significantly larger
>>
>> I tried using the -pc_gamg_low_memory_threshold_filter flag and the
>> results were the same.
>>
>> Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI
>> ranks.
>> Is there any explanation for these two points we are seeing?
>> Another weird finding is that if we use tet4 instead of tet10, v3.20 is
>> only 10% slower than the other two, and main does not need more memory
>> than the other two.
>>
>> BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and
>> main, should you be interested.
>>
>> Let me know if it is better to move this discussion onto the PETSc
>> mailing list.
>>
>> Regards,
>> jeremy theler
>>
>> -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jed at jedbrown.org Thu Apr 18 16:16:56 2024 From: jed at jedbrown.org (Jed Brown) Date: Thu, 18 Apr 2024 15:16:56 -0600 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: References: Message-ID: <878r1aweuv.fsf@jedbrown.org> An HTML attachment was scrubbed... URL: From srvenkat at utexas.edu Fri Apr 19 11:54:57 2024 From: srvenkat at utexas.edu (Sreeram R Venkat) Date: Fri, 19 Apr 2024 11:54:57 -0500 Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov> Message-ID: I talked to the MVAPICH people, and they told me to try adding /path/to/mvapich2-gdr/lib64/libmpi.so to LD_PRELOAD (apparently, they've had this issue before). This seemed to do the trick; I can build everything with MVAPICH2-GDR and run with it now. Not sure if this is something you want to add to the docs. Thanks, Sreeram On Wed, Apr 17, 2024 at 9:17?AM Junchao Zhang wrote: > I looked at it before and checked again, and still see > https://urldefense.us/v3/__https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html*inter-gpu-communication-with-cuda-aware-mpi__;Iw!!G_uCfscf7eWS!bE42LoNqiOoD5Yu05BdtZqAHbFHxkuFy6S8ljr09QRqymgFnne-nbxx-xywoOtRzBGA3fRvcsOyyVgNDLWPRbI84MA$ > > Using both MPI and NCCL to perform transfers between the same sets of > CUDA devices concurrently is therefore not guaranteed to be safe. > > I was scared by it. It means we have to replace all MPI device > communications (what if they are from a third-party library?) with NCCL. 
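For anyone hitting the same MVAPICH2-GDR problem, the LD_PRELOAD workaround Sreeram describes above might look like this in a job script. The module name and library path are placeholders (the exact path was elided in the thread), so treat this as an unverified sketch rather than a tested recipe:

```
module load mvapich2-gdr
export MV2_USE_CUDA=1
# Force the GDR libmpi to be resolved first, per the MVAPICH developers' suggestion
export LD_PRELOAD=/path/to/mvapich2-gdr/lib64/libmpi.so${LD_PRELOAD:+:$LD_PRELOAD}
mpirun -n 2 ./my_petsc_app
```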
> > --Junchao Zhang > > > On Wed, Apr 17, 2024 at 8:27?AM Sreeram R Venkat > wrote: > >> Yes, I saw this paper >> https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!bE42LoNqiOoD5Yu05BdtZqAHbFHxkuFy6S8ljr09QRqymgFnne-nbxx-xywoOtRzBGA3fRvcsOyyVgNDLWMNhkqWSA$ >> that mentioned it, and I heard in Barry's talk at SIAM PP this year about >> the need for stream-aware MPI, so I was wondering if NCCL would be used in >> PETSc to do GPU-GPU communication. >> >> On Wed, Apr 17, 2024, 7:58?AM Junchao Zhang >> wrote: >> >>> >>> >>> >>> >>> On Wed, Apr 17, 2024 at 7:51?AM Sreeram R Venkat >>> wrote: >>> >>>> Do you know if there are plans for NCCL support in PETSc? >>>> >>> What is your need? Do you mean using NCCL for the MPI communication? >>> >>> >>>> >>>> On Tue, Apr 16, 2024, 10:41?PM Junchao Zhang >>>> wrote: >>>> >>>>> Glad to hear you found a way. Did you use Frontera at TACC? If yes, >>>>> I could have a try. >>>>> >>>>> --Junchao Zhang >>>>> >>>>> >>>>> On Tue, Apr 16, 2024 at 8:35?PM Sreeram R Venkat >>>>> wrote: >>>>> >>>>>> I finally figured out a way to make it work. I had to build PETSc and >>>>>> my application using the (non GPU-aware) Intel MPI. Then, before running, I >>>>>> switch to the MVAPICH2-GDR. I'm not sure why that works, but it's the only >>>>>> way I've >>>>>> ZjQcmQRYFpfptBannerStart >>>>>> This Message Is From an External Sender >>>>>> This message came from outside your organization. >>>>>> >>>>>> ZjQcmQRYFpfptBannerEnd >>>>>> I finally figured out a way to make it work. I had to build PETSc and >>>>>> my application using the (non GPU-aware) Intel MPI. Then, before running, I >>>>>> switch to the MVAPICH2-GDR. >>>>>> I'm not sure why that works, but it's the only way I've found to >>>>>> compile and run successfully without throwing any errors about not having a >>>>>> GPU-aware MPI. 
>>>>>> >>>>>> >>>>>> >>>>>> On Fri, Dec 8, 2023 at 5:30?PM Mark Adams wrote: >>>>>> >>>>>>> You may need to set some env variables. This can be system specific >>>>>>> so you might want to look at docs or ask TACC how to run with GPU-aware MPI. >>>>>>> >>>>>>> Mark >>>>>>> >>>>>>> On Fri, Dec 8, 2023 at 5:17?PM Sreeram R Venkat >>>>>>> wrote: >>>>>>> >>>>>>>> Actually, when I compile my program with this build of PETSc and >>>>>>>> run, I still get the error: >>>>>>>> >>>>>>>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is >>>>>>>> not GPU-aware. For better performance, please use a GPU-aware MPI. >>>>>>>> >>>>>>>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1. >>>>>>>> >>>>>>>> Is there anything else I need to do? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Sreeram >>>>>>>> >>>>>>>> On Fri, Dec 8, 2023 at 3:29?PM Sreeram R Venkat < >>>>>>>> srvenkat at utexas.edu> wrote: >>>>>>>> >>>>>>>>> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr >>>>>>>>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0 >>>>>>>>> >>>>>>>>> On Fri, Dec 8, 2023 at 1:15?PM Satish Balay >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Executing: mpicc -show >>>>>>>>>> stdout: icc -I/opt/apps/cuda/11.4/include >>>>>>>>>> -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs >>>>>>>>>> -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt >>>>>>>>>> -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder >>>>>>>>>> -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm >>>>>>>>>> -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include >>>>>>>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath >>>>>>>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi >>>>>>>>>> >>>>>>>>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found >>>>>>>>>> >>>>>>>>>> Looks like you are trying to mix in 2 different cuda versions in >>>>>>>>>> this build. 
>>>>>>>>>> >>>>>>>>>> Perhaps you need to use cuda-11.4 - with this install of mvapich.. >>>>>>>>>> >>>>>>>>>> Satish >>>>>>>>>> >>>>>>>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote: >>>>>>>>>> >>>>>>>>>> > On Fri, Dec 8, 2023 at 1:54?PM Sreeram R Venkat < >>>>>>>>>> srvenkat at utexas.edu> wrote: >>>>>>>>>> > >>>>>>>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware >>>>>>>>>> MVAPICH2-GDR. >>>>>>>>>> > > >>>>>>>>>> > > Here is my configure command: >>>>>>>>>> > > >>>>>>>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr >>>>>>>>>> --download-hypre >>>>>>>>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true >>>>>>>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental >>>>>>>>>> --download-metis >>>>>>>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx >>>>>>>>>> --with-fc=mpif90 >>>>>>>>>> > > >>>>>>>>>> > > which errors with: >>>>>>>>>> > > >>>>>>>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>>>>>>>> configure.log for >>>>>>>>>> > > details): >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> --------------------------------------------------------------------------------------------- >>>>>>>>>> > > CUDA compile failed with arch flags " -ccbin mpic++ >>>>>>>>>> -std=c++14 >>>>>>>>>> > > -Xcompiler -fPIC >>>>>>>>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>>>>>>>> > > arch=compute_80,code=sm_80" >>>>>>>>>> > > generated from "--with-cuda-arch=80" >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > The same configure command works when I use the Intel MPI and >>>>>>>>>> I can build >>>>>>>>>> > > with CUDA. The full config.log file is attached. Please let >>>>>>>>>> me know if you >>>>>>>>>> > > need any other information. I appreciate your help with this. 
>>>>>>>>>> > > >>>>>>>>>> > >>>>>>>>>> > The proximate error is >>>>>>>>>> > >>>>>>>>>> > Executing: nvcc -c -o >>>>>>>>>> /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o >>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers >>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.types >>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ >>>>>>>>>> -std=c++14 >>>>>>>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo >>>>>>>>>> -gencode >>>>>>>>>> > arch=compute_80,code=sm_80 >>>>>>>>>> /tmp/petsc-kn3f29gl/config.packages.cuda/ >>>>>>>>>> > conftest.cu >>>>>>>>>> >>>>>>>>>> > stdout: >>>>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more >>>>>>>>>> than one >>>>>>>>>> > instance of overloaded function >>>>>>>>>> "__nv_associate_access_property_impl" has >>>>>>>>>> > "C" linkage >>>>>>>>>> > 1 error detected in the compilation of >>>>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu >>>>>>>>>> >>>>>>>>>> ". >>>>>>>>>> > Possible ERROR while running compiler: exit code 1 >>>>>>>>>> > stderr: >>>>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more >>>>>>>>>> than one >>>>>>>>>> > instance of overloaded function >>>>>>>>>> "__nv_associate_access_property_impl" has >>>>>>>>>> > "C" linkage >>>>>>>>>> > >>>>>>>>>> > 1 error detected in the compilation of >>>>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda >>>>>>>>>> > >>>>>>>>>> > This looks like screwed up headers to me, but I will let >>>>>>>>>> someone that >>>>>>>>>> > understands CUDA compilation reply. >>>>>>>>>> > >>>>>>>>>> > Thanks, >>>>>>>>>> > >>>>>>>>>> > Matt >>>>>>>>>> > >>>>>>>>>> > Thanks, >>>>>>>>>> > > Sreeram >>>>>>>>>> > > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> >>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Fri Apr 19 12:48:50 2024 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 19 Apr 2024 12:48:50 -0500 Subject: [petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR In-Reply-To: References: <1da79702-c1eb-0ad8-6efc-64580e02bd07@mcs.anl.gov> Message-ID: Thanks for the trick. We can prepare the example script for Lonestar6 and mention it. --Junchao Zhang On Fri, Apr 19, 2024 at 11:55?AM Sreeram R Venkat wrote: > I talked to the MVAPICH people, and they told me to try adding > /path/to/mvapich2-gdr/lib64/libmpi.so to LD_PRELOAD (apparently, they've > had this issue before). This seemed to do the trick; I can build everything > with MVAPICH2-GDR and run with it now. Not sure if this is something you > want to add to the docs. > > Thanks, > Sreeram > > On Wed, Apr 17, 2024 at 9:17?AM Junchao Zhang > wrote: > >> I looked at it before and checked again, and still see >> https://urldefense.us/v3/__https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html*inter-gpu-communication-with-cuda-aware-mpi__;Iw!!G_uCfscf7eWS!a7KaPIMnK3W6fKk-LnqLXX2RuRqPkLf7VqOFMTbTer2ssQFCasDyKoFrz3cZDwHMUxFHFHYcHAp3JQLue4dkR3JyE_IH$ >> > Using both MPI and NCCL to perform transfers between the same sets of >> CUDA devices concurrently is therefore not guaranteed to be safe. >> >> I was scared by it. It means we have to replace all MPI device >> communications (what if they are from a third-party library?) with NCCL. 
>> >> --Junchao Zhang >> >> >> On Wed, Apr 17, 2024 at 8:27?AM Sreeram R Venkat >> wrote: >> >>> Yes, I saw this paper >>> https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!a7KaPIMnK3W6fKk-LnqLXX2RuRqPkLf7VqOFMTbTer2ssQFCasDyKoFrz3cZDwHMUxFHFHYcHAp3JQLue4dkRxedk29J$ >>> that mentioned it, and I heard in Barry's talk at SIAM PP this year >>> about the need for stream-aware MPI, so I was wondering if NCCL would be >>> used in PETSc to do GPU-GPU communication. >>> >>> On Wed, Apr 17, 2024, 7:58?AM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> >>>> >>>> On Wed, Apr 17, 2024 at 7:51?AM Sreeram R Venkat >>>> wrote: >>>> >>>>> Do you know if there are plans for NCCL support in PETSc? >>>>> >>>> What is your need? Do you mean using NCCL for the MPI communication? >>>> >>>> >>>>> >>>>> On Tue, Apr 16, 2024, 10:41?PM Junchao Zhang >>>>> wrote: >>>>> >>>>>> Glad to hear you found a way. Did you use Frontera at TACC? If >>>>>> yes, I could have a try. >>>>>> >>>>>> --Junchao Zhang >>>>>> >>>>>> >>>>>> On Tue, Apr 16, 2024 at 8:35?PM Sreeram R Venkat >>>>>> wrote: >>>>>> >>>>>>> I finally figured out a way to make it work. I had to build PETSc >>>>>>> and my application using the (non GPU-aware) Intel MPI. Then, before >>>>>>> running, I switch to the MVAPICH2-GDR. I'm not sure why that works, but >>>>>>> it's the only way I've >>>>>>> ZjQcmQRYFpfptBannerStart >>>>>>> This Message Is From an External Sender >>>>>>> This message came from outside your organization. >>>>>>> >>>>>>> ZjQcmQRYFpfptBannerEnd >>>>>>> I finally figured out a way to make it work. I had to build PETSc >>>>>>> and my application using the (non GPU-aware) Intel MPI. Then, before >>>>>>> running, I switch to the MVAPICH2-GDR. >>>>>>> I'm not sure why that works, but it's the only way I've found to >>>>>>> compile and run successfully without throwing any errors about not having a >>>>>>> GPU-aware MPI. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 8, 2023 at 5:30?PM Mark Adams wrote: >>>>>>> >>>>>>>> You may need to set some env variables. This can be system specific >>>>>>>> so you might want to look at docs or ask TACC how to run with GPU-aware MPI. >>>>>>>> >>>>>>>> Mark >>>>>>>> >>>>>>>> On Fri, Dec 8, 2023 at 5:17?PM Sreeram R Venkat < >>>>>>>> srvenkat at utexas.edu> wrote: >>>>>>>> >>>>>>>>> Actually, when I compile my program with this build of PETSc and >>>>>>>>> run, I still get the error: >>>>>>>>> >>>>>>>>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is >>>>>>>>> not GPU-aware. For better performance, please use a GPU-aware MPI. >>>>>>>>> >>>>>>>>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1. >>>>>>>>> >>>>>>>>> Is there anything else I need to do? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Sreeram >>>>>>>>> >>>>>>>>> On Fri, Dec 8, 2023 at 3:29?PM Sreeram R Venkat < >>>>>>>>> srvenkat at utexas.edu> wrote: >>>>>>>>> >>>>>>>>>> Thank you, changing to CUDA 11.4 fixed the issue. 
The >>>>>>>>>> mvapich2-gdr module didn't require CUDA 11.4 as a dependency, so I was >>>>>>>>>> using 12.0 >>>>>>>>>> >>>>>>>>>> On Fri, Dec 8, 2023 at 1:15?PM Satish Balay >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Executing: mpicc -show >>>>>>>>>>> stdout: icc -I/opt/apps/cuda/11.4/include >>>>>>>>>>> -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs >>>>>>>>>>> -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt >>>>>>>>>>> -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder >>>>>>>>>>> -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm >>>>>>>>>>> -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include >>>>>>>>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath >>>>>>>>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi >>>>>>>>>>> >>>>>>>>>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found >>>>>>>>>>> >>>>>>>>>>> Looks like you are trying to mix in 2 different cuda versions in >>>>>>>>>>> this build. >>>>>>>>>>> >>>>>>>>>>> Perhaps you need to use cuda-11.4 - with this install of >>>>>>>>>>> mvapich.. >>>>>>>>>>> >>>>>>>>>>> Satish >>>>>>>>>>> >>>>>>>>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote: >>>>>>>>>>> >>>>>>>>>>> > On Fri, Dec 8, 2023 at 1:54?PM Sreeram R Venkat < >>>>>>>>>>> srvenkat at utexas.edu> wrote: >>>>>>>>>>> > >>>>>>>>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware >>>>>>>>>>> MVAPICH2-GDR. 
>>>>>>>>>>> > > >>>>>>>>>>> > > Here is my configure command: >>>>>>>>>>> > > >>>>>>>>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr >>>>>>>>>>> --download-hypre >>>>>>>>>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true >>>>>>>>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental >>>>>>>>>>> --download-metis >>>>>>>>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx >>>>>>>>>>> --with-fc=mpif90 >>>>>>>>>>> > > >>>>>>>>>>> > > which errors with: >>>>>>>>>>> > > >>>>>>>>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see >>>>>>>>>>> configure.log for >>>>>>>>>>> > > details): >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> --------------------------------------------------------------------------------------------- >>>>>>>>>>> > > CUDA compile failed with arch flags " -ccbin mpic++ >>>>>>>>>>> -std=c++14 >>>>>>>>>>> > > -Xcompiler -fPIC >>>>>>>>>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode >>>>>>>>>>> > > arch=compute_80,code=sm_80" >>>>>>>>>>> > > generated from "--with-cuda-arch=80" >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > The same configure command works when I use the Intel MPI >>>>>>>>>>> and I can build >>>>>>>>>>> > > with CUDA. The full config.log file is attached. Please let >>>>>>>>>>> me know if you >>>>>>>>>>> > > need any other information. I appreciate your help with this. 
>>>>>>>>>>> > > >>>>>>>>>>> > >>>>>>>>>>> > The proximate error is >>>>>>>>>>> > >>>>>>>>>>> > Executing: nvcc -c -o >>>>>>>>>>> /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o >>>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers >>>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.types >>>>>>>>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ >>>>>>>>>>> -std=c++14 >>>>>>>>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo >>>>>>>>>>> -gencode >>>>>>>>>>> > arch=compute_80,code=sm_80 >>>>>>>>>>> /tmp/petsc-kn3f29gl/config.packages.cuda/ >>>>>>>>>>> > conftest.cu >>>>>>>>>>> >>>>>>>>>>> > stdout: >>>>>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more >>>>>>>>>>> than one >>>>>>>>>>> > instance of overloaded function >>>>>>>>>>> "__nv_associate_access_property_impl" has >>>>>>>>>>> > "C" linkage >>>>>>>>>>> > 1 error detected in the compilation of >>>>>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu >>>>>>>>>>> >>>>>>>>>>> ". >>>>>>>>>>> > Possible ERROR while running compiler: exit code 1 >>>>>>>>>>> > stderr: >>>>>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more >>>>>>>>>>> than one >>>>>>>>>>> > instance of overloaded function >>>>>>>>>>> "__nv_associate_access_property_impl" has >>>>>>>>>>> > "C" linkage >>>>>>>>>>> > >>>>>>>>>>> > 1 error detected in the compilation of >>>>>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda >>>>>>>>>>> > >>>>>>>>>>> > This looks like screwed up headers to me, but I will let >>>>>>>>>>> someone that >>>>>>>>>>> > understands CUDA compilation reply. >>>>>>>>>>> > >>>>>>>>>>> > Thanks, >>>>>>>>>>> > >>>>>>>>>>> > Matt >>>>>>>>>>> > >>>>>>>>>>> > Thanks, >>>>>>>>>>> > > Sreeram >>>>>>>>>>> > > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> >>>>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ashish.patel at ansys.com Fri Apr 19 13:46:46 2024
From: ashish.patel at ansys.com (Ashish Patel)
Date: Fri, 19 Apr 2024 18:46:46 +0000
Subject: [petsc-users] About recent changes in GAMG
In-Reply-To: <878r1aweuv.fsf@jedbrown.org>
References: <878r1aweuv.fsf@jedbrown.org>
Message-ID:

Hi Jed, VmRSS is on the higher side and seems to match what
PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me.

Mark, running without the near nullspace also gives similar results. I have
attached the malloc_view and GAMG info for the serial and 2-core runs. Some
of the standout functions on rank 0 for the parallel run seem to be

5.3 GB   MatSeqAIJSetPreallocation_SeqAIJ
7.7 GB   MatStashSortCompress_Private
10.1 GB  PetscMatStashSpaceGet
7.7 GB   PetscSegBufferAlloc_Private

malloc_view also says the following

[0] Maximum memory PetscMalloc()ed 32387548912 maximum size of entire process 8270635008

which fits the PetscMallocGetMaximumUsage > PetscMemoryGetMaximumUsage
output. Let me know if you need some other info.

Thanks,
Ashish
________________________________
From: Jed Brown
Sent: Thursday, April 18, 2024 2:16 PM
To: Mark Adams ; Ashish Patel ; PETSc users list
Cc: Scott McClennan
Subject: Re: [petsc-users] About recent changes in GAMG

[External Sender]

Mark Adams writes:

>>> Yea, my interpretation of these methods is also that
> "PetscMemoryGetMaximumUsage" should be >= "PetscMallocGetMaximumUsage".
>>> But you are seeing the opposite.
>
> We are using PETSc main and have found a case where memory consumption
> explodes in parallel.
> Also, we see a non-negligible difference between PetscMemoryGetMaximumUsage()
> and PetscMallocGetMaximumUsage().
> Running in serial through /usr/bin/time, the max. resident set size matches
> the PetscMallocGetMaximumUsage() result.
> I would have expected it to match PetscMemoryGetMaximumUsage() instead.
PetscMemoryGetMaximumUsage uses procfs (if PETSC_USE_PROCFS_FOR_SIZE, which
should be typical on Linux anyway) in PetscHeaderDestroy to update a static
variable. If you haven't destroyed an object yet, its value will be nonsense.

If your program is using huge pages, it might also be inaccurate (I don't
know). You can look at /proc/<pid>/statm to see what PETSc is reading (the
second field, which is the number of pages in RSS). You can also look at the
VmRSS field in /proc/<pid>/status, which reads in kB. See also the
HugetlbPages field in /proc/<pid>/status.

https://www.kernel.org/doc/Documentation/filesystems/proc.txt

If your app is swapping, these will be inaccurate because swapped memory is
not resident. We don't use the first field (VmSize) because there are reasons
why programs sometimes map much more memory than they'll actually use, making
such numbers irrelevant for most purposes.

>
>                    PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
> Serial + Option 1  4.8 GB                      7.4 GB                      112 sec
> 2 core + Option 1  15.2 GB                     45.5 GB                     150 sec
> Serial + Option 2  3.1 GB                      3.8 GB                      167 sec
> 2 core + Option 2  13.1 GB                     17.4 GB                     89 sec
> Serial + Option 3  4.7 GB                      5.2 GB                      693 sec
> 2 core + Option 3  23.2 GB                     26.4 GB                     411 sec
>
> On Thu, Apr 18, 2024 at 4:13 PM Mark Adams wrote:
>
>> The next thing you might try is not using the null space argument.
>> Hypre does not use it, but GAMG does.
>> You could also run with -malloc_view to see some info on mallocs. It is
>> probably in the Mat objects.
>> You can also run with "-info" and grep on GAMG in the output and send that.
>>
>> Mark
>>
>> On Thu, Apr 18, 2024 at 12:03 PM Ashish Patel
>> wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for your response and suggestion.
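[Editorial aside: Jed's suggestion above, checking VmRSS by hand, can be wrapped in a few lines of C. This is an illustrative sketch only; it is not PETSc code, and the helper name is made up.]

```c
#include <stdio.h>
#include <string.h>

/* Illustrative helper (a sketch, not PETSc code): return this process's
 * resident set size in kB by scanning /proc/self/status for the VmRSS
 * field Jed mentions.  Returns -1 if the field cannot be read. */
long vmrss_kb(void)
{
  FILE *f = fopen("/proc/self/status", "r");
  char  line[256];
  long  kb = -1;

  if (!f) return -1;
  while (fgets(line, sizeof line, f)) {
    if (strncmp(line, "VmRSS:", 6) == 0) {
      sscanf(line + 6, "%ld", &kb); /* the field is reported in kB */
      break;
    }
  }
  fclose(f);
  return kb;
}
```

Calling this before and after a solve gives the same number /usr/bin/time reports as maximum resident set size, up to timing of the sample.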
>>> With hypre both memory and time look good, here is the data for that
>>>
>>>                    PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
>>> Serial + Option 4  5.55 GB                     5.17 GB                     15.7 sec
>>> 2 core + Option 4  5.85 GB                     4.69 GB                     21.9 sec
>>>
>>> Option 4
>>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type hypre
>>> -pc_hypre_boomeramg_strong_threshold 0.9 -ksp_view -log_view
>>> -log_view_memory -info :pc
>>>
>>> I am also attaching a standalone program to reproduce these options and
>>> the link to matrix, rhs and near null spaces (serial.tar 2.xz,
>>> https://ansys-my.sharepoint.com/:u:/p/ashish_patel/EbUM5Ahp-epNi4xDxR9mnN0B1dceuVzGhVXQQYJzI5Py2g)
>>> if you would like to try as well. Please let me know if you have
>>> trouble accessing the link.
>>>
>>> Ashish
>>> ------------------------------
>>> *From:* Mark Adams
>>> *Sent:* Wednesday, April 17, 2024 7:52 PM
>>> *To:* Jeremy Theler (External)
>>> *Cc:* Ashish Patel ; Scott McClennan <scott.mcclennan at ansys.com>
>>> *Subject:* Re: About recent changes in GAMG
>>>
>>> *[External Sender]*
>>>
>>> On Wed, Apr 17, 2024 at 7:20 AM Jeremy Theler (External) <
>>> jeremy.theler-ext at ansys.com> wrote:
>>>
>>> Hey Mark. Long time no see! How are things going over there?
>>>
>>> We are using PETSc main and have found a case where memory consumption
>>> explodes in parallel.
>>> Also, we see a non-negligible difference between
>>> PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage().
>>> Running in serial through /usr/bin/time, the max. resident set size
>>> matches the PetscMallocGetMaximumUsage() result.
>>> I would have expected it to match PetscMemoryGetMaximumUsage() instead.
>>>
>>> Yea, my interpretation of these methods is also that "Memory" should be
>>> >= "Malloc". But you are seeing the opposite.
>>>
>>> I don't have any idea what is going on with your big memory penalty going
>>> from 1 to 2 cores on this test, but the first thing to do is try other
>>> solvers and see how they behave. Hypre in particular would be a good thing
>>> to try because it is a similar algorithm.
>>>
>>> Mark
>>>
>>> The matrix size is around 1 million. We can share it with you if you
>>> want, along with the RHS and the 6 near nullspace vectors and a modified
>>> ex1.c which will read these files and show the following behavior.
>>>
>>> Observations using latest main for an elastic matrix with a block size of 3
>>> (after removing bonded glue-like DOFs with direct elimination) and near
>>> null space provided:
>>>
>>> - Big memory penalty going from serial to parallel (2 core)
>>> - Big difference between PetscMemoryGetMaximumUsage and
>>>   PetscMallocGetMaximumUsage, why?
>>> - The memory penalty decreases with -pc_gamg_aggressive_square_graph false
>>>   (option 2)
>>> - The difference between PetscMemoryGetMaximumUsage and
>>>   PetscMallocGetMaximumUsage reduces when -pc_gamg_threshold is
>>>   increased from 0 to 0.01 (option 3); the solve time increases a lot, though.
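[Editorial aside on the "why?" in the second bullet: one possible contributor, a speculation rather than something established in this thread, is that Linux backs malloc'd pages only on first touch, so memory that is allocated but never written inflates malloc accounting without showing up in the resident set. A toy experiment, not PETSc code, with made-up helper names:]

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the second field of /proc/self/statm: resident pages (the same
 * number Jed says PETSc reads).  Returns -1 on failure. */
static long resident_pages(void)
{
  long size = 0, rss = -1;
  FILE *f = fopen("/proc/self/statm", "r");

  if (f) {
    if (fscanf(f, "%ld %ld", &size, &rss) != 2) rss = -1;
    fclose(f);
  }
  return rss;
}

/* Toy experiment (a sketch, not PETSc code): RSS growth from merely
 * allocating `bytes` vs. growth after touching every byte.  On Linux the
 * untouched growth is near zero because pages are faulted in lazily,
 * while malloc accounting counts the full allocation either way. */
void rss_growth(size_t bytes, long *untouched, long *touched)
{
  long  before = resident_pages();
  char *p = malloc(bytes);

  if (!p) { *untouched = *touched = -1; return; }
  *untouched = resident_pages() - before; /* allocation alone: near zero */
  memset(p, 1, bytes);                    /* first touch faults pages in */
  *touched = resident_pages() - before;   /* now roughly bytes/page_size */
  free(p);
}
```

This does not explain the reverse gap (Malloc > Memory) by itself, but it shows why the two counters legitimately diverge.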
>>>
>>>                    PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
>>> Serial + Option 1  4.8 GB                      7.4 GB                      112 sec
>>> 2 core + Option 1  15.2 GB                     45.5 GB                     150 sec
>>> Serial + Option 2  3.1 GB                      3.8 GB                      167 sec
>>> 2 core + Option 2  13.1 GB                     17.4 GB                     89 sec
>>> Serial + Option 3  4.7 GB                      5.2 GB                      693 sec
>>> 2 core + Option 3  23.2 GB                     26.4 GB                     411 sec
>>>
>>> Option 1
>>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold 0.0 -info :pc
>>>
>>> Option 2
>>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>>> -pc_gamg_aggressive_square_graph *false* -pc_gamg_threshold 0.0 -info :pc
>>>
>>> Option 3
>>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name
>>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg
>>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory
>>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold *0.01* -info :pc
>>> ------------------------------
>>> *From:* Mark Adams
>>> *Sent:* Tuesday, November 14, 2023 11:28 AM
>>> *To:* Jeremy Theler (External)
>>> *Cc:* Ashish Patel
>>> *Subject:* Re: About recent changes in GAMG
>>>
>>> *[External Sender]*
>>> Sounds good,
>>>
>>> I think the not-square graph "aggressive" coarsening is the only issue
>>> that I see, and you can fix this by using:
>>>
>>> -mat_coarsen_type mis
>>>
>>> Aside, '-pc_gamg_aggressive_square_graph' should do it also, and you can
>>> use both and they will be ignored in earlier versions.
>>>
>>> If you see a difference then the first thing to do is run with '-info :pc'
>>> and send that to me (you can grep on 'GAMG' and send that if you like, to
>>> reduce the data).
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Tue, Nov 14, 2023 at 8:49 AM Jeremy Theler (External) <
>>> jeremy.theler-ext at ansys.com> wrote:
>>>
>>> Hi Mark.
>>> Thanks for reaching out. For now, we are going to stick to 3.19 for our
>>> production code because the changes in 3.20 impact our tests in
>>> different ways (some of them perform better, some perform worse).
>>> I have now switched to another task, investigating structural elements in
>>> DMplex.
>>> I'll go back to analyzing the new changes in GAMG in a couple of weeks so
>>> we can then see if we upgrade to 3.20 or wait until 3.21.
>>>
>>> Thanks for your work and your kindness.
>>> --
>>> jeremy
>>> ------------------------------
>>> *From:* Mark Adams
>>> *Sent:* Tuesday, November 14, 2023 9:35 AM
>>> *To:* Jeremy Theler (External)
>>> *Cc:* Ashish Patel
>>> *Subject:* Re: About recent changes in GAMG
>>>
>>> *[External Sender]*
>>> Hi Jeremy,
>>>
>>> Just following up.
>>> I appreciate your digging into performance regressions in GAMG.
>>> AMG is really a pain sometimes and we want GAMG to be solid, at least for
>>> mainstream options, and your efforts are appreciated.
>>> So feel free to start this discussion up.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Wed, Oct 25, 2023 at 9:52 PM Jeremy Theler (External) <
>>> jeremy.theler-ext at ansys.com> wrote:
>>>
>>> Dear Mark
>>>
>>> Thanks for the follow up and sorry for the delay.
>>> I'm taking some days off. I'll be back to full throttle next week so I can
>>> continue the discussion about these changes in GAMG.
>>>
>>> Regards,
>>> Jeremy
>>>
>>> ------------------------------
>>> *From:* Mark Adams
>>> *Sent:* Wednesday, October 18, 2023 9:15 AM
>>> *To:* Jeremy Theler (External) ; PETSc users list
>>> *Cc:* Ashish Patel
>>> *Subject:* Re: About recent changes in GAMG
>>>
>>> *[External Sender]*
>>> Hi Jeremy,
>>>
>>> I hope you don't mind putting this on the list (w/o data), but this is
>>> documentation and you are the second user that found regressions.
>>> Sorry for the churn.
>>>
>>> There is a lot here so we can iterate, but here is a pass at your
>>> questions.
>>>
>>> *** Using MIS-2 instead of the square graph was motivated by setup
>>> cost/performance, but on GPUs, with some recent fixes in Kokkos (in a
>>> branch), the square graph seems OK.
>>> My experience was that the square graph is better in terms of quality, and
>>> we have a power user, like you all, that found this also.
>>> So I switched the default back to square graph.
>>>
>>> Interesting that you found that MIS-2 (the new method) could be faster, but
>>> it might be because the two methods coarsen at different rates, and that
>>> can make a big difference.
>>> (The way to test would be to adjust parameters to get similar coarsening
>>> rates, but I digress.)
>>> It's hard to understand the differences between these two methods in
>>> terms of aggregate quality, so we need to just experiment and have options.
>>>
>>> *** As far as your thermal problem: there was a complaint that the eigen
>>> estimates for the Chebyshev smoother were not recomputed for nonlinear
>>> problems, so I added an option to do that and turned it on by default.
>>> Use '-pc_gamg_recompute_esteig false' to get back to the original.
>>> (I should have turned it off by default.)
>>>
>>> Now, if your problem is symmetric and you use CG to compute the eigen
>>> estimates, there should be no difference.
>>> If you use CG to compute the eigen estimates in GAMG (and have GAMG give
>>> them to cheby, the default), then when you recompute the eigen estimates
>>> the cheby eigen estimator is used, and that will use gmres by default
>>> unless you set the SPD property in your matrix.
>>> So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set
>>> '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left).
>>> CG is a much better estimator for SPD.
>>>
>>> And I found that the cheby eigen estimator uses an LAPACK *eigen* method
>>> to compute the eigen bounds, while GAMG uses a *singular value* method.
>>> The two give very different results on the lid driven cavity test (ex19);
>>> eigen is lower, which is safer but not optimal if it is too low.
>>> I have a branch to have cheby use the singular value method, but I don't
>>> plan on merging it (enough churn, and I don't understand these differences).
>>>
>>> *** '-pc_gamg_low_memory_threshold_filter false' recovers the old
>>> filtering method.
>>> This is the default now because there is a bug in the (new) low memory
>>> filter.
>>> This bug is very rare and catastrophic.
>>> We are working on it and will turn it on by default when it's fixed.
>>> This does not affect the semantics of the solver, just work and memory
>>> complexity.
>>>
>>> *** As far as tet4 vs tet10, I would guess that tet4 wants more
>>> aggressive coarsening.
>>> The default is to do aggressive coarsening on one (1) level.
>>> You might want more levels for tet4.
>>> And the new MIS-k coarsening can use any k (the default is 2) with
>>> '-mat_coarsen_misk_distance k' (e.g., k=3).
>>> I have not added hooks to have a more complex schedule to specify the
>>> method on each level.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Tue, Oct 17, 2023 at 9:33 PM Jeremy Theler (External) <
>>> jeremy.theler-ext at ansys.com> wrote:
>>>
>>> Hey Mark
>>>
>>> Regarding the changes in the coarsening algorithm in 3.20 with respect to
>>> 3.19: in general we see that for some problems the MIS strategy gives an
>>> overall performance that is slightly better, and for some others it is
>>> slightly worse than the "baseline" from 3.19.
>>> We also saw that current main has switched back to the old square
>>> coarsening algorithm by default, which again, in some cases is better and
>>> in others is worse than 3.19 without any extra command-line option.
>>>
>>> Now what seems weird to us is that we have a test case which is a heat
>>> conduction problem with radiation boundary conditions (so it is nonlinear)
>>> using tet10, and we see
>>>
>>> 1. that in parallel v3.20 is way worse than v3.19, although the
>>>    memory usage is similar
>>> 2. that petsc main (with no extra flags, just the defaults) recovers
>>>    the 3.19 performance but memory usage is significantly larger
>>>
>>> I tried using the -pc_gamg_low_memory_threshold_filter flag and the
>>> results were the same.
>>>
>>> Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI
>>> ranks.
>>> Is there any explanation for these two points we are seeing?
>>> Another weird finding is that if we use tet4 instead of tet10, v3.20 is
>>> only 10% slower than the other two and main does not need more memory than
>>> the other two.
>>>
>>> BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and
>>> main, should you be interested.
>>>
>>> Let me know if it is better to move this discussion into the PETSc
>>> mailing list.
>>>
>>> Regards,
>>> jeremy theler
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: option1_2core.log
Type: application/octet-stream
Size: 42143 bytes
Desc: option1_2core.log
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: option1_serial.log
Type: application/octet-stream
Size: 31875 bytes
Desc: option1_serial.log
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ex1.c
URL: 

From knepley at gmail.com Fri Apr 19 15:04:42 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 19 Apr 2024 16:04:42 -0400
Subject: [petsc-users] About recent changes in GAMG
In-Reply-To: 
References: <878r1aweuv.fsf@jedbrown.org>
Message-ID: 

On Fri, Apr 19, 2024 at 3:52 PM Ashish Patel wrote:

> Hi Jed,
> VmRss is on a higher side and seems to match what
> PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me.
>
> Mark, running without the near nullspace also gives similar results. I
> have attached the malloc_view and gamg info for serial and 2 core runs.
> Some of the standout functions on rank 0 for the parallel run seem to be
> 5.3 GB  MatSeqAIJSetPreallocation_SeqAIJ
> 7.7 GB  MatStashSortCompress_Private
> 10.1 GB PetscMatStashSpaceGet
>
This is strange. We would expect the MatStash to be much smaller than the
allocation, but it is larger. That suggests that you are sending a large
number of off-process values. Is this by design?
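[Editorial aside for readers following along: the stash Matt refers to is the buffer where MatSetValues keeps entries destined for rows owned by another rank until MatAssemblyBegin/End exchanges them. A toy model of that bookkeeping; this is a sketch, deliberately not the real MatStash implementation:]

```c
#include <stdlib.h>

typedef struct { long row, col; double val; } Entry;
typedef struct { Entry *buf; size_t n, cap; } Stash;

/* Toy model (a sketch, NOT the real PETSc MatStash): a value aimed at a
 * row inside this rank's ownership range [rstart, rend) would go straight
 * into the local matrix part, while anything else is buffered until
 * assembly.  Returns the stash length, so a large return value mirrors
 * the large MatStash memory seen in the thread. */
size_t stash_set_value(Stash *s, long rstart, long rend,
                       long row, long col, double val)
{
  if (row >= rstart && row < rend) return s->n; /* local: no stash traffic */
  if (s->n == s->cap) {                         /* grow the stash buffer */
    size_t cap = s->cap ? 2 * s->cap : 16;
    Entry *buf = realloc(s->buf, cap * sizeof(Entry));
    if (!buf) return s->n;                      /* out of memory: drop */
    s->buf = buf;
    s->cap = cap;
  }
  s->buf[s->n++] = (Entry){row, col, val};
  return s->n;
}
```

If most generated entries land in rows another rank owns, this buffer (and in PETSc, the MatStash) grows with the number of off-process values, which is the behavior Matt is asking about.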
Thanks,

    Matt
> >>> Regards,
> >>> jeremy theler

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead. -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ashish.patel at ansys.com Fri Apr 19 16:05:55 2024
From: ashish.patel at ansys.com (Ashish Patel)
Date: Fri, 19 Apr 2024 21:05:55 +0000
Subject: [petsc-users] About recent changes in GAMG
In-Reply-To: 
References: <878r1aweuv.fsf@jedbrown.org>
Message-ID: 

Hi Matt,

That seems to be a PCSetup-specific outcome. If I use option 2, where
"-pc_gamg_aggressive_square_graph false", then PetscMatStashSpaceGet for
rank 0 falls to 40 MB; with hypre (option 4) I don't get that function at
all. Have attached logs for both.

Thanks,
Ashish

________________________________
From: Matthew Knepley
Sent: Friday, April 19, 2024 1:04 PM
To: Ashish Patel
Cc: Jed Brown ; Mark Adams ; PETSc users list ; Scott McClennan
Subject: Re: [petsc-users] About recent changes in GAMG

[External Sender]

On Fri, Apr 19, 2024 at 3:52 PM Ashish Patel wrote:

Hi Jed,
VmRss is on a higher side and seems to match what PetscMallocGetMaximumUsage
is reporting. HugetlbPages was 0 for me.

Mark, running without the near nullspace also gives similar results. I have
attached the malloc_view and gamg info for serial and 2 core runs.
Some of the standout functions on rank 0 for the parallel run seem to be
5.3 GB MatSeqAIJSetPreallocation_SeqAIJ
7.7 GB MatStashSortCompress_Private
10.1 GB PetscMatStashSpaceGet

This is strange. We would expect the MatStash to be much smaller than the allocation, but it is larger. That suggests that you are sending a large number of off-process values. Is this by design?

Thanks,

Matt

7.7 GB PetscSegBufferAlloc_Private

malloc_view also says the following
[0] Maximum memory PetscMalloc()ed 32387548912 maximum size of entire process 8270635008
which fits the PetscMallocGetMaximumUsage > PetscMemoryGetMaximumUsage output.

Let me know if you need some other info.

Thanks,
Ashish

________________________________
From: Jed Brown
Sent: Thursday, April 18, 2024 2:16 PM
To: Mark Adams ; Ashish Patel ; PETSc users list
Cc: Scott McClennan
Subject: Re: [petsc-users] About recent changes in GAMG

[External Sender]

Mark Adams writes:

>>> Yea, my interpretation of these methods is also that "PetscMemoryGetMaximumUsage" should be >= "PetscMallocGetMaximumUsage".
>>> But you are seeing the opposite.
>
> We are using PETSc main and have found a case where memory consumption
> explodes in parallel.
> Also, we see a non-negligible difference between PetscMemoryGetMaximumUsage()
> and PetscMallocGetMaximumUsage().
> Running in serial through /usr/bin/time, the max. resident set size matches
> the PetscMallocGetMaximumUsage() result.
> I would have expected it to match PetscMemoryGetMaximumUsage() instead.

PetscMemoryGetMaximumUsage uses procfs (if PETSC_USE_PROCFS_FOR_SIZE, which should be typical on Linux anyway) in PetscHeaderDestroy to update a static variable. If you haven't destroyed an object yet, its value will be nonsense.

If your program is using huge pages, it might also be inaccurate (I don't know). You can look at /proc/<pid>/statm to see what PETSc is reading (second field, which is the number of pages in RSS).
You can also look at the VmRSS field in /proc/<pid>/status, which reads in kB. See also the HugetlbPages field in /proc/<pid>/status.

https://urldefense.us/v3/__https://www.kernel.org/doc/Documentation/filesystems/proc.txt__;!!G_uCfscf7eWS!Z1yIeVlAkFj-DI89Kxnrfqzom6gFYVgoPlA8fbLten2hJEslP0OGywavGV1apPUK-Nh3UwOfAT8GNp8AYXq_d7DWaQo$

If your app is swapping, these will be inaccurate because swapped memory is not resident. We don't use the first field (VmSize) because there are reasons why programs sometimes map much more memory than they'll actually use, making such numbers irrelevant for most purposes.

>
>                     PetscMemoryGetMaximumUsage  PetscMallocGetMaximumUsage  Time
> Serial + Option 1   4.8 GB                      7.4 GB                      112 sec
> 2 core + Option 1   15.2 GB                     45.5 GB                     150 sec
> Serial + Option 2   3.1 GB                      3.8 GB                      167 sec
> 2 core + Option 2   13.1 GB                     17.4 GB                     89 sec
> Serial + Option 3   4.7 GB                      5.2 GB                      693 sec
> 2 core + Option 3   23.2 GB                     26.4 GB                     411 sec
>
> On Thu, Apr 18, 2024 at 4:13 PM Mark Adams wrote:
>
>> The next thing you might try is not using the null space argument.
>> Hypre does not use it, but GAMG does.
>> You could also run with -malloc_view to see some info on mallocs. It is
>> probably in the Mat objects.
>> You can also run with "-info" and grep on GAMG in the output and send that.
>>
>> Mark
>>
>> On Thu, Apr 18, 2024 at 12:03 PM Ashish Patel wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for your response and suggestion.
With hypre both memory and time >>> looks good, here is the data for that >>> >>> PetscMemoryGetMaximumUsage >>> PetscMallocGetMaximumUsage >>> Time >>> Serial + Option 4 >>> 5.55 GB >>> 5.17 GB >>> 15.7 sec >>> 2 core + Option 4 >>> 5.85 GB >>> 4.69 GB >>> 21.9 sec >>> >>> Option 4 >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type hypre >>> -pc_hypre_boomeramg_strong_threshold 0.9 -ksp_view -log_view >>> -log_view_memory -info :pc >>> >>> I am also attaching a standalone program to reproduce these options and >>> the link to matrix, rhs and near null spaces (serial.tar 2.xz >>> >>> ) if you would like to try as well. Please let me know if you have >>> trouble accessing the link. >>> >>> Ashish >>> ------------------------------ >>> *From:* Mark Adams > >>> *Sent:* Wednesday, April 17, 2024 7:52 PM >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel >; Scott McClennan < >>> scott.mcclennan at ansys.com> >>> *Subject:* Re: About recent changes in GAMG >>> >>> >>> *[External Sender]* >>> >>> >>> On Wed, Apr 17, 2024 at 7:20?AM Jeremy Theler (External) < >>> jeremy.theler-ext at ansys.com> wrote: >>> >>> Hey Mark. Long time no see! How are thing going over there? >>> >>> We are using PETSc main and have found a case where memory consumption >>> explodes in parallel. >>> Also, we see a non-negligible difference between >>> PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage(). >>> Running in serial through /usr/bin/time, the max. resident set size >>> matches the PetscMallocGetMaximumUsage() result. >>> I would have expected it to match PetscMemoryGetMaximumUsage() instead. >>> >>> >>> Yea, my interpretation of these methods is also that "Memory" should be >>> >= "Malloc". But you are seeing the opposite. 
>>> >>> I don't have any idea what is going on with your big memory penalty going >>> from 1 to 2 cores on this test, but the first thing to do is try other >>> solvers and see how that behaves. Hypre in particular would be a good thing >>> to try because it is a similar algorithm. >>> >>> Mark >>> >>> >>> >>> The matrix size is around 1 million. We can share it with you if you >>> want, along with the RHS and the 6 near nullspace vectors and a modified >>> ex1.c which will read these files and show the following behavior. >>> >>> Observations using latest main for elastic matrix with a block size of 3 >>> (after removing bonded glue-like DOFs with direct elimination) and near >>> null space provided >>> >>> - Big memory penalty going from serial to parallel (2 core) >>> - Big difference between PetscMemoryGetMaximumUsage and >>> PetscMallocGetMaximumUsage, why? >>> - The memory penalty decreases with -pc_gamg_aggressive_square_graph false >>> (option 2) >>> - The difference between PetscMemoryGetMaximumUsage and >>> PetscMallocGetMaximumUsage reduces when -pc_gamg_threshold is >>> increased from 0 to 0.01 (option 3), the solve time increase a lot though. 
>>> >>> >>> >>> >>> >>> PetscMemoryGetMaximumUsage >>> PetscMallocGetMaximumUsage >>> Time >>> Serial + Option 1 >>> 4.8 GB >>> 7.4 GB >>> 112 sec >>> 2 core + Option1 >>> 15.2 GB >>> 45.5 GB >>> 150 sec >>> Serial + Option 2 >>> 3.1 GB >>> 3.8 GB >>> 167 sec >>> 2 core + Option2 >>> 13.1 GB >>> 17.4 GB >>> 89 sec >>> Serial + Option 3 >>> 4.7GB >>> 5.2GB >>> 693 sec >>> 2 core + Option 3 >>> 23.2 GB >>> 26.4 GB >>> 411 sec >>> >>> Option 1 >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold 0.0 -info :pc >>> >>> Option 2 >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory >>> -pc_gamg_aggressive_square_graph *false* -pc_gamg_threshold 0.0 -info :pc >>> >>> Option 3 >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold *0.01* -info :pc >>> ------------------------------ >>> *From:* Mark Adams > >>> *Sent:* Tuesday, November 14, 2023 11:28 AM >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG >>> >>> >>> *[External Sender]* >>> Sounds good, >>> >>> I think the not-square graph "aggressive" coarsening is only issue that I >>> see and you can fix this by using: >>> >>> -mat_coarsen_type mis >>> >>> Aside, '-pc_gamg_aggressive_square_graph' should do it also, and you can >>> use both and they will be ignored in earlier versions. 
>>> >>> If you see a difference then the first thing to do is run with '-info >>> :pc' and send that to me (you can grep on 'GAMG' and send that if you like >>> to reduce the data). >>> >>> Thanks, >>> Mark >>> >>> >>> On Tue, Nov 14, 2023 at 8:49?AM Jeremy Theler (External) < >>> jeremy.theler-ext at ansys.com> wrote: >>> >>> Hi Mark. >>> Thanks for reaching out. For now, we are going to stick to 3.19 for our >>> production code because the changes in 3.20 impact in our tests in >>> different ways (some of them perform better, some perform worse). >>> I now switched to another task about investigating structural elements in >>> DMplex. >>> I'll go back to analyzing the new changes in GAMG in a couple of weeks so >>> we can then see if we upgrade to 3.20 or we wait until 3.21. >>> >>> Thanks for your work and your kindness. >>> -- >>> jeremy >>> ------------------------------ >>> *From:* Mark Adams > >>> *Sent:* Tuesday, November 14, 2023 9:35 AM >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG >>> >>> >>> *[External Sender]* >>> Hi Jeremy, >>> >>> Just following up. >>> I appreciate your digging into performance regressions in GAMG. >>> AMG is really a pain sometimes and we want GAMG to be solid, at least for >>> mainstream options, and your efforts are appreciated. >>> So feel free to start this discussion up. >>> >>> Thanks, >>> Mark >>> >>> On Wed, Oct 25, 2023 at 9:52?PM Jeremy Theler (External) < >>> jeremy.theler-ext at ansys.com> wrote: >>> >>> Dear Mark >>> >>> Thanks for the follow up and sorry for the delay. >>> I'm taking some days off. I'll be back to full throttle next week so can >>> continue the discussion about these changes in GAMG. 
>>> >>> Regards, >>> Jeremy >>> >>> ------------------------------ >>> *From:* Mark Adams > >>> *Sent:* Wednesday, October 18, 2023 9:15 AM >>> *To:* Jeremy Theler (External) >; PETSc >>> users list > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG >>> >>> >>> *[External Sender]* >>> Hi Jeremy, >>> >>> I hope you don't mind putting this on the list (w/o data), but this is >>> documentation and you are the second user that found regressions. >>> Sorry for the churn. >>> >>> There is a lot here so we can iterate, but here is a pass at your >>> questions. >>> >>> *** Using MIS-2 instead of square graph was motivated by setup >>> cost/performance but on GPUs with some recent fixes in Kokkos (in a branch) >>> square graph seems OK. >>> My experience was that square graph is better in terms of quality and we >>> have a power user, like you all, that found this also. >>> So I switched the default back to square graph. >>> >>> Interesting that you found that MIS-2 (new method) could be faster, but >>> it might be because the two methods coarsen at different rates and that can >>> make a big difference. >>> (the way to test would be to adjust parameters to get similar coarsen >>> rates, but I digress) >>> It's hard to understand the differences between these two methods in >>> terms of aggregate quality so we need to just experiment and have options. >>> >>> *** As far as your thermal problem. There was a complaint that the eigen >>> estimates for chebyshev smoother were not recomputed for nonlinear problems >>> and I added an option to do that and turned it on by default: >>> Use '-pc_gamg_recompute_esteig false' to get back to the original. >>> (I should have turned it off by default) >>> >>> Now, if your problem is symmetric and you use CG to compute the eigen >>> estimates there should be no difference. 
>>> If you use CG to compute the eigen estimates in GAMG (and have GAMG give
>>> them to cheby, the default), then when you recompute the eigen estimates
>>> the cheby eigen estimator is used, and that will use gmres by default
>>> unless you set the SPD property in your matrix.
>>> So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set
>>> '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and -options_left).
>>> CG is a much better estimator for SPD.
>>>
>>> And I found that the cheby eigen estimator uses a LAPACK *eigen* method
>>> to compute the eigen bounds and GAMG uses a *singular value* method.
>>> The two give very different results on the lid driven cavity test (ex19).
>>> eigen is lower, which is safer but not optimal if it is too low.
>>> I have a branch to have cheby use the singular value method, but I don't
>>> plan on merging it (enough churn and I don't understand these differences).
>>>
>>> *** '-pc_gamg_low_memory_threshold_filter false' recovers the old
>>> filtering method.
>>> This is the default now because there is a bug in the (new) low memory
>>> filter.
>>> This bug is very rare and catastrophic.
>>> We are working on it and will turn it on by default when it's fixed.
>>> This does not affect the semantics of the solver, just work and memory
>>> complexity.
>>>
>>> *** As far as tet4 vs tet10, I would guess that tet4 wants more
>>> aggressive coarsening.
>>> The default is to do aggressive on one (1) level.
>>> You might want more levels for tet4.
>>> And the new MIS-k coarsening can use any k (default is 2) with
>>> '-mat_coarsen_misk_distance k' (e.g., k=3).
>>> I have not added hooks to have a more complex schedule to specify the
>>> method on each level.
>>> >>> Thanks, >>> Mark >>> >>> On Tue, Oct 17, 2023 at 9:33?PM Jeremy Theler (External) < >>> jeremy.theler-ext at ansys.com> wrote: >>> >>> Hey Mark >>> >>> Regarding the changes in the coarsening algorithm in 3.20 with respect to >>> 3.19 in general we see that for some problems the MIS strategy gives and >>> overall performance which is slightly better and for some others it is >>> slightly worse than the "baseline" from 3.19. >>> We also saw that current main has switched back to the old square >>> coarsening algorithm by default, which again, in some cases is better and >>> in others is worse than 3.19 without any extra command-line option. >>> >>> Now what seems weird to us is that we have a test case which is a heat >>> conduction problem with radiation boundary conditions (so it is non linear) >>> using tet10 and we see >>> >>> 1. that in parallel v3.20 is way worse than v3.19, although the >>> memory usage is similar >>> 2. that petsc main (with no extra flags, just the defaults) recover >>> the 3.19 performance but memory usage is significantly larger >>> >>> >>> I tried using the -pc_gamg_low_memory_threshold_filter flag and the >>> results were the same. >>> >>> Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI >>> ranks. >>> Is there any explanation about these two points we are seeing? >>> Another weird finding is that if we use tet4 instead of tet10, v3.20 is >>> only 10% slower than the other two and main does not need more memory than >>> the other two. >>> >>> BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and >>> main should you be interested. >>> >>> Let me know if it is better to move this discussion into the PETSc >>> mailing list. >>> >>> Regards, >>> jeremy theler >>> >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Z1yIeVlAkFj-DI89Kxnrfqzom6gFYVgoPlA8fbLten2hJEslP0OGywavGV1apPUK-Nh3UwOfAT8GNp8AYXq_vPSqgr0$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: option2_2core.log Type: text/x-log Size: 41194 bytes Desc: option2_2core.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: option4_2core.log Type: text/x-log Size: 23418 bytes Desc: option4_2core.log URL: From mfadams at lbl.gov Fri Apr 19 17:59:39 2024 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 19 Apr 2024 18:59:39 -0400 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: References: <878r1aweuv.fsf@jedbrown.org> Message-ID: On Fri, Apr 19, 2024 at 5:06?PM Ashish Patel wrote: > Hi Matt, > > That seems to be a PCSetup specific outcome. If I use option2 where > "-pc_gamg_aggressive_square_graph false" then PetscMatStashSpaceGet for > rank0 falls down to 40 MB, with hypre (option4) I don't get that function > at all. Have attached logs for both. > > The "square" graph uses Mat-Mat-Mult on the graph of the matrix, and hypre does not. The coarse grid space "smoother" (SA) also uses a Mat-Mat-Mult and hypre's AMG algorithm does not. I suspect hypre has a custom coarsener that coarsens "aggressively" and is memory efficient. Mat-Mat-Mult is brute force and slow on CPU, but not bad on GPUs as I recall, which was the reason that I put in the new aggressive coarsener last year that uses a different algorithm. And with square graph: [0] 2 7 694 605 360 MatStashSortCompress_Private() That is enormous. I imagine MatStashSortCompress_Private could use some love. 
[0] 2 7 694 605 360 MatStashSortCompress_Private() [1] 2 100 354 160 MatStashSortCompress_Private() Ditto on love The "GAMG" metadata looks fine, but the grid complexities are a bit high, but that is a detail. These look like linear tetrahedral elements, which coarsen a bit slow even with aggressive coarsening but it's not that bad. Linear tets are pretty bad for elasticity and P2 would coarsen faster naturally. There is some load imbalance in the memory of matrices: [0] 40 1 218 367 920 MatSeqAIJSetPreallocation_SeqAIJ() [1] 40 677 649 104 MatSeqAIJSetPreallocation_SeqAIJ() The coarse grid is on processor 0, but it is not that large. Like 5Mb. Not sure what that is about. It is not clear to me why the Mat-Mat in square graph is such a problem but the Mat-Mat-Mat in the RAP and the Mat-Mat in SA are not catastrophic, but maybe I am not interpreting this correctly. Regardless it looks like the Mat-Mat methods could use some attention. Thanks, Mark Thanks, > Ashish > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, April 19, 2024 1:04 PM > *To:* Ashish Patel > *Cc:* Jed Brown ; Mark Adams ; PETSc > users list ; Scott McClennan < > scott.mcclennan at ansys.com> > *Subject:* Re: [petsc-users] About recent changes in GAMG > > > *[External Sender]* > On Fri, Apr 19, 2024 at 3:52?PM Ashish Patel > wrote: > > Hi Jed, VmRss is on a higher side and seems to match what > PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me. Mark, > running without the near nullspace also gives similar results. I have > attached the malloc_view and gamg info > ZjQcmQRYFpfptBannerStart > This Message Is From an External Sender > This message came from outside your organization. > > ZjQcmQRYFpfptBannerEnd > Hi Jed, > VmRss is on a higher side and seems to match what > PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me. > > Mark, running without the near nullspace also gives similar results. 
I > have attached the malloc_view and gamg info for serial and 2 core runs. > Some of the standout functions on rank 0 for parallel run seems to be > 5.3 GB MatSeqAIJSetPreallocation_SeqAIJ > 7.7 GB MatStashSortCompress_Private > 10.1 GB PetscMatStashSpaceGet > > > This is strange. We would expect the MatStash to be much smaller than the > allocation, but it is larger. > That suggests that you are sending a large number of off-process values. > Is this by design? > > Thanks, > > Matt > > > 7.7 GB PetscSegBufferAlloc_Private > > malloc_view also says the following > [0] Maximum memory PetscMalloc()ed 32387548912 maximum size of entire > process 8270635008 > which fits the PetscMallocGetMaximumUsage > PetscMemoryGetMaximumUsage > output. > > Let me know if you need some other info. > > Thanks, > Ashish > > ------------------------------ > *From:* Jed Brown > *Sent:* Thursday, April 18, 2024 2:16 PM > *To:* Mark Adams ; Ashish Patel ; > PETSc users list > *Cc:* Scott McClennan > *Subject:* Re: [petsc-users] About recent changes in GAMG > > [External Sender] > > Mark Adams writes: > > >>> Yea, my interpretation of these methods is also that " > > PetscMemoryGetMaximumUsage" should be >= "PetscMallocGetMaximumUsage". > >>> But you are seeing the opposite. > > > > > > We are using PETSc main and have found a case where memory consumption > > explodes in parallel. > > Also, we see a non-negligible difference between > PetscMemoryGetMaximumUsage() > > and PetscMallocGetMaximumUsage(). > > Running in serial through /usr/bin/time, the max. resident set size > matches > > the PetscMallocGetMaximumUsage() result. > > I would have expected it to match PetscMemoryGetMaximumUsage() instead. > > PetscMemoryGetMaximumUsage uses procfs (if PETSC_USE_PROCFS_FOR_SIZE, > which should be typical on Linux anyway) in PetscHeaderDestroy to update a > static variable. If you haven't destroyed an object yet, its value will be > nonsense. 
> > If your program is using huge pages, it might also be inaccurate (I don't > know). You can look at /proc//statm to see what PETSc is reading > (second field, which is number of pages in RSS). You can also look at the > VmRss field in /proc//status, which reads in kB. See also the > HugetlbPages field in /proc//status. > > https://urldefense.us/v3/__https://www.kernel.org/doc/Documentation/filesystems/proc.txt__;!!G_uCfscf7eWS!ZWpgqTYPfSh8baj8FyLvMAxZoaznrIYb4OyREM2ueW18-7n5FaVAgrRZmeRu602ugDbFhVs-wMao9C3WcUOepg4$ > > > If your app is swapping, these will be inaccurate because swapped memory > is not resident. We don't use the first field (VmSize) because there are > reasons why programs sometimes map much more memory than they'll actually > use, making such numbers irrelevant for most purposes. > > > > > > > PetscMemoryGetMaximumUsage > > PetscMallocGetMaximumUsage > > Time > > Serial + Option 1 > > 4.8 GB > > 7.4 GB > > 112 sec > > 2 core + Option1 > > 15.2 GB > > 45.5 GB > > 150 sec > > Serial + Option 2 > > 3.1 GB > > 3.8 GB > > 167 sec > > 2 core + Option2 > > 13.1 GB > > 17.4 GB > > 89 sec > > Serial + Option 3 > > 4.7GB > > 5.2GB > > 693 sec > > 2 core + Option 3 > > 23.2 GB > > 26.4 GB > > 411 sec > > > > > > On Thu, Apr 18, 2024 at 4:13?PM Mark Adams wrote: > > > >> The next thing you might try is not using the null space argument. > >> Hypre does not use it, but GAMG does. > >> You could also run with -malloc_view to see some info on mallocs. It is > >> probably in the Mat objects. > >> You can also run with "-info" and grep on GAMG in the output and send > that. > >> > >> Mark > >> > >> On Thu, Apr 18, 2024 at 12:03?PM Ashish Patel > >> wrote: > >> > >>> Hi Mark, > >>> > >>> Thanks for your response and suggestion. 
With hypre both memory and > time > >>> looks good, here is the data for that > >>> > >>> PetscMemoryGetMaximumUsage > >>> PetscMallocGetMaximumUsage > >>> Time > >>> Serial + Option 4 > >>> 5.55 GB > >>> 5.17 GB > >>> 15.7 sec > >>> 2 core + Option 4 > >>> 5.85 GB > >>> 4.69 GB > >>> 21.9 sec > >>> > >>> Option 4 > >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name > >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type hypre > >>> -pc_hypre_boomeramg_strong_threshold 0.9 -ksp_view -log_view > >>> -log_view_memory -info :pc > >>> > >>> I am also attaching a standalone program to reproduce these options and > >>> the link to matrix, rhs and near null spaces (serial.tar 2.xz > >>> < > https://urldefense.us/v3/__https://ansys-my.sharepoint.com/:u:/p/ashish_patel/EbUM5Ahp-epNi4xDxR9mnN0B1dceuVzGhVXQQYJzI5Py2g__;!!G_uCfscf7eWS!ar7t_MsQ-W6SXcDyEWpSDZP_YngFSqVsz2D-8chGJHSz7IZzkLBvN4UpJ1GXrRBGyhEHqmDUQGBfqTKf5x_BPXo$ > > > >>> ) if you would like to try as well. Please let me know if you have > >>> trouble accessing the link. > >>> > >>> Ashish > >>> ------------------------------ > >>> *From:* Mark Adams > >>> *Sent:* Wednesday, April 17, 2024 7:52 PM > >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel ; Scott McClennan < > >>> scott.mcclennan at ansys.com> > >>> *Subject:* Re: About recent changes in GAMG > >>> > >>> > >>> *[External Sender]* > >>> > >>> > >>> On Wed, Apr 17, 2024 at 7:20?AM Jeremy Theler (External) < > >>> jeremy.theler-ext at ansys.com> wrote: > >>> > >>> Hey Mark. Long time no see! How are thing going over there? > >>> > >>> We are using PETSc main and have found a case where memory consumption > >>> explodes in parallel. > >>> Also, we see a non-negligible difference between > >>> PetscMemoryGetMaximumUsage() and PetscMallocGetMaximumUsage(). > >>> Running in serial through /usr/bin/time, the max. resident set size > >>> matches the PetscMallocGetMaximumUsage() result. 
> >>> I would have expected it to match PetscMemoryGetMaximumUsage() instead. > >>> > >>> > >>> Yea, my interpretation of these methods is also that "Memory" should be > >>> >= "Malloc". But you are seeing the opposite. > >>> > >>> I don't have any idea what is going on with your big memory penalty > going > >>> from 1 to 2 cores on this test, but the first thing to do is try other > >>> solvers and see how that behaves. Hypre in particular would be a good > thing > >>> to try because it is a similar algorithm. > >>> > >>> Mark > >>> > >>> > >>> > >>> The matrix size is around 1 million. We can share it with you if you > >>> want, along with the RHS and the 6 near nullspace vectors and a > modified > >>> ex1.c which will read these files and show the following behavior. > >>> > >>> Observations using latest main for elastic matrix with a block size of > 3 > >>> (after removing bonded glue-like DOFs with direct elimination) and near > >>> null space provided > >>> > >>> - Big memory penalty going from serial to parallel (2 core) > >>> - Big difference between PetscMemoryGetMaximumUsage and > >>> PetscMallocGetMaximumUsage, why? > >>> - The memory penalty decreases with > -pc_gamg_aggressive_square_graph false > >>> (option 2) > >>> - The difference between PetscMemoryGetMaximumUsage and > >>> PetscMallocGetMaximumUsage reduces when -pc_gamg_threshold is > >>> increased from 0 to 0.01 (option 3), the solve time increase a lot > though. 
> >>> > >>> > >>> > >>> > >>> > >>> PetscMemoryGetMaximumUsage > >>> PetscMallocGetMaximumUsage > >>> Time > >>> Serial + Option 1 > >>> 4.8 GB > >>> 7.4 GB > >>> 112 sec > >>> 2 core + Option1 > >>> 15.2 GB > >>> 45.5 GB > >>> 150 sec > >>> Serial + Option 2 > >>> 3.1 GB > >>> 3.8 GB > >>> 167 sec > >>> 2 core + Option2 > >>> 13.1 GB > >>> 17.4 GB > >>> 89 sec > >>> Serial + Option 3 > >>> 4.7GB > >>> 5.2GB > >>> 693 sec > >>> 2 core + Option 3 > >>> 23.2 GB > >>> 26.4 GB > >>> 411 sec > >>> > >>> Option 1 > >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name > >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg > >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory > >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold 0.0 -info :pc > >>> > >>> Option 2 > >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name > >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg > >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory > >>> -pc_gamg_aggressive_square_graph *false* -pc_gamg_threshold 0.0 -info > :pc > >>> > >>> Option 3 > >>> mpirun -n _ ./ex1 -A_name matrix.dat -b_name vector.dat -n_name > >>> _null_space.dat -num_near_nullspace 6 -ksp_type cg -pc_type gamg > >>> -pc_gamg_coarse_eq_limit 1000 -ksp_view -log_view -log_view_memory > >>> -pc_gamg_aggressive_square_graph true -pc_gamg_threshold *0.01* -info > :pc > >>> ------------------------------ > >>> *From:* Mark Adams > >>> *Sent:* Tuesday, November 14, 2023 11:28 AM > >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG > >>> > >>> > >>> *[External Sender]* > >>> Sounds good, > >>> > >>> I think the not-square graph "aggressive" coarsening is only issue > that I > >>> see and you can fix this by using: > >>> > >>> -mat_coarsen_type mis > >>> > >>> Aside, '-pc_gamg_aggressive_square_graph' should do it also, and you > can > >>> use both and they will 
be ignored in earlier versions. > >>> > >>> If you see a difference then the first thing to do is run with '-info > >>> :pc' and send that to me (you can grep on 'GAMG' and send that if you > like > >>> to reduce the data). > >>> > >>> Thanks, > >>> Mark > >>> > >>> > >>> On Tue, Nov 14, 2023 at 8:49?AM Jeremy Theler (External) < > >>> jeremy.theler-ext at ansys.com> wrote: > >>> > >>> Hi Mark. > >>> Thanks for reaching out. For now, we are going to stick to 3.19 for our > >>> production code because the changes in 3.20 impact in our tests in > >>> different ways (some of them perform better, some perform worse). > >>> I now switched to another task about investigating structural elements > in > >>> DMplex. > >>> I'll go back to analyzing the new changes in GAMG in a couple of weeks > so > >>> we can then see if we upgrade to 3.20 or we wait until 3.21. > >>> > >>> Thanks for your work and your kindness. > >>> -- > >>> jeremy > >>> ------------------------------ > >>> *From:* Mark Adams > >>> *Sent:* Tuesday, November 14, 2023 9:35 AM > >>> *To:* Jeremy Theler (External) > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG > >>> > >>> > >>> *[External Sender]* > >>> Hi Jeremy, > >>> > >>> Just following up. > >>> I appreciate your digging into performance regressions in GAMG. > >>> AMG is really a pain sometimes and we want GAMG to be solid, at least > for > >>> mainstream options, and your efforts are appreciated. > >>> So feel free to start this discussion up. > >>> > >>> Thanks, > >>> Mark > >>> > >>> On Wed, Oct 25, 2023 at 9:52?PM Jeremy Theler (External) < > >>> jeremy.theler-ext at ansys.com> wrote: > >>> > >>> Dear Mark > >>> > >>> Thanks for the follow up and sorry for the delay. > >>> I'm taking some days off. I'll be back to full throttle next week so > can > >>> continue the discussion about these changes in GAMG. 
> >>> > >>> Regards, > >>> Jeremy > >>> > >>> ------------------------------ > >>> *From:* Mark Adams > >>> *Sent:* Wednesday, October 18, 2023 9:15 AM > >>> *To:* Jeremy Theler (External) ; PETSc > >>> users list > >>> *Cc:* Ashish Patel > >>> *Subject:* Re: About recent changes in GAMG > >>> > >>> > >>> *[External Sender]* > >>> Hi Jeremy, > >>> > >>> I hope you don't mind putting this on the list (w/o data), but this is > >>> documentation and you are the second user that found regressions. > >>> Sorry for the churn. > >>> > >>> There is a lot here so we can iterate, but here is a pass at your > >>> questions. > >>> > >>> *** Using MIS-2 instead of square graph was motivated by setup > >>> cost/performance but on GPUs with some recent fixes in Kokkos (in a > branch) > >>> square graph seems OK. > >>> My experience was that square graph is better in terms of quality and > we > >>> have a power user, like you all, that found this also. > >>> So I switched the default back to square graph. > >>> > >>> Interesting that you found that MIS-2 (new method) could be faster, but > >>> it might be because the two methods coarsen at different rates and > that can > >>> make a big difference. > >>> (the way to test would be to adjust parameters to get similar coarsen > >>> rates, but I digress) > >>> It's hard to understand the differences between these two methods in > >>> terms of aggregate quality so we need to just experiment and have > options. > >>> > >>> *** As far as your thermal problem. There was a complaint that the > eigen > >>> estimates for chebyshev smoother were not recomputed for nonlinear > problems > >>> and I added an option to do that and turned it on by default: > >>> Use '-pc_gamg_recompute_esteig false' to get back to the original. > >>> (I should have turned it off by default) > >>> > >>> Now, if your problem is symmetric and you use CG to compute the eigen > >>> estimates there should be no difference. 
> >>> If you use CG to compute the eigen estimates in GAMG (and have GAMG > give > >>> them to cheby, the default) that when you recompute the eigen > estimates the > >>> cheby eigen estimator is used and that will use gmres by default > unless you > >>> set the SPD property in your matrix. > >>> So if you set '-pc_gamg_esteig_ksp_type cg' you want to also set > >>> '-mg_levels_esteig_ksp_type cg' (verify with -ksp_view and > -options_left) > >>> CG is a much better estimator for SPD. > >>> > >>> And I found that the cheby eigen estimator uses an LAPACK *eigen* > method > >>> to compute the eigen bounds and GAMG uses a *singular value* method. > >>> The two give very different results on the lid driven cavity test > (ex19). > >>> eigen is lower, which is safer but not optimal if it is too low. > >>> I have a branch to have cheby use the singular value method, but I > don't > >>> plan on merging it (enough churn and I don't understand these > differences). > >>> > >>> *** '-pc_gamg_low_memory_threshold_filter false' recovers the old > >>> filtering method. > >>> This is the default now because there is a bug in the (new) low memory > >>> filter. > >>> This bug is very rare and catastrophic. > >>> We are working on it and will turn it on by default when it's fixed. > >>> This does not affect the semantics of the solver, just work and memory > >>> complexity. > >>> > >>> *** As far as tet4 vs tet10, I would guess that tet4 wants more > >>> aggressive coarsening. > >>> The default is to do aggressive on one (1) level. > >>> You might want more levels for tet4. > >>> And the new MIS-k coarsening can use any k (default is 2) wth > >>> '-mat_coarsen_misk_distance k' (eg, k=3) > >>> I have not added hooks to have a more complex schedule to specify the > >>> method on each level. 
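The options discussed above are scattered through the reply; collected in one place, a sketch of the corresponding runtime options (e.g. lines in a `.petscrc` file or appended to the run command). The spellings below are taken from this thread, but verify each against the `-help` output of your PETSc version:

```shell
-pc_type gamg
-pc_gamg_esteig_ksp_type cg                  # CG eigen estimates in GAMG (SPD problems only;
-mg_levels_esteig_ksp_type cg                #  ...and in the Chebyshev smoother; check with
                                             #  -ksp_view and -options_left)
-pc_gamg_recompute_esteig false              # pre-3.20 behavior: do not re-estimate per solve
-pc_gamg_low_memory_threshold_filter false   # old filtering method (default again on main)
-mat_coarsen_misk_distance 3                 # distance-3 MIS for more aggressive coarsening
```

Per the discussion, the MIS-k distance is the knob to try for low-order (tet4) meshes that want more aggressive coarsening.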
> >>> > >>> Thanks, > >>> Mark > >>> > >>> On Tue, Oct 17, 2023 at 9:33?PM Jeremy Theler (External) < > >>> jeremy.theler-ext at ansys.com> wrote: > >>> > >>> Hey Mark > >>> > >>> Regarding the changes in the coarsening algorithm in 3.20 with respect > to > >>> 3.19 in general we see that for some problems the MIS strategy gives > and > >>> overall performance which is slightly better and for some others it is > >>> slightly worse than the "baseline" from 3.19. > >>> We also saw that current main has switched back to the old square > >>> coarsening algorithm by default, which again, in some cases is better > and > >>> in others is worse than 3.19 without any extra command-line option. > >>> > >>> Now what seems weird to us is that we have a test case which is a heat > >>> conduction problem with radiation boundary conditions (so it is non > linear) > >>> using tet10 and we see > >>> > >>> 1. that in parallel v3.20 is way worse than v3.19, although the > >>> memory usage is similar > >>> 2. that petsc main (with no extra flags, just the defaults) recover > >>> the 3.19 performance but memory usage is significantly larger > >>> > >>> > >>> I tried using the -pc_gamg_low_memory_threshold_filter flag and the > >>> results were the same. > >>> > >>> Find attached the log and snes views of 3.19, 3.20 and main using 4 MPI > >>> ranks. > >>> Is there any explanation about these two points we are seeing? > >>> Another weird finding is that if we use tet4 instead of tet10, v3.20 is > >>> only 10% slower than the other two and main does not need more memory > than > >>> the other two. > >>> > >>> BTW, I have dozens of other log view outputs comparing 3.19, 3.20 and > >>> main should you be interested. > >>> > >>> Let me know if it is better to move this discussion into the PETSc > >>> mailing list. 
> >>> > >>> Regards, > >>> jeremy theler > >>> > >>> > >>> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ZWpgqTYPfSh8baj8FyLvMAxZoaznrIYb4OyREM2ueW18-7n5FaVAgrRZmeRu602ugDbFhVs-wMao9C3W_MBeLBI$ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Sat Apr 20 13:35:08 2024 From: mlohry at gmail.com (Mark Lohry) Date: Sat, 20 Apr 2024 14:35:08 -0400 Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition Message-ID: I have a 1-dimensional P1 discontinuous Galerkin discretization of the linear advection equation with 4 cells and periodic boundaries on [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a hand-written Jacobian. Being linear, the Jacobian should be constant/independent of the solution. When I set the initial condition passed to SNESComputeJacobian as some constant, say f(x)=1 or 0, the petsc finite difference jacobian agrees with my hand coded-version. But when I pass it some non-constant value, e.g. f(x)=sin(x), something goes horribly wrong in the petsc jacobian. Implementing my own rudimentary finite difference approximation (similar to how I thought petsc computes it) it returns the correct jacobian to expected error. Any idea what could be going on? 
Analytically computed Jacobian:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x)=1:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x) = sin(x):

-1.65547e+08 -3.31856e+08 -1.25427e+09 4.4844e+08 0 0 1.03206e+08 7.86375e+07
9.13788e+07 1.83178e+08 6.92336e+08 -2.4753e+08 0 0 -5.69678e+07 -4.34064e+07
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
9.13788e+07 1.83178e+08 0 0 -1.24151e+08 -7.38608e+07 -5.69678e+07 -4.34064e+07
-1.65547e+08 -3.31856e+08 0 0 2.24919e+08 1.3381e+08 1.03206e+08 7.86375e+07

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From knepley at gmail.com Sun Apr 21 07:53:33 2024 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 Apr 2024 08:53:33 -0400 Subject: [petsc-users] Periodic boundary conditions using swarm In-Reply-To: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es> References: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es> Message-ID: On Thu, Apr 18, 2024 at 8:23 AM MIGUEL MOLINOS PEREZ wrote: > Dear all, > > I am working on the implementation of periodic bcc using a discretisation > (PIC-style). I am working with a test case which consists of solving the > advection of a set of particles inside of a box (DMDA mesh) with periodic > bcc on the x axis. > > My implementation updates the position of each particle with a velocity > field; afterwards I check whether or not the particle is inside the supercell > (periodic box). If not, I correct the position using bcc conditions. Once > this step is done, I call DMSwarmMigrate. > > It works in serial, but crashes in parallel with MPI (see attached nohup > file). I have checked some of the DMSwarmMigrate examples, and they look > similar to my implementation. However they do not use periodic bcc. > > Am I missing any step in addition to DMSwarmMigrate? > It does not sound like it. We do have parallel examples of periodic migration, such as Swarm ex9. What happens if you turn off periodicity and just let particles fall out of the box? Does it still crash? 
Thanks, Matt > Best regards > Miguel > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YAPx7Q5VFXWaibGKwqJDGzxvP0snoJmOJZWb4-jtxTRnZNS4wl35_Rp74soLPXHoOsFuG3UoInaqqVUz6Az-$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Sun Apr 21 08:38:26 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Sun, 21 Apr 2024 13:38:26 +0000 Subject: [petsc-users] Periodic boundary conditions using swarm In-Reply-To: References: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es> Message-ID: <7AF4DE7D-D348-41E6-80D0-4AD3E2686717@us.es> Dear Matt, Thank you for your answer. In addition to your suggestion, I fixed a bug in the test (I was not updating the local size integer during the time loop). Indeed, if I turn off periodicity it works. Furthermore, if I use DM_BOUNDARY_TWIST instead, it works too. However, if I turn on DM_BOUNDARY_PERIODIC, I get an error in the search algorithm I implemented for the domain decomposition, inspired by (https://urldefense.us/v3/__https://petsc.org/main/src/dm/tutorials/swarm_ex3.c.html__;!!G_uCfscf7eWS!YAbzai9IC1I94u8JUesDCmTh_xrCY1-UQe7ADOjgXknpsx0yj16i_XDanMUZrR185We2lV8noNubEDSr4sVJ9Q$ ). The algorithm is not capable of finding some of the particles at the initial stage of the simulation (without transport). It looks like the error is on my end; however, it is puzzling why it works for DM_BOUNDARY_TWIST but not for DM_BOUNDARY_PERIODIC. Thanks, Miguel On 21 Apr 2024, at 14:53, Matthew Knepley wrote: On Thu, Apr 18, 2024 at 8:23 AM MIGUEL MOLINOS PEREZ > wrote: Dear all, I am working on the implementation of periodic bcc using a discretisation (PIC-style). I am working with a test case which consists of solving the advection of a set of particles inside of a box (DMDA mesh) with periodic bcc on the x axis. My implementation updates the position of each particle with a velocity field; afterwards I check whether or not the particle is inside the supercell (periodic box). If not, I correct the position using bcc conditions. Once this step is done, I call DMSwarmMigrate. It works in serial, but crashes in parallel with MPI (see attached nohup file). I have checked some of the DMSwarmMigrate examples, and they look similar to my implementation. However they do not use periodic bcc. Am I missing any step in addition to DMSwarmMigrate? It does not sound like it. We do have parallel examples of periodic migration, such as Swarm ex9. What happens if you turn off periodicity and just let particles fall out of the box? Does it still crash? Thanks, Matt Best regards Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YAbzai9IC1I94u8JUesDCmTh_xrCY1-UQe7ADOjgXknpsx0yj16i_XDanMUZrR185We2lV8noNubEDQfcSCPxg$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Sun Apr 21 09:01:49 2024 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 Apr 2024 10:01:49 -0400 Subject: [petsc-users] Periodic boundary conditions using swarm In-Reply-To: <7AF4DE7D-D348-41E6-80D0-4AD3E2686717@us.es> References: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es> <7AF4DE7D-D348-41E6-80D0-4AD3E2686717@us.es> Message-ID: On Sun, Apr 21, 2024 at 9:38?AM MIGUEL MOLINOS PEREZ wrote: > Dear Matt, > > Thank you for your answer. In addition to your suggestion I solved a bug > in the test (I was not updating the local size integer during the time > loop). Indeed if I turn off periodicity it works. Furthermore, if I use > instead DM_BOUNDARY_TWIST instead, it works too. > > However, if I turn on DM_BOUNDARY_PERIODIC, I got an error in the search > algorithm I implemented for the domain decomposition inspired by ( > https://urldefense.us/v3/__https://petsc.org/main/src/dm/tutorials/swarm_ex3.c.html__;!!G_uCfscf7eWS!flWhzJ-Vd_S7zFFxywsQXO7UelXYvJprWVsqngT5k0SlO7wMjGMHQ_D_3lQ-2_FNdN9pw-ztQ4gVIcl8_Lle$ ). The algorithm > is not capable of finding some of the particles at the initial stage of the > simulation (without transport). > > Looks like the error is on my end, however it is puzzling why it works for > DM_BOUNDARY_TWIST but not for DM_BOUNDARY_PERIODIC. > I usually solve these things by making a simple example. We could make another test in Swarm test ex1 that uses periodicity. If you send a draft over that fails, I can help you debug it. It would make a fantastic contribution to PETSc. Thanks, Matt > Thanks, > Miguel > > On 21 Apr 2024, at 14:53, Matthew Knepley wrote: > > On Thu, Apr 18, 2024 at 8:23?AM MIGUEL MOLINOS PEREZ > wrote: > >> ? Dear all,? I am working on the implementation of periodic bcc using a >> discretisation (PIC-style). 
I am working with a test case which consists on >> solving the advection of a set of particles inside of a box (DMDA mesh) >> with periodic bcc on >> ZjQcmQRYFpfptBannerStart >> This Message Is From an External Sender >> This message came from outside your organization. >> >> ZjQcmQRYFpfptBannerEnd >> ? >> Dear all,? >> >> I am working on the implementation of periodic bcc using a discretisation >> (PIC-style). I am working with a test case which consists on solving the >> advection of a set of particles inside of a box (DMDA mesh) with periodic >> bcc on the x axis. >> >> My implementation updates the position of each particle with a velocity >> field, afterwards I check if the particle is inside, or not, the supercell >> (periodic box). If not, I correct the position using bcc conditions. Once >> this step is done, I call Dmswarmmigrate. >> >> It works in serial, but crashes in parallel with MPI (see attached nohup >> file). I have checked some of the Dmswarmmigrate examples, and they >> looks similar to my implementation. However they do not use periodic bcc. >> >> I am missing any step in addition to Dmswarmmigrate? >> > > It does not sound like it. We do have parallel examples of periodic > migration, such as Swarm ex9. > > What happens if you turn off periodicity and just let particles fall out > of the box? Does it still crash? > > Thanks, > > Matt > > >> Best regards >> Miguel >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!flWhzJ-Vd_S7zFFxywsQXO7UelXYvJprWVsqngT5k0SlO7wMjGMHQ_D_3lQ-2_FNdN9pw-ztQ4gVIfmWX52U$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!flWhzJ-Vd_S7zFFxywsQXO7UelXYvJprWVsqngT5k0SlO7wMjGMHQ_D_3lQ-2_FNdN9pw-ztQ4gVIfmWX52U$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmolinos at us.es Sun Apr 21 09:06:26 2024 From: mmolinos at us.es (MIGUEL MOLINOS PEREZ) Date: Sun, 21 Apr 2024 14:06:26 +0000 Subject: [petsc-users] Periodic boundary conditions using swarm In-Reply-To: References: <2E956128-D31A-4D45-86B7-F1DC8F7610E7@us.es> <7AF4DE7D-D348-41E6-80D0-4AD3E2686717@us.es> Message-ID: <090232ED-A18B-427A-9813-B8656EEC41F9@us.es> Thank you Matt, I will definitely do that. Thrilled to add a humble contribution to PETSc :-) Best, Miguel On 21 Apr 2024, at 16:01, Matthew Knepley wrote: On Sun, Apr 21, 2024 at 9:38?AM MIGUEL MOLINOS PEREZ > wrote: Dear Matt, Thank you for your answer. In addition to your suggestion I solved a bug in the test (I was not updating the local size integer during the time loop). Indeed if I turn off periodicity it works. Furthermore, if I use instead DM_BOUNDARY_TWIST instead, it works too. However, if I turn on DM_BOUNDARY_PERIODIC, I got an error in the search algorithm I implemented for the domain decomposition inspired by (https://urldefense.us/v3/__https://petsc.org/main/src/dm/tutorials/swarm_ex3.c.html__;!!G_uCfscf7eWS!djMj0jtH27HHR56cKhjzc0Kd4_HcjA62WBT1w_w1KFs0TfJQOtFQLA2AexxL7g4rS8aelbmYYFAE9n3wQLSIBA$ ). The algorithm is not capable of finding some of the particles at the initial stage of the simulation (without transport). Looks like the error is on my end, however it is puzzling why it works for DM_BOUNDARY_TWIST but not for DM_BOUNDARY_PERIODIC. I usually solve these things by making a simple example. We could make another test in Swarm test ex1 that uses periodicity. If you send a draft over that fails, I can help you debug it. It would make a fantastic contribution to PETSc. 
Thanks, Matt Thanks, Miguel On 21 Apr 2024, at 14:53, Matthew Knepley > wrote: On Thu, Apr 18, 2024 at 8:23?AM MIGUEL MOLINOS PEREZ > wrote: ? Dear all,? I am working on the implementation of periodic bcc using a discretisation (PIC-style). I am working with a test case which consists on solving the advection of a set of particles inside of a box (DMDA mesh) with periodic bcc on ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd ? Dear all,? I am working on the implementation of periodic bcc using a discretisation (PIC-style). I am working with a test case which consists on solving the advection of a set of particles inside of a box (DMDA mesh) with periodic bcc on the x axis. My implementation updates the position of each particle with a velocity field, afterwards I check if the particle is inside, or not, the supercell (periodic box). If not, I correct the position using bcc conditions. Once this step is done, I call Dmswarmmigrate. It works in serial, but crashes in parallel with MPI (see attached nohup file). I have checked some of the Dmswarmmigrate examples, and they looks similar to my implementation. However they do not use periodic bcc. I am missing any step in addition to Dmswarmmigrate? It does not sound like it. We do have parallel examples of periodic migration, such as Swarm ex9. What happens if you turn off periodicity and just let particles fall out of the box? Does it still crash? Thanks, Matt Best regards Miguel -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!djMj0jtH27HHR56cKhjzc0Kd4_HcjA62WBT1w_w1KFs0TfJQOtFQLA2AexxL7g4rS8aelbmYYFAE9n32M1rp2g$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!djMj0jtH27HHR56cKhjzc0Kd4_HcjA62WBT1w_w1KFs0TfJQOtFQLA2AexxL7g4rS8aelbmYYFAE9n32M1rp2g$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzou at anl.gov Sun Apr 21 11:29:10 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 21 Apr 2024 16:29:10 +0000 Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition In-Reply-To: References: Message-ID: Hi Mark, I am working on a project having similar numeric you have, one-dimensional finite volume method with second-order slope limiter TVD, and PETSc finite differencing gives perfect Jacobian even for complex problems. So, I tend to believe that your implementation may have some problem. Some lessons I learned during my code development: * how do you do the coloring when using PETSc finite differencing? An incorrect coloring may give you wrong Jacobian. The simplest way to avoid an incorrect coloring is to assume the matrix is dense (slow but error proofing). * Residual function evaluation not correctly implemented can also lead to incorrect Jacobian. In your case, you may want to take a careful look at the order of execution, when to update your unknown vector, when to perform P1 reconstruction, and when to evaluate the residual. 
-Ling From: petsc-users on behalf of Mark Lohry Date: Saturday, April 20, 2024 at 1:35 PM To: PETSc Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition I have a 1-dimensional P1 discontinuous Galerkin discretization of the linear advection equation with 4 cells and periodic boundaries on [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a hand-written Jacobian. Being linear, ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd I have a 1-dimensional P1 discontinuous Galerkin discretization of the linear advection equation with 4 cells and periodic boundaries on [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a hand-written Jacobian. Being linear, the Jacobian should be constant/independent of the solution. When I set the initial condition passed to SNESComputeJacobian as some constant, say f(x)=1 or 0, the petsc finite difference jacobian agrees with my hand coded-version. But when I pass it some non-constant value, e.g. f(x)=sin(x), something goes horribly wrong in the petsc jacobian. Implementing my own rudimentary finite difference approximation (similar to how I thought petsc computes it) it returns the correct jacobian to expected error. Any idea what could be going on? 
Analytically computed Jacobian:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x)=1:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x) = sin(x):

-1.65547e+08 -3.31856e+08 -1.25427e+09 4.4844e+08 0 0 1.03206e+08 7.86375e+07
9.13788e+07 1.83178e+08 6.92336e+08 -2.4753e+08 0 0 -5.69678e+07 -4.34064e+07
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
9.13788e+07 1.83178e+08 0 0 -1.24151e+08 -7.38608e+07 -5.69678e+07 -4.34064e+07
-1.65547e+08 -3.31856e+08 0 0 2.24919e+08 1.3381e+08 1.03206e+08 7.86375e+07

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From lzou at anl.gov Sun Apr 21 11:36:45 2024 From: lzou at anl.gov (Zou, Ling) Date: Sun, 21 Apr 2024 16:36:45 +0000 Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition In-Reply-To: References: Message-ID: Edit: * how do you do the coloring when using PETSc finite differencing? An incorrect coloring may give you wrong Jacobian. For debugging purpose, the simplest way to avoid an incorrect coloring is to assume the matrix is dense (slow but error proofing). If the numeric converges as expected, then fine tune your coloring to make it right and fast. From: petsc-users on behalf of Zou, Ling via petsc-users Date: Sunday, April 21, 2024 at 11:29 AM To: Mark Lohry , PETSc Subject: Re: [petsc-users] finite difference jacobian errors when given non-constant initial condition Hi Mark, I am working on a project having similar numeric you have, one-dimensional finite volume method with second-order slope limiter TVD, and PETSc finite differencing gives perfect Jacobian even for complex problems. So, I tend to believe that your implementation may have some problem. Some lessons I learned during my code development: * how do you do the coloring when using PETSc finite differencing? An incorrect coloring may give you wrong Jacobian. The simplest way to avoid an incorrect coloring is to assume the matrix is dense (slow but error proofing). * Residual function evaluation not correctly implemented can also lead to incorrect Jacobian. In your case, you may want to take a careful look at the order of execution, when to update your unknown vector, when to perform P1 reconstruction, and when to evaluate the residual. 
-Ling From: petsc-users on behalf of Mark Lohry Date: Saturday, April 20, 2024 at 1:35 PM To: PETSc Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition I have a 1-dimensional P1 discontinuous Galerkin discretization of the linear advection equation with 4 cells and periodic boundaries on [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a hand-written Jacobian. Being linear, ZjQcmQRYFpfptBannerStart This Message Is From an External Sender This message came from outside your organization. ZjQcmQRYFpfptBannerEnd I have a 1-dimensional P1 discontinuous Galerkin discretization of the linear advection equation with 4 cells and periodic boundaries on [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a hand-written Jacobian. Being linear, the Jacobian should be constant/independent of the solution. When I set the initial condition passed to SNESComputeJacobian as some constant, say f(x)=1 or 0, the petsc finite difference jacobian agrees with my hand coded-version. But when I pass it some non-constant value, e.g. f(x)=sin(x), something goes horribly wrong in the petsc jacobian. Implementing my own rudimentary finite difference approximation (similar to how I thought petsc computes it) it returns the correct jacobian to expected error. Any idea what could be going on? 
Analytically computed Jacobian:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x)=1:

4.44089e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.44089e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 4.44089e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.44089e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.44089e-16 -1.10266
-1.18795 0.31831 0 0 0.0852909 -0.31831 1.10266 -4.44089e-16

petsc finite difference jacobian when given f(x) = sin(x):

-1.65547e+08 -3.31856e+08 -1.25427e+09 4.4844e+08 0 0 1.03206e+08 7.86375e+07
9.13788e+07 1.83178e+08 6.92336e+08 -2.4753e+08 0 0 -5.69678e+07 -4.34064e+07
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
3.7084e+07 7.43387e+07 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 0 0
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
0 0 2.80969e+08 -1.00455e+08 -5.0384e+07 -2.99747e+07 -2.31191e+07 -1.76155e+07
9.13788e+07 1.83178e+08 0 0 -1.24151e+08 -7.38608e+07 -5.69678e+07 -4.34064e+07
-1.65547e+08 -3.31856e+08 0 0 2.24919e+08 1.3381e+08 1.03206e+08 7.86375e+07

-------------- next part --------------
An HTML attachment was scrubbed...
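Ling's coloring advice can be illustrated outside PETSc with a small self-contained sketch (plain NumPy, not PETSc's MatFDColoring; the 4x4 tridiagonal operator, the colorings, and the helper names are invented for illustration). Columns sharing a color are perturbed together, so a coloring is valid only when same-colored columns touch disjoint rows; giving every column its own color is the "dense" fallback: always safe, just slow.

```python
import numpy as np

# Toy linear residual R(u) = A u with a tridiagonal (stencil-like) operator.
A = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])

def residual(u):
    return A @ u

def fd_jacobian(u, colors, sparsity, h=1e-7):
    """Colored forward differencing: perturb all columns of one color at once,
    then scatter the differenced residual into J using the sparsity pattern."""
    n = len(u)
    J = np.zeros((n, n))
    r0 = residual(u)
    for c in sorted(set(colors)):
        du = np.array([h if colors[j] == c else 0.0 for j in range(n)])
        diff = (residual(u + du) - r0) / h
        for j in range(n):
            if colors[j] != c:
                continue
            for i in range(n):
                if sparsity[i, j]:
                    J[i, j] = diff[i]
    return J

sparsity = A != 0
u = np.sin(np.linspace(0.0, 1.0, 4))   # non-constant base point, as in the thread

# Valid coloring: columns 0 and 3 share a color but touch disjoint rows.
J_good = fd_jacobian(u, colors=[0, 1, 2, 0], sparsity=sparsity)

# "Dense" fallback: every column its own color, full sparsity assumed.
J_dense = fd_jacobian(u, colors=[0, 1, 2, 3], sparsity=np.ones((4, 4), dtype=bool))

# Invalid coloring: columns 0&2 (and 1&3) share a color yet conflict in rows 1 and 2.
J_bad = fd_jacobian(u, colors=[0, 1, 0, 1], sparsity=sparsity)

print(np.allclose(J_good, A, atol=1e-5))   # True
print(np.allclose(J_dense, A, atol=1e-5))  # True
print(np.allclose(J_bad, A, atol=1e-5))    # False
```

The bad coloring silently sums contributions from conflicting columns into the wrong entries, which is why the dense assumption is a useful first debugging step before tuning the coloring for speed.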
URL: From mlohry at gmail.com Sun Apr 21 12:34:54 2024 From: mlohry at gmail.com (Mark Lohry) Date: Sun, 21 Apr 2024 13:34:54 -0400 Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition In-Reply-To: References: Message-ID: The coloring I'm fairly confident is correct -- I use the same process for 3D unstructured grids and everything seems to work. The residual function is also validated. As a test I did as you suggested -- assume the matrix is dense -- and I get the same bad results, just now the zero blocks are filled.

Assuming dense, giving it a constant vector, all is good:

4.23516e-16 -1.10266 0.31831 -0.0852909 0 0 -0.31831 1.18795
1.10266 -4.23516e-16 -1.18795 0.31831 0 0 0.0852909 -0.31831
-0.31831 1.18795 2.11758e-16 -1.10266 0.31831 -0.0852909 0 0
0.0852909 -0.31831 1.10266 -4.23516e-16 -1.18795 0.31831 0 0
0 0 -0.31831 1.18795 2.11758e-16 -1.10266 0.31831 -0.0852909
0 0 0.0852909 -0.31831 1.10266 -4.23516e-16 -1.18795 0.31831
0.31831 -0.0852909 0 0 -0.31831 1.18795 4.23516e-16 -1.10266

Assuming dense, giving it sin(x), all is bad:

-1.76177e+08 -6.07287e+07 -6.07287e+07 -1.76177e+08 1.76177e+08 6.07287e+07 6.07287e+07 1.76177e+08
-1.31161e+08 -4.52116e+07 -4.52116e+07 -1.31161e+08 1.31161e+08 4.52116e+07 4.52116e+07 1.31161e+08
1.31161e+08 4.52116e+07 4.52116e+07 1.31161e+08 -1.31161e+08 -4.52116e+07 -4.52116e+07 -1.31161e+08
1.76177e+08 6.07287e+07 6.07287e+07 1.76177e+08 -1.76177e+08 -6.07287e+07 -6.07287e+07 -1.76177e+08
1.76177e+08 6.07287e+07 6.07287e+07 1.76177e+08 -1.76177e+08 -6.07287e+07 -6.07287e+07 -1.76177e+08
1.31161e+08 4.52116e+07 4.52116e+07 1.31161e+08 -1.31161e+08 -4.52116e+07 -4.52116e+07 -1.31161e+08
-1.31161e+08 -4.52116e+07 -4.52116e+07 -1.31161e+08 1.31161e+08 4.52116e+07 4.52116e+07 1.31161e+08
-1.76177e+08 -6.07287e+07 -6.07287e+07 -1.76177e+08 1.76177e+08 6.07287e+07 6.07287e+07 1.76177e+08

Scratching my head over here... 
I've been using these routines successfully for years in much more complex code. On Sun, Apr 21, 2024 at 12:36?PM Zou, Ling wrote: > Edit: > > - how do you do the coloring when using PETSc finite differencing? An > incorrect coloring may give you wrong Jacobian. For debugging purpose, > the simplest way to avoid an incorrect coloring is to assume the matrix is > dense (slow but error proofing). If the numeric converges as expected, > then fine tune your coloring to make it right and fast. > > > > > > *From: *petsc-users on behalf of Zou, > Ling via petsc-users > *Date: *Sunday, April 21, 2024 at 11:29 AM > *To: *Mark Lohry , PETSc > *Subject: *Re: [petsc-users] finite difference jacobian errors when given > non-constant initial condition > > Hi Mark, I am working on a project having similar numeric you have, > one-dimensional finite volume method with second-order slope limiter TVD, > and PETSc finite differencing gives perfect Jacobian even for complex > problems. > > So, I tend to believe that your implementation may have some problem. Some > lessons I learned during my code development: > > > > - how do you do the coloring when using PETSc finite differencing? An > incorrect coloring may give you wrong Jacobian. The simplest way to avoid > an incorrect coloring is to assume the matrix is dense (slow but error > proofing). > - Residual function evaluation not correctly implemented can also lead > to incorrect Jacobian. In your case, you may want to take a careful look at > the order of execution, when to update your unknown vector, when to perform > P1 reconstruction, and when to evaluate the residual. 
>
> -Ling
>
> *From:* petsc-users on behalf of Mark Lohry
> *Date:* Saturday, April 20, 2024 at 1:35 PM
> *To:* PETSc
> *Subject:* [petsc-users] finite difference jacobian errors when given non-constant initial condition
>
> I have a 1-dimensional P1 discontinuous Galerkin discretization of the
> linear advection equation with 4 cells and periodic boundaries on
> [-pi,+pi]. I'm comparing the results from SNESComputeJacobian with a
> hand-written Jacobian. Being linear, the Jacobian should be
> constant/independent of the solution.
>
> When I set the initial condition passed to SNESComputeJacobian to some
> constant, say f(x)=1 or 0, the PETSc finite difference Jacobian agrees with
> my hand-coded version. But when I pass it some non-constant value, e.g.
> f(x)=sin(x), something goes horribly wrong in the PETSc Jacobian.
> Implementing my own rudimentary finite difference approximation (similar to
> how I thought PETSc computes it), it returns the correct Jacobian to the
> expected error. Any idea what could be going on?
>
> Analytically computed Jacobian:
>
>  4.44089e-16 -1.10266      0.31831     -0.0852909    0            0           -0.31831      1.18795
>  1.10266     -4.44089e-16  -1.18795     0.31831      0            0            0.0852909   -0.31831
> -0.31831      1.18795       4.44089e-16 -1.10266     0.31831     -0.0852909    0            0
>  0.0852909   -0.31831       1.10266     -4.44089e-16 -1.18795     0.31831      0            0
>  0            0            -0.31831      1.18795     4.44089e-16  -1.10266     0.31831     -0.0852909
>  0            0             0.0852909   -0.31831     1.10266      -4.44089e-16 -1.18795     0.31831
>  0.31831     -0.0852909     0            0          -0.31831       1.18795     4.44089e-16 -1.10266
> -1.18795      0.31831       0            0           0.0852909    -0.31831     1.10266     -4.44089e-16
>
> petsc finite difference jacobian when given f(x)=1:
>
>  4.44089e-16 -1.10266      0.31831     -0.0852909    0            0           -0.31831      1.18795
>  1.10266     -4.44089e-16  -1.18795     0.31831      0            0            0.0852909   -0.31831
> -0.31831      1.18795       4.44089e-16 -1.10266     0.31831     -0.0852909    0            0
>  0.0852909   -0.31831       1.10266     -4.44089e-16 -1.18795     0.31831      0            0
>  0            0            -0.31831      1.18795     4.44089e-16  -1.10266     0.31831     -0.0852909
>  0            0             0.0852909   -0.31831     1.10266      -4.44089e-16 -1.18795     0.31831
>  0.31831     -0.0852909     0            0          -0.31831       1.18795     4.44089e-16 -1.10266
> -1.18795      0.31831       0            0           0.0852909    -0.31831     1.10266     -4.44089e-16
>
> petsc finite difference jacobian when given f(x) = sin(x):
>
> -1.65547e+08 -3.31856e+08 -1.25427e+09  4.4844e+08    0            0            1.03206e+08  7.86375e+07
>  9.13788e+07  1.83178e+08  6.92336e+08 -2.4753e+08    0            0           -5.69678e+07 -4.34064e+07
>  3.7084e+07   7.43387e+07  2.80969e+08 -1.00455e+08  -5.0384e+07  -2.99747e+07  0            0
>  3.7084e+07   7.43387e+07  2.80969e+08 -1.00455e+08  -5.0384e+07  -2.99747e+07  0            0
>  0            0            2.80969e+08 -1.00455e+08  -5.0384e+07  -2.99747e+07 -2.31191e+07 -1.76155e+07
>  0            0            2.80969e+08 -1.00455e+08  -5.0384e+07  -2.99747e+07 -2.31191e+07 -1.76155e+07
>  9.13788e+07  1.83178e+08  0            0            -1.24151e+08 -7.38608e+07 -5.69678e+07 -4.34064e+07
> -1.65547e+08 -3.31856e+08  0            0             2.24919e+08  1.3381e+08   1.03206e+08  7.86375e+07
URL:

From lzou at anl.gov Sun Apr 21 14:28:25 2024
From: lzou at anl.gov (Zou, Ling)
Date: Sun, 21 Apr 2024 19:28:25 +0000
Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition
In-Reply-To: References: Message-ID:

Very interesting. I happened to encounter something very similar a couple of days ago, which, of course, was due to a code bug I introduced. The bug was in the residual function: I used a local vector to track "heat flux", which should be zeroed out at the beginning of each residual function evaluation. I did not zero it, and I observed very similar results -- the Jacobian is completely wrong, with large values (J_ij keeps increasing after each iteration), and non-zero values appear in locations that should be exactly zero. The symptom is very much like what you are seeing here, so I suspect a similar bug. (Maybe you forgot to zero the coefficients of the P1 reconstruction? With the constant value 1, the reconstructed dphi/dx = 0, so no matter how many iterations you run, it stays zero.)

-Ling

From lzou at anl.gov Sun Apr 21 14:36:55 2024
From: lzou at anl.gov (Zou, Ling)
Date: Sun, 21 Apr 2024 19:36:55 +0000
Subject: [petsc-users] finite difference jacobian errors when given non-constant initial condition
In-Reply-To: References: Message-ID:

The other symptom points the same way:

* Using coloring, finite differencing respects the specified non-zero pattern but gives wrong (very large) Jacobian entries J_ij.
* Using the dense-matrix assumption, finite differencing does not respect the non-zero pattern determined by your numerics, which is a clear sign of a bug in the residual function (your residual function does not respect your numerics).

-Ling
URL:

From bsmith at petsc.dev Mon Apr 22 15:19:54 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 22 Apr 2024 16:19:54 -0400
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: References: Message-ID: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev>

PETSc-provided solvers do not directly use threads. The BLAS used by LAPACK and PETSc may use threads, depending on which BLAS is being used and how it was configured. Some of the vector operations in GMRES in PETSc use BLAS routines that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of a threaded BLAS can help with these routines, but not significantly for the solver as a whole.

Dense matrix-vector products (MatMult()) and dense matrix direct solvers (PCLU) use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU.

If you run with -blas_view, PETSc tries to report information about the threading of the BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environment variable. For the dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect.

From yongzhong.li at mail.utoronto.ca Mon Apr 22 15:06:28 2024
From: yongzhong.li at mail.utoronto.ca (Yongzhong Li)
Date: Mon, 22 Apr 2024 20:06:28 +0000
Subject: [petsc-users] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
Message-ID:

Hello all,

I am writing to ask whether PETSc's KSPSolve makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.

The question came up when I was running a large numerical program based on the boundary element method. I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread.
Could you please confirm whether GMRES in KSPSolve leverages multithreading, and whether it is influenced by the multithreading of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions?

For reference, I am using PETSc version 3.16.0, configured as follows:

./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" FOPTFLAGS="-O3 -march=native" --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp

To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem with the finite difference method. I see similar behavior with this piece of code.
The code is as follows:

#include <petscksp.h>

/* Monitor function to print iteration number and residual norm */
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx)
{
  PetscErrorCode ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);
  CHKERRQ(ierr);
  return 0;
}

int main(int argc, char **args)
{
  Vec            x, b, x_true, e;
  Mat            A;
  KSP            ksp;
  PetscErrorCode ierr;
  PetscInt       i, j, Ii, J, n = 500; // Size of the grid n x n
  PetscInt       Istart, Iend, ncols;
  PetscScalar    v;
  PetscMPIInt    rank;

  PetscInitialize(&argc, &args, NULL, NULL);
  PetscLogDouble t1, t2; // Variables for timing
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  // Create vectors and matrix
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &b); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);

  // Set true solution as all ones
  ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);

  // Create and assemble matrix A for the 2D Laplacian using the 5-point stencil
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (Ii = Istart; Ii < Iend; Ii++) {
    i = Ii / n; // Row index
    j = Ii % n; // Column index
    v = -4.0;
    ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);
    if (i > 0) { // South
      J = Ii - n;
      v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (i < n - 1) { // North
      J = Ii + n;
      v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j > 0) { // West
      J = Ii - 1;
      v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j < n - 1) { // East
      J = Ii + 1;
      v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  // Compute the RHS corresponding to the true solution
  ierr = MatMult(A, x_true, b); CHKERRQ(ierr);

  // Set up and solve the linear system
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);

  /* Set up the monitor */
  ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);

  // Start timing
  PetscTime(&t1);

  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  // Stop timing
  PetscTime(&t2);

  // Compute error
  ierr = VecDuplicate(x, &e); CHKERRQ(ierr);
  ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);
  PetscReal norm_error, norm_true;
  ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);
  ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);
  PetscReal relative_error = norm_error / norm_true;
  if (rank == 0) { // Print only from the first MPI process
    PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);
  }

  // Output the wall time taken for KSPSolve
  PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);

  // Cleanup
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = VecDestroy(&x_true); CHKERRQ(ierr);
  ierr = VecDestroy(&e); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  PetscFinalize();
  return 0;
}

Here are some profiling results for the GMRES solution:

OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1
OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3
OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7
OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8

I am using one workstation with an Intel® Core™
i9-11900K processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes (mpirun/mpiexec), so the number of MPI processes should default to 1; correct me if I am wrong.

Thank you in advance!

Sincerely,
Yongzhong

-----------------------------------------------------------
Yongzhong Li
PhD student | Electromagnetics Group
Department of Electrical & Computer Engineering
University of Toronto
https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!dke3wyns7hkBOVhDpSLeiXntYbY-XnVkGEyNyesUI7XkVcBQe_oSaykAmlN6EZ_B9P4mbm-aP1RKw2FwERcONEC88CYksHvjUZ0$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From miguel.salazar at corintis.com  Tue Apr 23 11:13:08 2024
From: miguel.salazar at corintis.com (Miguel Angel Salazar de Troya)
Date: Tue, 23 Apr 2024 18:13:08 +0200
Subject: [petsc-users] Parallelism of the Mat.convert() function
Message-ID: 

Hello,

The following code returns a different answer depending on how many processors I use. With one processor, the last MPIAIJ matrix is correctly formed:

row 0: (0, 1.)  (1, 2.)  (2, 3.)  (3, 4.)  (4, -1.)
row 1: (0, 5.)  (1, 6.)  (2, 7.)  (3, 8.)  (4, -2.)
row 2: (0, 9.)  (1, 10.)  (2, 11.)  (3, 12.)  (4, -3.)
row 3: (0, 13.)  (1, 14.)  (2, 15.)  (3, 16.)  (4, -4.)
row 4: (0, 17.)  (1, 18.)  (2, 19.)  (3, 20.)  (4, -5.)
row 5: (0, 21.)  (1, 22.)  (2, 23.)  (3, 24.)  (4, -6.)
row 6: (0, 25.)  (1, 26.)  (2, 27.)  (3, 28.)  (4, -7.)

With two processors though, the column matrix is placed in between:

row 0: (0, 1.)  (1, 2.)  (2, -1.)  (3, 3.)  (4, 4.)
row 1: (0, 5.)  (1, 6.)  (2, -2.)  (3, 7.)  (4, 8.)
row 2: (0, 9.)  (1, 10.)  (2, -3.)  (3, 11.)  (4, 12.)
row 3: (0, 13.)  (1, 14.)  (2, -4.)  (3, 15.)  (4, 16.)
row 4: (0, 17.)  (1, 18.)  (2, -5.)  (3, 19.)  (4, 20.)
row 5: (0, 21.)  (1, 22.)  (2, -6.)  (3, 23.)  (4, 24.)
row 6: (0, 25.)  (1, 26.)  (2, -7.)  (3, 27.)  (4, 28.)

Am I not building the nested matrix correctly, perhaps? I am using the Firedrake PETSc fork. Can you reproduce it?
Thanks,
Miguel

```python
import numpy as np
from petsc4py import PETSc
from petsc4py.PETSc import COMM_WORLD


input_array = np.array(
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24],
        [25, 26, 27, 28],
    ],
    dtype=np.float64,
)


n_11_global_rows, n_11_global_cols = input_array.shape
size = ((None, n_11_global_rows), (None, n_11_global_cols))
mat = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
mat.setUp()
mat.setValues(range(n_11_global_rows), range(n_11_global_cols), input_array)
mat.assemblyBegin()
mat.assemblyEnd()
mat.view()

input_array = np.array([[-1, -2, -3, -4, -5, -6, -7]], dtype=np.float64)
global_rows, global_cols = input_array.T.shape
size = ((None, global_rows), (None, global_cols))
mat_2 = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
mat_2.setUp()
mat_2.setValues(range(global_rows), range(global_cols), input_array)
mat_2.assemblyBegin()
mat_2.assemblyEnd()

N = PETSc.Mat().createNest([[mat, mat_2]], comm=COMM_WORLD)
N.assemblyBegin()
N.assemblyEnd()

PETSc.Sys.Print(f"N sizes: {N.getSize()}")

N.convert("mpiaij").view()
```

-- 
Miguel Angel Salazar de Troya
Head of Software Engineering
EPFL Innovation Park Building C
1015 Lausanne
Email: miguel.salazar at corintis.com
Website: https://urldefense.us/v3/__http://www.corintis.com__;!!G_uCfscf7eWS!bC7zvBxQx0RuDXxzlOgxr_PdSp5N9ZdzjgTPmjG_ZU5WbNvHboZHFBhZksYgyDF2nO1IRXABTx5zmJLaL2NK_EYg2deym84H$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pierre at joliv.et  Tue Apr 23 11:44:16 2024
From: pierre at joliv.et (Pierre Jolivet)
Date: Tue, 23 Apr 2024 18:44:16 +0200
Subject: [petsc-users] Parallelism of the Mat.convert() function
In-Reply-To: 
References: 
Message-ID: <29C0CE2A-3E00-4D25-A299-7E17478E6FBA@joliv.et>

The code is behaving as it should, IMHO.
Here is a way to have the Mat stored the same independently of the number of processes.
[…]
global_rows, global_cols = input_array.T.shape
size = ((None, global_rows), (0 if COMM_WORLD.Get_rank() < COMM_WORLD.Get_size() - 1 else global_cols, global_cols))
[…]
I.e., you want to enforce that the lone column is stored by the last process; otherwise, it will be stored by the first one and interleaved with the rest of the (0,0) block.

Thanks,
Pierre

> On 23 Apr 2024, at 6:13 PM, Miguel Angel Salazar de Troya wrote:
>
> […]
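Pierre's point about ownership can be checked without running MPI at all. The sketch below is a rough model, not PETSc's actual conversion code: after a 1 x k MatNest is converted to MPIAIJ, each rank contributes its locally owned columns of every block, in block order, so the lone column of the second block lands wherever its owning rank sits in that concatenation. `split_ownership` mimics PETSc's default PETSC_DECIDE split; both function names are invented for illustration.

```python
def split_ownership(n, size):
    # Mimics PETSc's default PETSC_DECIDE split: n // size per rank,
    # plus one extra for each of the first n % size ranks.
    return [n // size + (1 if r < n % size else 0) for r in range(size)]

def nest_column_order(block_cols, size):
    # Rough model of the global column order of a 1 x k MatNest after
    # conversion to MPIAIJ: each rank contributes its locally owned
    # columns of every block, in block order.
    splits = [split_ownership(n, size) for n in block_cols]
    starts = [[sum(s[:r]) for r in range(size)] for s in splits]
    order = []
    for rank in range(size):
        for b in range(len(block_cols)):
            lo = starts[b][rank]
            order.extend((b, lo + j) for j in range(splits[b][rank]))
    return order  # list of (block index, column index within that block)

# The 7x4 block plus the 7x1 column from the thread:
print(nest_column_order([4, 1], 1))  # lone column (1, 0) comes last
print(nest_column_order([4, 1], 2))  # (1, 0) lands at position 2, as observed
```

With one process the model gives `[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]`; with two it gives `[(0, 0), (0, 1), (1, 0), (0, 2), (0, 3)]`, matching the `-1` column appearing at index 2 in the two-process output. Pierre's fix moves the column's owner to the last rank, so it again comes last in the concatenation.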
> […]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yongzhong.li at mail.utoronto.ca  Tue Apr 23 14:23:59 2024
From: yongzhong.li at mail.utoronto.ca (Yongzhong Li)
Date: Tue, 23 Apr 2024 19:23:59 +0000
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev>
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev>
Message-ID: 

Hi Barry,

Thank you for the information provided!

Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc?

I am now using OpenBLAS but didn't see much improvement when multithreading is enabled; do you think other implementations such as Netlib or Intel MKL will help?

Best,
Yongzhong

From: Barry Smith
Date: Monday, April 22, 2024 at 4:20 PM
To: Yongzhong Li
Cc: petsc-users at mcs.anl.gov, petsc-maint at mcs.anl.gov, Piero Triverio
Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver

   PETSc provided solvers do not directly use threads.

   The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured.

   Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver.

   Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU.

   If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environment variable.
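A back-of-the-envelope roofline estimate shows why threaded BLAS-1 kernels such as axpy plateau: they perform very few flops per byte of memory traffic, so they hit the memory-bandwidth ceiling long before running out of cores. The hardware numbers below are illustrative assumptions, not measurements of the machine in this thread.

```python
# Double-precision AXPY (y = a*x + y): 2 flops per element,
# 3 * 8 bytes of memory traffic (read x, read y, write back y).
flops_per_elem = 2
bytes_per_elem = 3 * 8
intensity = flops_per_elem / bytes_per_elem  # 1/12 flop per byte

# Illustrative hardware numbers (assumptions, not measurements):
mem_bw = 50e9           # bytes/s of sustained main-memory bandwidth
peak_per_core = 50e9    # flops/s peak of a single core

# The AXPY cannot exceed this rate no matter how many threads run it:
ceiling = intensity * mem_bw
print(f"bandwidth ceiling: {ceiling / 1e9:.2f} Gflop/s")
print(f"fraction of one core needed to reach it: {ceiling / peak_per_core:.3f}")
```

At roughly 1/12 flop per byte, a small fraction of one core already saturates memory bandwidth, which is consistent with the flat timings reported for 1 to 16 OpenBLAS threads.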
   For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect.

On Apr 22, 2024, at 4:06 PM, Yongzhong Li wrote:

Hello all,

I am writing to ask if PETSc's KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.

The question appeared when I was running a large numerical program based on the boundary element method. I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread.

Could you please confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreading of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions?

For reference, I am using PETSc version 3.16.0, configured in CMakeLists as follows:

./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp

To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem with the finite difference method. I found similar issues in this piece of code. The code is as follows:

[…]

Here are some profiling results for the GMRES solution:

OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1
OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3
OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7
OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8

I am using one workstation with an Intel® Core™
i9-11900K processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes (mpirun/mpiexec), so the number of MPI processes should default to 1; correct me if I am wrong.

Thank you in advance!

Sincerely,
Yongzhong

-----------------------------------------------------------
Yongzhong Li
PhD student | Electromagnetics Group
Department of Electrical & Computer Engineering
University of Toronto
https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!aNwwhaNwa_gbiduyqOK-fh3XqaflSgzf_Epel-lCvCrQrOx_5whj-_fjlBOTZsR-8DKl0ZHsC7L78nIw1YvDUq9dQO1DoUi7Awk$
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Tue Apr 23 14:34:50 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 23 Apr 2024 15:34:50 -0400
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: 
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev>
Message-ID: <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev>

   Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations.

> On Apr 23, 2024, at 3:23 PM, Yongzhong Li wrote:
>
> Hi Barry,
>
> Thank you for the information provided!
>
> Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc?
>
> I am now using OpenBLAS but didn't see much improvement when multithreading is enabled; do you think other implementations such as Netlib or Intel MKL will help?
>
> Best,
> Yongzhong
>
> […]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From yongzhong.li at mail.utoronto.ca Tue Apr 23 14:59:54 2024 From: yongzhong.li at mail.utoronto.ca (Yongzhong Li) Date: Tue, 23 Apr 2024 19:59:54 +0000 Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver In-Reply-To: <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev> References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev> Message-ID: Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don?t utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct? Best, Yongzhong From: Barry Smith Date: Tuesday, April 23, 2024 at 3:35?PM To: Yongzhong Li Cc: petsc-users at mcs.anl.gov , petsc-maint at mcs.anl.gov , Piero Triverio Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver ????????? bsmith at petsc.dev ????????????????? Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations. On Apr 23, 2024, at 3:23?PM, Yongzhong Li wrote: Hi Barry, Thank you for the information provided! Do you think different BLAS implementation will affect the multithreading performance of some vectors operations in GMERS in PETSc? I am now using OpenBLAS but didn?t see much improvement when theb multithreading are enabled, do you think other implementation such as netlib and intel-mkl will help? Best, Yongzhong From: Barry Smith > Date: Monday, April 22, 2024 at 4:20?PM To: Yongzhong Li > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver ????????? bsmith at petsc.dev ????????????????? 
PETSc provided solvers do not directly use threads. The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver. Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU. If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environmental variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not effect. On Apr 22, 2024, at 4:06?PM, Yongzhong Li > wrote: This Message Is From an External Sender This message came from outside your organization. Hello all, I am writing to ask if PETSc?s KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm. The questions appeared when I was running a large numerical program based on boundary element method. I used the PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread. Could you please confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreadings of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? 
If not, why do I observe that threads are being used during the GMRES solutions?

For reference, I am using PETSc version 3.16.0, configured in CMakeLists as follows:

./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp

To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem using the finite difference method. I found similar issues with this piece of code. The code is as follows:

#include <petscksp.h>

/* Monitor function to print iteration number and residual norm */
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) {
    PetscErrorCode ierr;
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);
    CHKERRQ(ierr);
    return 0;
}

int main(int argc, char **args) {
    Vec x, b, x_true, e;
    Mat A;
    KSP ksp;
    PetscErrorCode ierr;
    PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n
    PetscInt Istart, Iend, ncols;
    PetscScalar v;
    PetscMPIInt rank;
    PetscInitialize(&argc, &args, NULL, NULL);
    PetscLogDouble t1, t2; // Variables for timing
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    // Create vectors and matrix
    ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);
    ierr = VecDuplicate(x, &b); CHKERRQ(ierr);
    ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);

    // Set true solution as all ones
    ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);

    // Create and assemble matrix A for the 2D Laplacian using 5-point stencil
    ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);
    ierr = MatSetFromOptions(A); CHKERRQ(ierr);
    ierr = MatSetUp(A); CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
    for (Ii = Istart; Ii < Iend; Ii++) {
        i = Ii / n; // Row index
        j = Ii % n; // Column index
        v = -4.0;
        ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);
        if (i > 0) { // South
            J = Ii - n;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (i < n - 1) { // North
            J = Ii + n;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (j > 0) { // West
            J = Ii - 1;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
        if (j < n - 1) { // East
            J = Ii + 1;
            v = 1.0;
            ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
        }
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

    // Compute the RHS corresponding to the true solution
    ierr = MatMult(A, x_true, b); CHKERRQ(ierr);

    // Set up and solve the linear system
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
    ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);

    /* Set up the monitor */
    ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);

    // Start timing
    PetscTime(&t1);

    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

    // Stop timing
    PetscTime(&t2);

    // Compute error
    ierr = VecDuplicate(x, &e); CHKERRQ(ierr);
    ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);
    PetscReal norm_error, norm_true;
    ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);
    ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);
    PetscReal relative_error = norm_error / norm_true;
    if (rank == 0) { // Print only from the first MPI process
        PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);
    }

    // Output the wall time taken for KSPSolve
    PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);

    // Cleanup
    ierr = VecDestroy(&x); CHKERRQ(ierr);
    ierr = VecDestroy(&b); CHKERRQ(ierr);
    ierr = VecDestroy(&x_true); CHKERRQ(ierr);
    ierr = VecDestroy(&e); CHKERRQ(ierr);
    ierr = MatDestroy(&A); CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
    PetscFinalize();
    return 0;
}

Here are some profiling results for the GMRES solution.

OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1
OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3
OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7
OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8

I am using one workstation with an Intel Core i9-11900K processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes (such as mpirun/mpiexec), so the default number of MPI processes should be 1; correct me if I am wrong.

Thank you in advance!

Sincerely,
Yongzhong

-----------------------------------------------------------
Yongzhong Li
PhD student | Electromagnetics Group
Department of Electrical & Computer Engineering
University of Toronto
https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!Z0pxvyXKLQlC3howyi3mIQNq0FUydwnaLxNwQMyue0BB8sPuYLFqrSbUZ6qgaSY_uT13q_q86c4AlhXG1YnYBngzS5fKEIqfiO4$
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From junchao.zhang at gmail.com Tue Apr 23 16:13:04 2024
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Tue, 23 Apr 2024 16:13:04 -0500
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: 
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev>
Message-ID: 

No, I think sparse matrix-vector products (MatMult in petsc) can be accelerated with multithreading.
petsc does not do that, but one can use other libraries, such as MKL, for that.

--Junchao Zhang

On Tue, Apr 23, 2024 at 3:00 PM Yongzhong Li wrote:

> Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct?
>
> Best,
> Yongzhong
>
> *From: *Barry Smith <bsmith at petsc.dev>
> *Date: *Tuesday, April 23, 2024 at 3:35 PM
> *To: *Yongzhong Li
> *Cc: *petsc-users at mcs.anl.gov, petsc-maint at mcs.anl.gov, Piero Triverio <piero.triverio at utoronto.ca>
> *Subject: *Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
>
>    Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations.
>
> On Apr 23, 2024, at 3:23 PM, Yongzhong Li wrote:
>
> Hi Barry,
>
> Thank you for the information provided!
>
> Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc?
>
> I am now using OpenBLAS but didn't see much improvement when multithreading is enabled; do you think other implementations such as netlib and intel-mkl will help?
>
> Best,
> Yongzhong
>
> [remainder of quoted thread trimmed]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at petsc.dev Tue Apr 23 16:15:42 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 23 Apr 2024 17:15:42 -0400
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: 
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev>
Message-ID: 

   Yes, only the routines that can explicitly use BLAS have multi-threading.

   PETSc does support using any MPI linear solvers from a sequential (or OpenMP) main program using the https://urldefense.us/v3/__https://petsc.org/release/manualpages/PC/PCMPI/*pcmpi__;Iw!!G_uCfscf7eWS!axQjxZeWC27cy3WnpqerXeWbd74F9I1B5K9M5m_81RGHmibyn9It_T8Ru5XaCj_2X3FG7XpHUh65OKUae7RSBr0$ construct.
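A rough sketch of the invocation this enables (the option names here are assumptions recalled from memory; verify them against the PCMPI manual page linked above before relying on them):

```shell
# Launch an otherwise-sequential PETSc application under MPI; rank 0
# runs main() while the remaining ranks act as a parallel linear
# solver server for its KSPSolve() calls.
# NOTE: option names are assumptions -- check the PCMPI manual page.
mpiexec -n 8 ./my_app -mpi_linear_solver_server -mpi_linear_solver_server_view
```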
   I am finishing up better support in the branch barry/2023-09-15/fix-log-pcmpi

   Barry

> On Apr 23, 2024, at 3:59 PM, Yongzhong Li wrote:
>
> Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct?
>
> [remainder of quoted thread trimmed]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From knepley at gmail.com Tue Apr 23 17:34:17 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 23 Apr 2024 18:34:17 -0400
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To: 
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev>
Message-ID: 

On Tue, Apr 23, 2024 at 4:00 PM Yongzhong Li wrote:

> Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct?

I am not sure what your point is above. SpMV performance is mainly controlled by memory bandwidth (latency plays very little role). If you think the MPI processes are not using the full bandwidth, use more processes. If they are, and you think using threads will speed anything up, you are incorrect. In fact, the only difference between a thread and a process is the default memory-sharing flag; MPI will perform at least as well (and usually better) than adding threads to SpMV. There are dozens of publications showing this.
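As a back-of-the-envelope illustration of the bandwidth argument (the bandwidth figure and byte counts below are illustrative assumptions for a generic desktop, not measurements from the hardware discussed in this thread):

```python
# Rough model: an AIJ-format SpMV streams roughly 12 bytes per nonzero
# (8-byte double value + 4-byte column index) plus the input and output
# vectors, so its runtime is bounded by memory bandwidth, not core count.
nnz = 5 * 500 * 500           # ~5-point stencil on a 500 x 500 grid
bytes_per_nnz = 12            # 8-byte value + 4-byte column index
vec_bytes = 2 * 8 * 500 * 500 # read x, write y (ignoring cache effects)
traffic = nnz * bytes_per_nnz + vec_bytes

bandwidth = 40e9              # ~40 GB/s, an assumed desktop figure
t_spmv = traffic / bandwidth  # seconds per SpMV if bandwidth-bound

print(f"traffic per SpMV: {traffic/1e6:.1f} MB, "
      f"predicted time: {t_spmv*1e3:.3f} ms")
```

Adding threads does not reduce the traffic term, which is consistent with the flat OPENBLAS_NUM_THREADS sweep reported earlier in the thread.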
Thanks,

    Matt

> [remainder of quoted thread trimmed]

-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YnltLCiNPOvMtX9m3CBEtxQ-FLDRM3Ef6ZAOt3huF3xEquvYmdHNozTGvZpnCoX8m3gZ_6W9SXKOxFq8o9DM$

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daniel.stone at opengosim.com  Wed Apr 24 03:56:20 2024
From: daniel.stone at opengosim.com (Daniel Stone)
Date: Wed, 24 Apr 2024 09:56:20 +0100
Subject: [petsc-users] Spack build and ptscotch
Message-ID:

Hi PETSc community,

I've been looking at using Spack to build PETSc; in particular I need to disable the default metis/parmetis dependencies and use PTScotch instead for our software. I've had quite a bit of trouble with this - it seems like something in the resulting build of our simulator ends up badly optimised and becomes an MPI bottleneck when I build against PETSc built with Spack.

I've been trying to track this down, and noticed this in the PETSc Spack build recipe:

    # PTScotch: Currently disable Parmetis wrapper, this means
    # nested disection won't be available thought PTScotch
    depends_on("scotch+esmumps~metis+mpi", when="+ptscotch")
    depends_on("scotch+int64", when="+ptscotch+int64")

Sure enough - when I compare the build with Spack and a traditional build with ./configure etc., I see that in the traditional build Scotch is always built with the parmetis wrapper, but not in the Spack build. In fact, I'm not sure how to turn off the parmetis wrapper option for scotch in the case of a traditional build (i.e. there doesn't seem to be a flag in the configure script for it) - which would be a very useful test for me (I can of course do similar experiments by doing a classical build of petsc against ptscotch built separately without the wrappers, etc. - will try that).

Does anyone know why the parmetis wrapper is always disabled in the spack build options? Is there something about Spack that would prevent it from working?
It's notable - but I might be missing it - that there's no warning that there's a difference in the way ptscotch is built between the spack and classical builds: - classical: ptscotch will always be built with parmetis wrappers, can't seem to turn off - spack: ptscotch will always be built without parmetis wrappers, can't turn on Any insight at all would be great, I'm new to Spack and am not super familiar with the logic that goes into setting up builds for the system. Here is the kind of command I give to Spack for PETSc builds, which may well be less than ideal: spack install petsc at 3.19.1 ~metis +ptscotch ^hdf5 +fortran +hl Separate tiny note: when building with hdf5, I have to ensure that the fortran flag is set for it, as above. There's a fortran flag for the petsc module, default true, and a fortran flag for the hdf5 module, default false. A naive user (i.e. me), will see the fortran flag for the petsc module, and assume that all dependencies will correspondingly be built with fortran capability - then see that hdf5.mod is missing when trying to build their software against petsc. It's the old "did you forget --with-hdf5-fortran-bindings?" issue, resurrected for a new build system. Thanks, Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From ge75rud at mytum.de Wed Apr 24 09:22:03 2024 From: ge75rud at mytum.de (Giyantha Binu Amaratunga Mukadange) Date: Wed, 24 Apr 2024 14:22:03 +0000 Subject: [petsc-users] CUDA GPU supported KSPs and PCs Message-ID: <95ebd800368b41abb3bc18545be39f71@mytum.de> Hi, Is it possible to know which KSPs and PCs currently support running on Nvidia GPUs with CUDA, or a source that has this information? The following page doesn't provide details about the supported KSPs and PCs. https://urldefense.us/v3/__https://petsc.org/main/overview/gpu_roadmap/__;!!G_uCfscf7eWS!Yi-yp6M1qurQgY1iD0qI5XhdMGGfQqSpZpHdvfnh7DxMH4BL7V-0_6HaM47ifrfJqtWBWCBbGPCdqN29E8KNZTc$ Thank you very much! 
Best regards,
Binu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bsmith at petsc.dev  Wed Apr 24 10:08:05 2024
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 24 Apr 2024 11:08:05 -0400
Subject: [petsc-users] CUDA GPU supported KSPs and PCs
In-Reply-To: <95ebd800368b41abb3bc18545be39f71@mytum.de>
References: <95ebd800368b41abb3bc18545be39f71@mytum.de>
Message-ID: <6566AAA8-7862-4F5A-88C9-26D0FFC3FD43@petsc.dev>

   It is less a question of which KSPs and PCs support running with CUDA, and more a question of which parts of each KSP and PC run with CUDA (and which parts don't, causing memory traffic back and forth between the CPU and GPU). Generally speaking, all the PETSc Vec operations run on CUDA, thus "all" the KSPs "support CUDA". For Mat operations it is more complicated; triangular solves do not run well (or at all) on CUDA, but many of the other operations do run on CUDA. Since setting up and solving with some PCs involves rather complicated Mat operations (like PCGAMG and PCFIELDSPLIT), parts may work on CUDA and parts may not.

   The best way to determine how the GPU is being utilized is to run with -log_view and look at the columns that present the amount of memory traffic between the CPU and GPU and the percentage of floating point that is done on the GPU. Feel free to ask specific questions about the output. In some cases, given the output, we may be able to add additional CUDA support that is missing, to decrease the memory traffic between the CPU and GPU and increase the flops done on the GPU.

   We cannot produce a table of what is "supported" and what is not, or even how much is supported, since there are so many possible combinations; hence it is best to run to determine the problematic places.

> On Apr 24, 2024, at 10:22 AM, Giyantha Binu Amaratunga Mukadange wrote:
> Hi, > > Is it possible to know which KSPs and PCs currently support running on Nvidia GPUs with CUDA, or a source that has this information? > The following page doesn't provide details about the supported KSPs and PCs. > https://urldefense.us/v3/__https://petsc.org/main/overview/gpu_roadmap/__;!!G_uCfscf7eWS!Y4JoiNMqvhSlcSuHnANYAK0LByq3ybAxBp_7_NTmzTcV2gBNyzA0D08G7lhPAZnQZsdtwo3zvr0Dlh3jIGpKCnM$ > > Thank you very much! > > Best regards, > Binu -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Apr 24 10:25:04 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 24 Apr 2024 10:25:04 -0500 (CDT) Subject: [petsc-users] Spack build and ptscotch In-Reply-To: References: Message-ID: This is the complexity with maintaining dependencies (and dependencies of dependencies), and different build systems - Its not easy to keep the "defaults" in both builds exactly the same. - And its not easy to expose all "variants" or keep the same variants in both builds. - And each pkg has its own issues that prevents some combinations to work or not [or tested combinations vs untested]. This e-mail query has multiple things: - understand "why" the current impl of [spack, petsc] build tools are the way they are. - if they can be improved - and build use cases that you need working - [and subsequently your code working] Addressing them all is not easy - so lets stick with what you need to make progress. For one - we recommend using latest petsc version [i.e 3.21 - not 3.19] - any fixes we have will address the current release. 
> - spack: ptscotch will always be built without parmetis wrappers, can't turn on diff --git a/var/spack/repos/builtin/packages/petsc/package.py b/var/spack/repos/builtin/packages/petsc/package.py index b7b1d86b15..ae27ba4c4e 100644 --- a/var/spack/repos/builtin/packages/petsc/package.py +++ b/var/spack/repos/builtin/packages/petsc/package.py @@ -268,9 +268,7 @@ def check_fortran_compiler(self): depends_on("metis at 5:~int64", when="@3.8:+metis~int64") depends_on("metis at 5:+int64", when="@3.8:+metis+int64") - # PTScotch: Currently disable Parmetis wrapper, this means - # nested disection won't be available thought PTScotch - depends_on("scotch+esmumps~metis+mpi", when="+ptscotch") + depends_on("scotch+esmumps+mpi", when="+ptscotch") depends_on("scotch+int64", when="+ptscotch+int64") depends_on("hdf5@:1.10+mpi", when="@:3.12+hdf5+mpi") Now you can try: spack install petsc~metis+ptscotch ^scotch+metis vs spack install petsc~metis+ptscotch ^scotch~metis [~metis is the default for scotch] Note the following comment in spack/var/spack/repos/builtin/packages/scotch/package.py >>>> # Vendored dependency of METIS/ParMETIS conflicts with standard # installations conflicts("metis", when="+metis") conflicts("parmetis", when="+metis") <<<<< > - classical: ptscotch will always be built with parmetis wrappers, can't seem to turn off Looks like spack uses cmake build of ptscotch. PETSc uses Makefile interface. It likely doesn't support turning off metis wrappers [without hacks]. So you might either need to hack scotch build via petsc - or just install it separately - and use it with petsc. 
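
[Editor's note: a minimal sketch of the "install it separately" route Satish suggests. The Scotch version, install paths, and Makefile.inc template are placeholders, and the PETSc configure option names should be checked against ./configure --help for your PETSc version before relying on them.]

```shell
# Build Scotch/PTScotch by hand with its Makefile interface
# (illustrative version and paths - adjust to your system)
tar xf scotch_7.0.4.tar.gz
cd scotch_7.0.4/src
cp Make.inc/Makefile.inc.x86-64_pc_linux2 Makefile.inc
make ptscotch esmumps              # available targets vary by Scotch version
make prefix=$HOME/opt/scotch install

# Then point a classical PETSc configure at that installation,
# with metis/parmetis disabled
cd $PETSC_DIR
./configure --with-ptscotch=1 \
            --with-ptscotch-dir=$HOME/opt/scotch \
            --with-metis=0 --with-parmetis=0
```

This mirrors the Spack variant experiment above, but gives full control over which Scotch libraries (and wrappers) get built.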
I see an effort to migrate scotch build in petsc to cmake https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7242/__;!!G_uCfscf7eWS!dL00pokNVI6oaNk_chaSyfI1zWFeTgYA9jbRW6n9YT73s51VwLBuXYc-MAJWEKXr8uBgEFrmhFQ_VJOSlvzW6OA$ https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/7495/__;!!G_uCfscf7eWS!dL00pokNVI6oaNk_chaSyfI1zWFeTgYA9jbRW6n9YT73s51VwLBuXYc-MAJWEKXr8uBgEFrmhFQ_VJOS6OuSrPs$ Satish On Wed, 24 Apr 2024, Daniel Stone wrote: > Hi PETSc community, > > I've been looking at using Spack to build PETSc, in particular I need to > disable the default metis/parmetis dependencies and use PTScotch instead, > for our software. > I've had quite a bit of trouble with this - it seems like something in the > resulting build of our simulator ends up badly optimised and an mpi > bottleneck, when I build against > PETSc built with Spack. > > I've been trying to track this down, and noticed this in the PETSc Spack > build recipe: > > # PTScotch: Currently disable Parmetis wrapper, this means > # nested disection won't be available thought PTScotch > depends_on("scotch+esmumps~metis+mpi", when="+ptscotch") > depends_on("scotch+int64", when="+ptscotch+int64") > > > Sure enough - when I compare the build with Spack and a traditional build > with ./configure etc, I see that, in the traditional build, Scotch is > always built with the parmetis wrapper, > but not in the Scotch build. In fact, I'm not sure how to turn off the > parmetis wrapper option for scotch, in the case of a traditional build > (i.e. there doesn't seem to be a flag in the > configure script for it) - which would be a very useful test for me (I can > of course do similar experiments by doing a classical build of petsc > against ptscotch built separately without the > wrappers, etc - will try that). > > Does anyone know why the parmetis wrapper is always disabled in the spack > build options? Is there something about Spack that would prevent it from > working? 
It's notable - but I might > be missing it - that there's no warning that there's a difference in the > way ptscotch is built between the spack and classical builds: > - classical: ptscotch will always be built with parmetis wrappers, can't > seem to turn off > - spack: ptscotch will always be built without parmetis wrappers, can't > turn on > > Any insight at all would be great, I'm new to Spack and am not super > familiar with the logic that goes into setting up builds for the system. > > Here is the kind of command I give to Spack for PETSc builds, which may > well be less than ideal: > > spack install petsc at 3.19.1 ~metis +ptscotch ^hdf5 +fortran +hl > > Separate tiny note: when building with hdf5, I have to ensure that the > fortran flag is set for it, as above. There's a fortran flag for the petsc > module, default true, and a fortran flag for the hdf5 > module, default false. A naive user (i.e. me), will see the fortran flag > for the petsc module, and assume that all dependencies will correspondingly > be built with fortran capability - then see that > hdf5.mod is missing when trying to build their software against petsc. It's > the old "did you forget --with-hdf5-fortran-bindings?" issue, resurrected > for a new build system. > > Thanks, > > Daniel > From jed at jedbrown.org Wed Apr 24 12:25:21 2024 From: jed at jedbrown.org (Jed Brown) Date: Wed, 24 Apr 2024 11:25:21 -0600 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: References: <878r1aweuv.fsf@jedbrown.org> Message-ID: <87r0eu4qri.fsf@jedbrown.org> An HTML attachment was scrubbed... URL: From ashish.patel at ansys.com Thu Apr 25 13:54:08 2024 From: ashish.patel at ansys.com (Ashish Patel) Date: Thu, 25 Apr 2024 18:54:08 +0000 Subject: [petsc-users] About recent changes in GAMG In-Reply-To: <87r0eu4qri.fsf@jedbrown.org> References: <878r1aweuv.fsf@jedbrown.org> <87r0eu4qri.fsf@jedbrown.org> Message-ID: Attaching the heaptrack files and some screenshots from one of the cores. 
It's most likely coming from MatStashSortCompress_Private; if I understand correctly, it does some temporary allocations for MPI communication, but that memory gets released before it can be registered here:
https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/blob/main/src/sys/objects/inherit.c*L111__;Iw!!G_uCfscf7eWS!fEVnmR9ofeaju0KXzcErICtGBrtw9qPbE9Dw1Y7vq1h4BzpT0-V2ZDldrxNysEDfCAOCKcYITz5WyYyQkUyJkkhCFYc$

Ashish
________________________________
From: Jed Brown
Sent: Wednesday, April 24, 2024 10:25 AM
To: Ashish Patel ; Mark Adams ; PETSc users list
Cc: Scott McClennan ; Jeremy Theler (External)
Subject: Re: [petsc-users] About recent changes in GAMG

[External Sender]

Ashish Patel writes:

> Hi Jed,
> VmRss is on a higher side and seems to match what PetscMallocGetMaximumUsage is reporting. HugetlbPages was 0 for me.
>
> Mark, running without the near nullspace also gives similar results. I have attached the malloc_view and gamg info for serial and 2 core runs. Some of the standout functions on rank 0 for the parallel run seem to be
> 5.3 GB MatSeqAIJSetPreallocation_SeqAIJ
> 7.7 GB MatStashSortCompress_Private
> 10.1 GB PetscMatStashSpaceGet
> 7.7 GB PetscSegBufferAlloc_Private
>
> malloc_view also says the following
> [0] Maximum memory PetscMalloc()ed 32387548912 maximum size of entire process 8270635008
> which fits the PetscMallocGetMaximumUsage > PetscMemoryGetMaximumUsage output.
This would occur if there was a large PetscMalloc'd block that did not get used (so only a portion of it is faulted and thus becomes resident). Can you run a heap profiler like heaptrack? https://urldefense.us/v3/__https://github.com/KDE/heaptrack__;!!G_uCfscf7eWS!fEVnmR9ofeaju0KXzcErICtGBrtw9qPbE9Dw1Y7vq1h4BzpT0-V2ZDldrxNysEDfCAOCKcYITz5WyYyQkUyJ9RdSDEA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: consumption.png Type: image/png Size: 171407 bytes Desc: consumption.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: topdown.png Type: image/png Size: 549151 bytes Desc: topdown.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: flamegraph.png Type: image/png Size: 283147 bytes Desc: flamegraph.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex1_outputlog_3.21.0.1115_debug.log Type: text/x-log Size: 29505 bytes Desc: ex1_outputlog_3.21.0.1115_debug.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: heaptrack.ex1.1128301.zst Type: application/zstd Size: 855213 bytes Desc: heaptrack.ex1.1128301.zst URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: heaptrack.ex1.1128300.zst Type: application/zstd Size: 980714 bytes Desc: heaptrack.ex1.1128300.zst URL: From karthikeyan.chockalingam at stfc.ac.uk Fri Apr 26 06:22:41 2024 From: karthikeyan.chockalingam at stfc.ac.uk (Karthikeyan Chockalingam - STFC UKRI) Date: Fri, 26 Apr 2024 11:22:41 +0000 Subject: [petsc-users] PETSc-GPU Message-ID: Hello, When PETSc is installed with GPU support, will it run only on GPUs or can it be run on CPUs (without GPUs)? Currently, PETSc crashes when run on CPUs. Thank you. Best regards, Karthik. -- Karthik Chockalingam, Ph.D. 
Senior Research Software Engineer
High Performance Systems Engineering Group
Hartree Centre | Science and Technology Facilities Council
karthikeyan.chockalingam at stfc.ac.uk

[signature_3970890138]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 18270 bytes
Desc: image001.png
URL:

From knepley at gmail.com  Fri Apr 26 06:32:36 2024
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 26 Apr 2024 07:32:36 -0400
Subject: [petsc-users] PETSc-GPU
In-Reply-To:
References:
Message-ID:

On Fri, Apr 26, 2024 at 7:23 AM Karthikeyan Chockalingam - STFC UKRI via petsc-users wrote:

> Hello,
>
> When PETSc is installed with GPU support, will it run only on GPUs or can
> it be run on CPUs (without GPUs)? Currently, PETSc crashes when run on CPUs.
>

It should run on both. Can you send the crash? I think we did fix a problem with it eagerly initializing GPUs when they were absent.

  Thanks,

     Matt

> Thank you.
>
> Best regards,
>
> Karthik.
>
> --
>
> *Karthik Chockalingam, Ph.D.*
>
> Senior Research Software Engineer
>
> High Performance Systems Engineering Group
>
> Hartree Centre | Science and Technology Facilities Council
>
> karthikeyan.chockalingam at stfc.ac.uk
>
> [image: signature_3970890138]
>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!a0VV8buEHYPmWgGr_wyPE35z-vFWz2YwIgw3_bJI-_ydHr3nvKWF8LjXSvvoCPZ0U1zxOqXdxcrOKx8-ahM8$

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 18270 bytes
Desc: not available
URL:

From yongzhong.li at mail.utoronto.ca  Fri Apr 26 16:11:03 2024
From: yongzhong.li at mail.utoronto.ca (Yongzhong Li)
Date: Fri, 26 Apr 2024 21:11:03 +0000
Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
In-Reply-To:
References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev>
Message-ID:

Hi Barry,

Thanks, I am interested in this PCMPI solution provided by PETSc!

I tried src/ksp/ksp/tutorials/ex1.c, with PETSc configured as follows:

./configure PETSC_ARCH=config-debug --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 --with-logging=0 --with-cxx=g++ --download-mpich --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB}

In the linux terminal, my bash script is as follows,

mpiexec -n 4 ./ex1 -mpi_linear_solver_server -mpi_linear_solver_server_view

However, I found the output a bit strange:

Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
  Ranks  KSPSolve()s  Mats  KSPs  Avg. Size  Avg. Its
  Sequential  3  2  10  1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
  Ranks  KSPSolve()s  Mats  KSPs  Avg. Size  Avg. Its
  Sequential  3  2  10  1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
  Ranks  KSPSolve()s  Mats  KSPs  Avg. Size  Avg. Its
  Sequential  3  2  10  1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
  Ranks  KSPSolve()s  Mats  KSPs  Avg. Size  Avg. Its
  Sequential  3  2  10  1

It seems that MPI started four processes, but they all did the same things, and I am confused why the ranks show "Sequential". Is this the desired output when mpi_linear_solver_server is turned on?

And if I run ex1 without any hyphen options, I get

Norm of error 2.47258e-15, Iterations 5

It looks like the KSP solver uses 5 iterations to reach convergence, so why, when mpi_linear_solver_server is enabled, does it use 1?

I hope to get some help on these issues, thank you!

Sincerely,
Yongzhong

From: Barry Smith
Date: Tuesday, April 23, 2024 at 5:15 PM
To: Yongzhong Li
Cc: petsc-users at mcs.anl.gov, petsc-maint at mcs.anl.gov, Piero Triverio
Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver

   Yes, only the routines that can explicitly use BLAS have multi-threading.

   PETSc does support using any MPI linear solvers from a sequential (or OpenMP) main program using the https://urldefense.us/v3/__https://petsc.org/release/manualpages/PC/PCMPI/*pcmpi__;Iw!!G_uCfscf7eWS!coHE0Q8qDqtiQ2cOB3iEz4D8dXOcJd56gBkPuaz5Y7pMygwOLBhFHyXhSi3WWUkAulg3kXxAYomf9fk-dT-fzj0mGFIgnkEZ54I$ construct. I am finishing up better support in the branch barry/2023-09-15/fix-log-pcmpi

   Barry

On Apr 23, 2024, at 3:59 PM, Yongzhong Li wrote:

Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is that correct?

Best,
Yongzhong

From: Barry Smith <bsmith at petsc.dev>
Date: Tuesday, April 23, 2024 at 3:35 PM
To: Yongzhong Li
Cc: petsc-users at mcs.anl.gov, petsc-maint at mcs.anl.gov, Piero Triverio
Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver
   Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations.

On Apr 23, 2024, at 3:23 PM, Yongzhong Li wrote:

Hi Barry,

Thank you for the information provided! Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc? I am now using OpenBLAS but didn't see much improvement when the multithreading is enabled; do you think other implementations such as Netlib or Intel MKL will help?

Best,
Yongzhong

From: Barry Smith <bsmith at petsc.dev>
Date: Monday, April 22, 2024 at 4:20 PM
To: Yongzhong Li
Cc: petsc-users at mcs.anl.gov, petsc-maint at mcs.anl.gov, Piero Triverio
Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver

   PETSc-provided solvers do not directly use threads. The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver.

   Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU.

   If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environment variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect.

On Apr 22, 2024, at 4:06 PM, Yongzhong Li wrote:
Hello all,

I am writing to ask whether PETSc's KSPSolve makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm.

The question appeared when I was running a large numerical program based on the boundary element method. I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread.

Could you please confirm whether GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreading of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions?

For reference, I am using PETSc version 3.16.0, configured as follows:

./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp

To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem with the finite difference method. I observed the same behavior with this piece of code.
The code is as follows (the header name was lost in transit; <petscksp.h> is the standard include for KSP examples):

#include <petscksp.h>

/* Monitor function to print iteration number and residual norm */
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx)
{
  PetscErrorCode ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm); CHKERRQ(ierr);
  return 0;
}

int main(int argc, char **args)
{
  Vec            x, b, x_true, e;
  Mat            A;
  KSP            ksp;
  PetscErrorCode ierr;
  PetscInt       i, j, Ii, J, n = 500; // Size of the grid n x n
  PetscInt       Istart, Iend, ncols;
  PetscScalar    v;
  PetscMPIInt    rank;

  PetscInitialize(&argc, &args, NULL, NULL);
  PetscLogDouble t1, t2; // Variables for timing
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  // Create vectors and matrix
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &b); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);

  // Set true solution as all ones
  ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);

  // Create and assemble matrix A for the 2D Laplacian using 5-point stencil
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (Ii = Istart; Ii < Iend; Ii++) {
    i = Ii / n; // Row index
    j = Ii % n; // Column index
    v = -4.0;
    ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);
    if (i > 0) { // South
      J = Ii - n; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (i < n - 1) { // North
      J = Ii + n; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j > 0) { // West
      J = Ii - 1; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j < n - 1) { // East
      J = Ii + 1; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  // Compute the RHS corresponding to the true solution
  ierr = MatMult(A, x_true, b); CHKERRQ(ierr);

  // Set up and solve the linear system
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);

  /* Set up the monitor */
  ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);

  // Start timing
  PetscTime(&t1);

  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  // Stop timing
  PetscTime(&t2);

  // Compute error
  ierr = VecDuplicate(x, &e); CHKERRQ(ierr);
  ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);
  PetscReal norm_error, norm_true;
  ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);
  ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);
  PetscReal relative_error = norm_error / norm_true;
  if (rank == 0) { // Print only from the first MPI process
    PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);
  }

  // Output the wall time taken for KSPSolve
  PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);

  // Cleanup
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = VecDestroy(&x_true); CHKERRQ(ierr);
  ierr = VecDestroy(&e); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  PetscFinalize();
  return 0;
}

Here are some profiling results for the GMRES solution.

OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1
OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3
OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7
OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8

I am using one workstation with an Intel Core
i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec; the default number of MPI processes should be 1, correct me if I am wrong. Thank you in advance! Sincerely, Yongzhong ----------------------------------------------------------- Yongzhong Li PhD student | Electromagnetics Group Department of Electrical & Computer Engineering University of Toronto https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!coHE0Q8qDqtiQ2cOB3iEz4D8dXOcJd56gBkPuaz5Y7pMygwOLBhFHyXhSi3WWUkAulg3kXxAYomf9fk-dT-fzj0mGFIgALuNCjQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Apr 27 11:54:36 2024 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 27 Apr 2024 12:54:36 -0400 Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver In-Reply-To: References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev> Message-ID: <3636D560-44B1-4C7C-8FA6-8308D6450099@petsc.dev> You should use the git branch barry/2023-09-15/fix-log-pcmpi It is still work-in-progress but much better than what is currently in the main PETSc branch. By default, the MPI linear solver server requires 10,000 unknowns per MPI process, so for smaller problems, it will only run on one MPI rank and list Sequential in your output. In general you need on the order of at least 10,000 unknowns per MPI process to get good speedup. You can control it with -mpi_linear_solver_server_minimum_count_per_rank Regarding the report of 1 iteration, that is fixed in the branch listed above. Barry > On Apr 26, 2024, at 5:11 PM, Yongzhong Li wrote: > > Hi Barry, > > Thanks, I am interested in this PCMPI solution provided by PETSc!
> I tried the src/ksp/ksp/tutorials/ex1.c, which is configured in CMakeLists as follows:
>
> ./configure PETSC_ARCH=config-debug --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 --with-logging=0 --with-cxx=g++ --download-mpich --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB}
>
> In the Linux terminal, my bash script is as follows:
>
> mpiexec -n 4 ./ex1 -mpi_linear_solver_server -mpi_linear_solver_server_view
>
> However, I found the output a bit strange:
>
> Norm of error 1.23629e-15, Iterations 1
> MPI linear solver server statistics:
> Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
> Sequential 3 2 10 1
> Norm of error 1.23629e-15, Iterations 1
> MPI linear solver server statistics:
> Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
> Sequential 3 2 10 1
> Norm of error 1.23629e-15, Iterations 1
> MPI linear solver server statistics:
> Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
> Sequential 3 2 10 1
> Norm of error 1.23629e-15, Iterations 1
> MPI linear solver server statistics:
> Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
> Sequential 3 2 10 1
>
> It seems that MPI started four processes, but they all did the same things, and I am confused why the ranks showed Sequential. Is this the desired output when mpi_linear_solver_server is turned on?
>
> And if I run ex1 without any hyphen options, I get
>
> Norm of error 2.47258e-15, Iterations 5
>
> It looks like the KSPSolver uses 5 iterations to reach convergence, but why, when mpi_linear_solver_server is enabled, does it use only 1?
>
> I hope to get some help on these issues, thank you!
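Given the 10,000-unknowns-per-rank default Barry describes above, one way to make the server actually distribute a small test such as ex1 is to lower the threshold explicitly. A hedged sketch: the option names are the ones mentioned in this thread, but the threshold value 100 is purely illustrative, not a recommendation:

```sh
# Lower the per-rank minimum so a small problem is distributed
# across the server ranks instead of being solved sequentially.
mpiexec -n 4 ./ex1 -mpi_linear_solver_server \
        -mpi_linear_solver_server_view \
        -mpi_linear_solver_server_minimum_count_per_rank 100
```

With the threshold below the problem size divided by the rank count, the `-mpi_linear_solver_server_view` statistics should presumably report a parallel rank count rather than "Sequential".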
> > Sincerely, > Yongzhong > > > > > From: Barry Smith > > Date: Tuesday, April 23, 2024 at 5:15 PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > > Yes, only the routines that can explicitly use BLAS have multi-threading. > > PETSc does support using any MPI linear solvers from a sequential (or OpenMP) main program using the https://urldefense.us/v3/__https://petsc.org/release/manualpages/PC/PCMPI/*pcmpi__;Iw!!G_uCfscf7eWS!ZuPZtoeGFKUjdTAW0Ylzhjz0KaqtPKAf4ZOa1Xahj_4JUS8wwupZKDb_BQCWgFWPJIYRFlA3dTDHsu8HNnxbn4Q$ construct. I am finishing up better support in the branch barry/2023-09-15/fix-log-pcmpi > > Barry > > > > > > > > On Apr 23, 2024, at 3:59 PM, Yongzhong Li > wrote: > > Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is that correct? > > Best, > Yongzhong > > From: Barry Smith > > Date: Tuesday, April 23, 2024 at 3:35 PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > bsmith at petsc.dev wrote: > > Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations. > > > On Apr 23, 2024, at 3:23 PM, Yongzhong Li > wrote: > > Hi Barry, > > Thank you for the information provided! > > Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc?
> > I am now using OpenBLAS but didn't see much improvement when the multithreading is enabled; do you think other implementations such as Netlib and Intel MKL will help? > > Best, > Yongzhong > > From: Barry Smith > > Date: Monday, April 22, 2024 at 4:20 PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > bsmith at petsc.dev wrote: > > PETSc-provided solvers do not directly use threads. > > The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. > > Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver. > > Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU. > > If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environmental variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect. > > > > > On Apr 22, 2024, at 4:06 PM, Yongzhong Li > wrote: > > Hello all, > > I am writing to ask if PETSc's KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm. > > The questions appeared when I was running a large numerical program based on the boundary element method.
I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread. > > Could you please confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreading of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions? > > For reference, I am using PETSc version 3.16.0, configured in CMakeLists as follows: > > ./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp > > To simplify the diagnosis of potential issues, I have also written a small example program using GMRES to solve a sparse matrix system derived from a 2D Poisson problem using the finite difference method. I found similar issues in this piece of code.
The code is as follows: > > #include > > /* Monitor function to print iteration number and residual norm */ > PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) { > PetscErrorCode ierr; > ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm); > CHKERRQ(ierr); > return 0; > } > > int main(int argc, char **args) { > Vec x, b, x_true, e; > Mat A; > KSP ksp; > PetscErrorCode ierr; > PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n > PetscInt Istart, Iend, ncols; > PetscScalar v; > PetscMPIInt rank; > PetscInitialize(&argc, &args, NULL, NULL); > PetscLogDouble t1, t2; // Variables for timing > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > > // Create vectors and matrix > ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr); > ierr = VecDuplicate(x, &b); CHKERRQ(ierr); > ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr); > > // Set true solution as all ones > ierr = VecSet(x_true, 1.0); CHKERRQ(ierr); > > // Create and assemble matrix A for the 2D Laplacian using 5-point stencil > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr); > ierr = MatSetFromOptions(A); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr); > for (Ii = Istart; Ii < Iend; Ii++) { > i = Ii / n; // Row index > j = Ii % n; // Column index > v = -4.0; > ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr); > if (i > 0) { // South > J = Ii - n; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (i < n - 1) { // North > J = Ii + n; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (j > 0) { // West > J = Ii - 1; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (j < n - 1) { // East > J = Ii + 1; > v = 1.0; > ierr = 
MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > } > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > // Compute the RHS corresponding to the true solution > ierr = MatMult(A, x_true, b); CHKERRQ(ierr); > > // Set up and solve the linear system > ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr); > ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr); > ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr); > ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr); > > /* Set up the monitor */ > ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr); > > // Start timing > PetscTime(&t1); > > ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > // Stop timing > PetscTime(&t2); > > // Compute error > ierr = VecDuplicate(x, &e); CHKERRQ(ierr); > ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr); > PetscReal norm_error, norm_true; > ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr); > ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr); > PetscReal relative_error = norm_error / norm_true; > if (rank == 0) { // Print only from the first MPI process > PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error); > } > > // Output the wall time taken for MatMult > PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1); > > // Cleanup > ierr = VecDestroy(&x); CHKERRQ(ierr); > ierr = VecDestroy(&b); CHKERRQ(ierr); > ierr = VecDestroy(&x_true); CHKERRQ(ierr); > ierr = VecDestroy(&e); CHKERRQ(ierr); > ierr = MatDestroy(&A); CHKERRQ(ierr); > ierr = KSPDestroy(&ksp); CHKERRQ(ierr); > PetscFinalize(); > return 0; > } > > Here are some profiling results for GMERS solution. 
> > OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1 > OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3 > OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7 > OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8 > OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8 > > I am using one workstation with Intel? Core? i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec, the default number of MPI processes should be 1, correct if I am wrong. > > Thank you in advance! > > Sincerely, > Yongzhong > > ----------------------------------------------------------- > Yongzhong Li > PhD student | Electromagnetics Group > Department of Electrical & Computer Engineering > University of Toronto > https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!ZuPZtoeGFKUjdTAW0Ylzhjz0KaqtPKAf4ZOa1Xahj_4JUS8wwupZKDb_BQCWgFWPJIYRFlA3dTDHsu8H_oB-EXM$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Apr 29 10:24:10 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 29 Apr 2024 15:24:10 +0000 Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time Message-ID: Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
I tried this with gcc compilers and openmpi: $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force and get for SuiteSparse: metis: Version: 5.1.0 Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis SuiteSparse: Version: 7.7.0 Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. Thank you for your time! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Apr 29 11:00:24 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 29 Apr 2024 11:00:24 -0500 (CDT) Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: References: Message-ID: # Other CMakeLists.txt files inside SuiteSparse are from dependent packages # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis # which is a slightly revised copy of METIS 5.0.1) but none of those # CMakeLists.txt files are used to build any package in SuiteSparse. So suitesparse includes a copy of metis sources - i.e does not use external metis library? 
>> balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway libcholmod.so:000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway <<< And metis routines are already in -lcholmod [with some namespace fixes] Satish On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). > I tried this with gcc compilers and openmpi: > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > and get for SuiteSparse: > > metis: > Version: 5.1.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > SuiteSparse: > Version: 7.7.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > Thank you for your time! 
> Marcos > From marcos.vanella at nist.gov Mon Apr 29 11:05:59 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 29 Apr 2024 16:05:59 +0000 Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: References: Message-ID: Hi Satish, Ok thank you for clarifying. I don't need to include Metis in the config phase then (not using anywhere else). Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)? Thank you, Marcos ________________________________ From: Satish Balay Sent: Monday, April 29, 2024 12:00 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time # Other CMakeLists.txt files inside SuiteSparse are from dependent packages # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis # which is a slightly revised copy of METIS 5.0.1) but none of those # CMakeLists.txt files are used to build any package in SuiteSparse. So suitesparse includes a copy of metis sources - i.e does not use external metis library? >> balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway libcholmod.so:000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway <<< And metis routines are already in -lcholmod [with some namespace fixes] Satish On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
> I tried this with gcc compilers and openmpi: > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > and get for SuiteSparse: > > metis: > Version: 5.1.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > SuiteSparse: > Version: 7.7.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > Thank you for your time! > Marcos > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Apr 29 11:11:03 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 29 Apr 2024 11:11:03 -0500 (CDT) Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: References: Message-ID: <0cfcdb1a-2818-4f33-5e9d-a7a401aaf346@mcs.anl.gov> BTW: you can avoid LDFLAGS="-ld_classic" [and also --with-shared-libraries=0?] 
with --download-openmpi=https://urldefense.us/v3/__https://web.cels.anl.gov/projects/petsc/download/externalpackages/openmpi-5.0.3-xcode15.tar.gz__;!!G_uCfscf7eWS!aKSkj7j0MQdzrUxQIWrganVR7b-fp6-0WA2AHagxB5l4dfWhCIXRkGDh_lPJgJ0av1HSQ_YMreLtmLeeBQ5VB9M$ Satish On Mon, 29 Apr 2024, Satish Balay via petsc-users wrote: > > # Other CMakeLists.txt files inside SuiteSparse are from dependent packages > # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis > # which is a slightly revised copy of METIS 5.0.1) but none of those > # CMakeLists.txt files are used to build any package in SuiteSparse. > > > So suitesparse includes a copy of metis sources - i.e does not use external metis library? > > >> > balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway > libcholmod.so:000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway > <<< > > And metis routines are already in -lcholmod [with some namespace fixes] > > Satish > > On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > > > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
> > I tried this with gcc compilers and openmpi: > > > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > > > and get for SuiteSparse: > > > > metis: > > Version: 5.1.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > > SuiteSparse: > > Version: 7.7.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > > Thank you for your time! > > Marcos > > > > From balay at mcs.anl.gov Mon Apr 29 11:17:02 2024 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 29 Apr 2024 11:17:02 -0500 (CDT) Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024, Vanella, Marcos (Fed) wrote: > Hi Satish, > Ok thank you for clarifying. I don't need to include Metis in the config phase then (not using anywhere else). > Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)? if x is not installed - configure won't find it - and defaults to disabling x11. 
Otherwise you can always force: --with-x=0 Satish > Thank you, > Marcos > ________________________________ > From: Satish Balay > Sent: Monday, April 29, 2024 12:00 PM > To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time > > > # Other CMakeLists.txt files inside SuiteSparse are from dependent packages > # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis > # which is a slightly revised copy of METIS 5.0.1) but none of those > # CMakeLists.txt files are used to build any package in SuiteSparse. > > > So suitesparse includes a copy of metis sources - i.e does not use external metis library? > > >> > balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway > libcholmod.so:000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway > <<< > > And metis routines are already in -lcholmod [with some namespace fixes] > > Satish > > On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > > > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
> > I tried this with gcc compilers and openmpi: > > > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > > > and get for SuiteSparse: > > > > metis: > > Version: 5.1.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > > SuiteSparse: > > Version: 7.7.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > > Thank you for your time! > > Marcos > > > > From bsmith at petsc.dev Mon Apr 29 11:15:53 2024 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 29 Apr 2024 12:15:53 -0400 Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: References: Message-ID: <501E4C3F-0F7E-46FC-ABF3-F45642A94E64@petsc.dev> --with-x=0 > On Apr 29, 2024, at 12:05?PM, Vanella, Marcos (Fed) via petsc-users wrote: > > This Message Is From an External Sender > This message came from outside your organization. > Hi Satish, > Ok thank you for clarifying. I don't need to include Metis in the config phase then (not using anywhere else). > Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)? 
> Thank you, > Marcos > From: Satish Balay > > Sent: Monday, April 29, 2024 12:00 PM > To: Vanella, Marcos (Fed) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time > > > # Other CMakeLists.txt files inside SuiteSparse are from dependent packages > # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis > # which is a slightly revised copy of METIS 5.0.1) but none of those > # CMakeLists.txt files are used to build any package in SuiteSparse. > > > So suitesparse includes a copy of metis sources - i.e does not use external metis library? > > >> > balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway > libcholmod.so :000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway > <<< > > And metis routines are already in -lcholmod [with some namespace fixes] > > Satish > > On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > > > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
> > I tried this with gcc compilers and openmpi: > > > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > > > and get for SuiteSparse: > > > > metis: > > Version: 5.1.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > > SuiteSparse: > > Version: 7.7.0 > > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > > Thank you for your time! > > Marcos > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Apr 29 12:06:17 2024 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 29 Apr 2024 17:06:17 +0000 Subject: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time In-Reply-To: <501E4C3F-0F7E-46FC-ABF3-F45642A94E64@petsc.dev> References: <501E4C3F-0F7E-46FC-ABF3-F45642A94E64@petsc.dev> Message-ID: Thank you Barry and Satish. Trying it now. 
________________________________ From: Barry Smith Sent: Monday, April 29, 2024 12:15 PM To: Vanella, Marcos (Fed) Cc: balay at mcs.anl.gov ; petsc-users Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time --with-x=0 On Apr 29, 2024, at 12:05?PM, Vanella, Marcos (Fed) via petsc-users wrote: This Message Is From an External Sender This message came from outside your organization. Hi Satish, Ok thank you for clarifying. I don't need to include Metis in the config phase then (not using anywhere else). Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)? Thank you, Marcos ________________________________ From: Satish Balay > Sent: Monday, April 29, 2024 12:00 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time # Other CMakeLists.txt files inside SuiteSparse are from dependent packages # (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis # which is a slightly revised copy of METIS 5.0.1) but none of those # CMakeLists.txt files are used to build any package in SuiteSparse. So suitesparse includes a copy of metis sources - i.e does not use external metis library? >> balay at pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway libcholmod.so:000000000026e500 T SuiteSparse_metis_METIS_PartGraphKway <<< And metis routines are already in -lcholmod [with some namespace fixes] Satish On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote: > Hi all, I'm wondering.. Is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at symbolic factorization phase gives lower filling for factorization matrices than AMD in some cases (faster solution phase). 
> I tried this with gcc compilers and openmpi: > > $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 --download-metis --download-suitesparse --download-hypre --download-fblaslapack --download-make --force > > and get for SuiteSparse: > > metis: > Version: 5.1.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis > SuiteSparse: > Version: 7.7.0 > Includes: -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include > Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > > for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries. > Thank you for your time! > Marcos > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yongzhong.li at mail.utoronto.ca Mon Apr 29 14:45:49 2024 From: yongzhong.li at mail.utoronto.ca (Yongzhong Li) Date: Mon, 29 Apr 2024 19:45:49 +0000 Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver In-Reply-To: <3636D560-44B1-4C7C-8FA6-8308D6450099@petsc.dev> References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev> <3636D560-44B1-4C7C-8FA6-8308D6450099@petsc.dev> Message-ID: Hi Barry, Thanks for your reply. I checked out the git branch barry/2023-09-15/fix-log-pcmpi but got some errors when configuring PETSc; below is the error message: ============================================================================================= Configuring PETSc to compile on your system ============================================================================================= ============================================================================================= ***** WARNING ***** Found environment variable: MAKEFLAGS=s -j14 --jobserver-auth=3,5. Ignoring it! 
Use "./configure MAKEFLAGS=$MAKEFLAGS" if you really want to use this value ============================================================================================= ============================================================================================= Running configure on SOWING; this may take several minutes ============================================================================================= ********************************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): --------------------------------------------------------------------------------------------- Error running configure on SOWING ********************************************************************************************* My configuration is ./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp I didn't have this issue when I configured PETSc using the tarball release download version. Any suggestions on this? Thanks and regards, Yongzhong From: Barry Smith Date: Saturday, April 27, 2024 at 12:54 PM To: Yongzhong Li Cc: petsc-users at mcs.anl.gov , petsc-maint at mcs.anl.gov Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver You should use the git branch barry/2023-09-15/fix-log-pcmpi. It is still work-in-progress but much better than what is currently in the main PETSc branch. By default, the MPI linear solver server requires 10,000 unknowns per MPI process, so for smaller problems, it will only run on one MPI rank and list Sequential in your output. 
In general you need on the order of at least 10,000 unknowns per MPI process to get good speedup. You can control it with -mpi_linear_solver_server_minimum_count_per_rank. Regarding the report of 1 iteration, that is fixed in the branch listed above. Barry On Apr 26, 2024, at 5:11 PM, Yongzhong Li wrote: Hi Barry, Thanks, I am interested in this PCMPI solution provided by PETSc! I tried src/ksp/ksp/tutorials/ex1.c with PETSc configured as follows: ./configure PETSC_ARCH=config-debug --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 --with-logging=0 --with-cxx=g++ --download-mpich --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} In the Linux terminal, my command is as follows: mpiexec -n 4 ./ex1 -mpi_linear_solver_server -mpi_linear_solver_server_view However, I found the output a bit strange:

Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
    Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
  Sequential         3    2   10         1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
    Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
  Sequential         3    2   10         1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
    Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
  Sequential         3    2   10         1
Norm of error 1.23629e-15, Iterations 1
MPI linear solver server statistics:
    Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its
  Sequential         3    2   10         1

It seems that MPI started four processes, but they all did the same things, and I am confused why the Ranks column shows Sequential. Is this the desired output when mpi_linear_solver_server is turned on? And if I run ex1 without any hyphen options, I get Norm of error 2.47258e-15, Iterations 5 It looks like the KSPSolve uses 5 iterations to reach convergence, but why, when mpi_linear_solver_server is enabled, does it use 1? I hope to get some help on these issues, thank you! 
Sincerely, Yongzhong From: Barry Smith > Date: Tuesday, April 23, 2024 at 5:15 PM To: Yongzhong Li > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver Yes, only the routines that can explicitly use BLAS have multi-threading. PETSc does support using any MPI linear solvers from a sequential (or OpenMP) main program using the https://urldefense.us/v3/__https://petsc.org/release/manualpages/PC/PCMPI/*pcmpi__;Iw!!G_uCfscf7eWS!eirB1uKnj4uFPCED9aOb4X-qap5UjyksRvLvCOc0fmBBwdAgkeUKqDhF8C3Eq10bLaeGN5DDRvUKFmIh7NSFmVqNOdnLr9b-E48$ construct. I am finishing up better support in the branch barry/2023-09-15/fix-log-pcmpi Barry On Apr 23, 2024, at 3:59 PM, Yongzhong Li > wrote: Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don't utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is that correct? Best, Yongzhong From: Barry Smith > Date: Tuesday, April 23, 2024 at 3:35 PM To: Yongzhong Li > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations. On Apr 23, 2024, at 3:23 PM, Yongzhong Li > wrote: Hi Barry, Thank you for the information provided! Do you think a different BLAS implementation will affect the multithreading performance of some vector operations in GMRES in PETSc? I am now using OpenBLAS but didn't see much improvement when multithreading is enabled; do you think other implementations such as Netlib or Intel MKL will help? 
Best, Yongzhong From: Barry Smith > Date: Monday, April 22, 2024 at 4:20 PM To: Yongzhong Li > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver PETSc-provided solvers do not directly use threads. The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver. Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU. If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environment variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not affect. On Apr 22, 2024, at 4:06 PM, Yongzhong Li > wrote: This Message Is From an External Sender This message came from outside your organization. Hello all, I am writing to ask if PETSc's KSPSolve makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm. The questions appeared when I was running a large numerical program based on the boundary element method. I used PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. 
However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread. Could you please confirm whether GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreading of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMRES solutions? For reference, I am using PETSc version 3.16.0, configured as follows: ./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp To simplify the diagnosis of potential issues, I have also written a small example program that uses GMRES to solve a sparse matrix system derived from a 2D Poisson problem discretized with the finite difference method. I found similar issues with this piece of code. 
The code is as follows:

#include <petscksp.h>

/* Monitor function to print iteration number and residual norm */
PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) {
  PetscErrorCode ierr;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm);
  CHKERRQ(ierr);
  return 0;
}

int main(int argc, char **args) {
  Vec x, b, x_true, e;
  Mat A;
  KSP ksp;
  PetscErrorCode ierr;
  PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n
  PetscInt Istart, Iend;
  PetscScalar v;
  PetscMPIInt rank;
  PetscInitialize(&argc, &args, NULL, NULL);
  PetscLogDouble t1, t2; // Variables for timing
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  // Create vectors and matrix
  ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &b); CHKERRQ(ierr);
  ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr);

  // Set true solution as all ones
  ierr = VecSet(x_true, 1.0); CHKERRQ(ierr);

  // Create and assemble matrix A for the 2D Laplacian using 5-point stencil
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (Ii = Istart; Ii < Iend; Ii++) {
    i = Ii / n; // Row index
    j = Ii % n; // Column index
    v = -4.0;
    ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr);
    if (i > 0) { // South
      J = Ii - n; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (i < n - 1) { // North
      J = Ii + n; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j > 0) { // West
      J = Ii - 1; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
    if (j < n - 1) { // East
      J = Ii + 1; v = 1.0;
      ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  // Compute the RHS corresponding to the true solution
  ierr = MatMult(A, x_true, b); CHKERRQ(ierr);

  // Set up and solve the linear system
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
  ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr);

  /* Set up the monitor */
  ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr);

  // Start timing
  PetscTime(&t1);
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);
  // Stop timing
  PetscTime(&t2);

  // Compute error
  ierr = VecDuplicate(x, &e); CHKERRQ(ierr);
  ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr);
  PetscReal norm_error, norm_true;
  ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr);
  ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr);
  PetscReal relative_error = norm_error / norm_true;
  if (rank == 0) { // Print only from the first MPI process
    PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error);
  }

  // Output the wall time taken for KSPSolve
  PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1);

  // Cleanup
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = VecDestroy(&x_true); CHKERRQ(ierr);
  ierr = VecDestroy(&e); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  PetscFinalize();
  return 0;
}

Here are some profiling results for the GMRES solution:

OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1
OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3
OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7
OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8
OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8

I am using one workstation with an Intel® Core™ 
i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec; the default number of MPI processes should be 1, correct me if I am wrong. Thank you in advance! Sincerely, Yongzhong ----------------------------------------------------------- Yongzhong Li PhD student | Electromagnetics Group Department of Electrical & Computer Engineering University of Toronto https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!eirB1uKnj4uFPCED9aOb4X-qap5UjyksRvLvCOc0fmBBwdAgkeUKqDhF8C3Eq10bLaeGN5DDRvUKFmIh7NSFmVqNOdnLTC2sULA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Apr 29 14:55:12 2024 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 29 Apr 2024 15:55:12 -0400 Subject: [petsc-users] [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver In-Reply-To: References: <52AD060A-EB04-463C-BFDA-C82FAC681891@petsc.dev> <7834FEBC-06F3-4D49-8F32-00A4D699BF11@petsc.dev> <3636D560-44B1-4C7C-8FA6-8308D6450099@petsc.dev> Message-ID: Do you need Fortran? If not, just run again with --with-fc=0 --with-sowing=0 added. If you need Fortran, send configure.log > On Apr 29, 2024, at 3:45 PM, Yongzhong Li wrote: > > Hi Barry, > > Thanks for your reply. I checked out the git branch barry/2023-09-15/fix-log-pcmpi but got some errors when configuring PETSc; below is the error message, > > ============================================================================================= > Configuring PETSc to compile on your system > ============================================================================================= > ============================================================================================= > ***** WARNING ***** > Found environment variable: MAKEFLAGS=s -j14 --jobserver-auth=3,5. Ignoring it! 
Use > "./configure MAKEFLAGS=$MAKEFLAGS" if you really want to use this value > ============================================================================================= > ============================================================================================= > Running configure on SOWING; this may take several minutes > ============================================================================================= > > ********************************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > --------------------------------------------------------------------------------------------- > Error running configure on SOWING > ********************************************************************************************* > My configuration is > > ./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp > > I didn?t have this issue when I configured PETSc using tarball release download version. Any suggestions on this? > > Thanks and regards, > Yongzhong > > From: Barry Smith > > Date: Saturday, April 27, 2024 at 12:54?PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > > You should use the git branch barry/2023-09-15/fix-log-pcmpi It is still work-in-progress but much better than what is currently in the main PETSc branch. > > By default, the MPI linear solver server requires 10,000 unknowns per MPI process, so for smaller problems, it will only run on one MPI rank and list Sequential in your output. 
In general you need on the order of at least 10,000 unknowns per MPI process to get good speedup. You can control it with > > -mpi_linear_solver_server_minimum_count_per_rank > > Regarding the report of 1 iteration, that is fixed in the branch listed above. > > Barry > > On Apr 26, 2024, at 5:11?PM, Yongzhong Li > wrote: > > Hi Barry, > > Thanks, I am interested in this PCMPI solution provided by PETSc! > > I tried the src/ksp/ksp/tutorials/ex1.c which is configured in CMakelists as follows: > > ./configure PETSC_ARCH=config-debug --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 --with-logging=0 --with-cxx=g++ --download-mpich --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} > > In the linux terminal, my bash script is as follows, > > mpiexec -n 4 ./ex1 -mpi_linear_solver_server -mpi_linear_solver_server_view > > However, I found the ouput a bit strange > > Norm of error 1.23629e-15, Iterations 1 > MPI linear solver server statistics: > Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its > Sequential 3 2 10 1 > Norm of error 1.23629e-15, Iterations 1 > MPI linear solver server statistics: > Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its > Sequential 3 2 10 1 > Norm of error 1.23629e-15, Iterations 1 > MPI linear solver server statistics: > Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its > Sequential 3 2 10 1 > Norm of error 1.23629e-15, Iterations 1 > MPI linear solver server statistics: > Ranks KSPSolve()s Mats KSPs Avg. Size Avg. Its > Sequential 3 2 10 1 > > It seems that mpi started four processes, but they all did the same things, and I am confused why the ranks showed sequential. Are these supposed to be the desired output when the mpi_linear_solver_server is turned on? 
> > And if I run ex1 without any hypen options, I got > > Norm of error 2.47258e-15, Iterations 5 > > It looks like the KSPSolver use 5 iterations to reach convergence, but why when mpi_linear_solver_server is enabled, it uses 1? > > I hope to get some help on these issues, thank you! > > Sincerely, > Yongzhong > > > > > From: Barry Smith > > Date: Tuesday, April 23, 2024 at 5:15?PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > > Yes, only the routines that can explicitly use BLAS have multi-threading. > > PETSc does support using nay MPI linear solvers from a sequential (or OpenMP) main program using the https://urldefense.us/v3/__https://petsc.org/release/manualpages/PC/PCMPI/*pcmpi__;Iw!!G_uCfscf7eWS!fEO7IZL3jQV-DOjKPiTVpQNQTTK1IngytmxtSmu5Zqk9hiDabFw2LqAQ93_tN8hG1wEk886ENFA3bM-EGAKIuSI$ construct. I am finishing up better support in the branch barry/2023-09-15/fix-log-pcmpi > > Barry > > > > > > > > On Apr 23, 2024, at 3:59?PM, Yongzhong Li > wrote: > > Thanks Barry! Does this mean that the sparse matrix-vector products, which actually constitute the majority of the computations in my GMRES routine in PETSc, don?t utilize multithreading? Only basic vector operations such as VecAXPY and VecDot or dense matrix operations in PETSc will benefit from multithreading, is it correct? > > Best, > Yongzhong > > From: Barry Smith > > Date: Tuesday, April 23, 2024 at 3:35?PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > ????????? bsmith at petsc.dev ????????????????? > > Intel MKL or OpenBLAS are the best bet, but for vector operations they will not be significant since the vector operations do not dominate the computations. 
> > > On Apr 23, 2024, at 3:23?PM, Yongzhong Li > wrote: > > Hi Barry, > > Thank you for the information provided! > > Do you think different BLAS implementation will affect the multithreading performance of some vectors operations in GMERS in PETSc? > > I am now using OpenBLAS but didn?t see much improvement when theb multithreading are enabled, do you think other implementation such as netlib and intel-mkl will help? > > Best, > Yongzhong > > From: Barry Smith > > Date: Monday, April 22, 2024 at 4:20?PM > To: Yongzhong Li > > Cc: petsc-users at mcs.anl.gov >, petsc-maint at mcs.anl.gov >, Piero Triverio > > Subject: Re: [petsc-maint] Inquiry about Multithreading Capabilities in PETSc's KSPSolver > > ????????? bsmith at petsc.dev ????????????????? > > PETSc provided solvers do not directly use threads. > > The BLAS used by LAPACK and PETSc may use threads depending on what BLAS is being used and how it was configured. > > Some of the vector operations in GMRES in PETSc use BLAS that can use threads, including axpy, dot, etc. For sufficiently large problems, the use of threaded BLAS can help with these routines, but not significantly for the solver. > > Dense matrix-vector products MatMult() and dense matrix direct solvers PCLU use BLAS and thus can benefit from threading. The benefit can be significant for large enough problems with good hardware, especially with PCLU. > > If you run with -blas_view PETSc tries to indicate information about the threading of BLAS. You can also use -blas_num_threads to set the number of threads, equivalent to setting the environmental variable. For dense solvers you can vary the number of threads and run with -log_view to see what it helps to improve and what it does not effect. > > > > > On Apr 22, 2024, at 4:06?PM, Yongzhong Li > wrote: > > This Message Is From an External Sender > This message came from outside your organization. 
> Hello all, > > I am writing to ask if PETSc?s KSPSolver makes use of OpenMP/multithreading, specifically when performing iterative solutions with the GMRES algorithm. > > The questions appeared when I was running a large numerical program based on boundary element method. I used the PETSc's GMRES algorithm in KSPSolve to solve a shell matrix system iteratively. I observed that threads were being utilized, controlled by the OPENBLAS_NUM_THREADS environment variable. However, I noticed no significant performance difference between running the solver with multiple threads versus a single thread. > > Could you please confirm if GMRES in KSPSolve leverages multithreading, and also whether it is influenced by the multithreadings of the low-level math libraries such as BLAS and LAPACK? If so, how can I enable multithreading effectively to see noticeable improvements in solution times when using GMRES? If not, why do I observe that threads are being used during the GMERS solutions? > > For reference, I am using PETSc version 3.16.0, configured in CMakelists as follows: > > ./configure PETSC_ARCH=config-release --with-scalar-type=complex --with-fortran-kernels=1 --with-debugging=0 COPTFLAGS=-O3 -march=native CXXOPTFLAGS=-O3 -march=native FOPTFLAGS=-O3 -march=native --with-cxx=g++ --download-openmpi --download-superlu --download-opencascade --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} --with-threadsafety --with-log=0 --with-openmp > > To simplify the diagnosis of potential issues, I have also written a small example program using GMRES to solve a sparse matrix system derived from a 2D Poisson problem using the finite difference method. I found similar issues on this piece of codes. 
The code is as follows: > > #include > > /* Monitor function to print iteration number and residual norm */ > PetscErrorCode MyKSPMonitor(KSP ksp, PetscInt n, PetscReal rnorm, void *ctx) { > PetscErrorCode ierr; > ierr = PetscPrintf(PETSC_COMM_WORLD, "Iteration %D, Residual norm %g\n", n, (double)rnorm); > CHKERRQ(ierr); > return 0; > } > > int main(int argc, char **args) { > Vec x, b, x_true, e; > Mat A; > KSP ksp; > PetscErrorCode ierr; > PetscInt i, j, Ii, J, n = 500; // Size of the grid n x n > PetscInt Istart, Iend, ncols; > PetscScalar v; > PetscMPIInt rank; > PetscInitialize(&argc, &args, NULL, NULL); > PetscLogDouble t1, t2; // Variables for timing > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > > // Create vectors and matrix > ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n*n, &x); CHKERRQ(ierr); > ierr = VecDuplicate(x, &b); CHKERRQ(ierr); > ierr = VecDuplicate(x, &x_true); CHKERRQ(ierr); > > // Set true solution as all ones > ierr = VecSet(x_true, 1.0); CHKERRQ(ierr); > > // Create and assemble matrix A for the 2D Laplacian using 5-point stencil > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n*n, n*n); CHKERRQ(ierr); > ierr = MatSetFromOptions(A); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr); > for (Ii = Istart; Ii < Iend; Ii++) { > i = Ii / n; // Row index > j = Ii % n; // Column index > v = -4.0; > ierr = MatSetValues(A, 1, &Ii, 1, &Ii, &v, INSERT_VALUES); CHKERRQ(ierr); > if (i > 0) { // South > J = Ii - n; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (i < n - 1) { // North > J = Ii + n; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (j > 0) { // West > J = Ii - 1; > v = 1.0; > ierr = MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > if (j < n - 1) { // East > J = Ii + 1; > v = 1.0; > ierr = 
MatSetValues(A, 1, &Ii, 1, &J, &v, INSERT_VALUES); CHKERRQ(ierr); > } > } > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > // Compute the RHS corresponding to the true solution > ierr = MatMult(A, x_true, b); CHKERRQ(ierr); > > // Set up and solve the linear system > ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr); > ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr); > ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr); > ierr = KSPSetTolerances(ksp, 1e-5, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT); CHKERRQ(ierr); > > /* Set up the monitor */ > ierr = KSPMonitorSet(ksp, MyKSPMonitor, NULL, NULL); CHKERRQ(ierr); > > // Start timing > PetscTime(&t1); > > ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr); > > // Stop timing > PetscTime(&t2); > > // Compute error > ierr = VecDuplicate(x, &e); CHKERRQ(ierr); > ierr = VecWAXPY(e, -1.0, x_true, x); CHKERRQ(ierr); > PetscReal norm_error, norm_true; > ierr = VecNorm(e, NORM_2, &norm_error); CHKERRQ(ierr); > ierr = VecNorm(x_true, NORM_2, &norm_true); CHKERRQ(ierr); > PetscReal relative_error = norm_error / norm_true; > if (rank == 0) { // Print only from the first MPI process > PetscPrintf(PETSC_COMM_WORLD, "Relative error ||x - x_true||_2 / ||x_true||_2: %g\n", (double)relative_error); > } > > // Output the wall time taken for MatMult > PetscPrintf(PETSC_COMM_WORLD, "Time taken for KSPSolve: %f seconds\n", t2 - t1); > > // Cleanup > ierr = VecDestroy(&x); CHKERRQ(ierr); > ierr = VecDestroy(&b); CHKERRQ(ierr); > ierr = VecDestroy(&x_true); CHKERRQ(ierr); > ierr = VecDestroy(&e); CHKERRQ(ierr); > ierr = MatDestroy(&A); CHKERRQ(ierr); > ierr = KSPDestroy(&ksp); CHKERRQ(ierr); > PetscFinalize(); > return 0; > } > > Here are some profiling results for GMERS solution. 
> > OPENBLAS_NUM_THREADS = 1, iteration steps = 859, solution time = 16.1 > OPENBLAS_NUM_THREADS = 2, iteration steps = 859, solution time = 16.3 > OPENBLAS_NUM_THREADS = 4, iteration steps = 859, solution time = 16.7 > OPENBLAS_NUM_THREADS = 8, iteration steps = 859, solution time = 16.8 > OPENBLAS_NUM_THREADS = 16, iteration steps = 859, solution time = 17.8 > > I am using one workstation with Intel? Core? i9-11900K Processor, 8 cores, 16 threads. Note that I am not using multiple MPI processes, such as mpirun/mpiexec, the default number of MPI processes should be 1, correct if I am wrong. > > Thank you in advance! > > Sincerely, > Yongzhong > > ----------------------------------------------------------- > Yongzhong Li > PhD student | Electromagnetics Group > Department of Electrical & Computer Engineering > University of Toronto > https://urldefense.us/v3/__http://www.modelics.org__;!!G_uCfscf7eWS!fEO7IZL3jQV-DOjKPiTVpQNQTTK1IngytmxtSmu5Zqk9hiDabFw2LqAQ93_tN8hG1wEk886ENFA3bM-EQf0srRU$ -------------- next part -------------- An HTML attachment was scrubbed... URL:
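For reference, a sketch of the failing configure line from this thread with Barry's workaround applied, assuming Fortran is not needed (the Fortran kernel and FOPTFLAGS options are dropped along with the compiler, and the multi-word optimization flags are quoted here — an assumption about the intended shell usage):

```shell
# Sketch: Yongzhong's configure command with Fortran and SOWING disabled,
# per Barry's suggestion, to bypass "Error running configure on SOWING".
./configure PETSC_ARCH=config-release --with-scalar-type=complex \
    --with-debugging=0 COPTFLAGS="-O3 -march=native" \
    CXXOPTFLAGS="-O3 -march=native" --with-cxx=g++ \
    --download-openmpi --download-superlu --download-opencascade \
    --with-openblas-include=${OPENBLAS_INC} --with-openblas-lib=${OPENBLAS_LIB} \
    --with-threadsafety --with-log=0 --with-openmp \
    --with-fc=0 --with-sowing=0
```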