From daniel.stone at opengosim.com Tue Aug 1 04:12:06 2023
From: daniel.stone at opengosim.com (Daniel Stone)
Date: Tue, 1 Aug 2023 10:12:06 +0100
Subject: [petsc-users] Confusion/failures about the tests involved in including Hypre
In-Reply-To: 
References: <5CF63688-6351-4EA4-9737-7A44A63C061C@petsc.dev>
 <48e0971e-6211-7d00-dd2e-a965b21a4e22@mcs.anl.gov>
 <7F01C428-49CE-4E1E-AB72-C2725E585CEB@petsc.dev>
 <48453d37-48df-6c40-c55f-28bd06fc7486@mcs.anl.gov>
 <4725AED4-FFFC-4DCD-85CD-6A6E96380523@petsc.dev>
 <3D80FBFC-58A1-43E5-AD3D-9D18790796BB@petsc.dev>
 <33006ce7-4851-6d86-6a7a-c3efe79415a3@mcs.anl.gov>
 <50D7E299-728F-47A4-B098-C58E3FA6C679@petsc.dev>
Message-ID: 

This still isn't working, and it looks like the md/mt distinction might be the culprit.

I use the Ninja generator with cmake, on the intel oneapi command line (i.e. windows command line but with the pathing and other things sorted out for the new intel compilers). Looking at the build.ninja file produced, I see rules like:

-------------------

build CMakeFiles\HYPRE.dir\blas\dasum.c.obj: C_COMPILER__HYPRE_Debug ..\blas\dasum.c || cmake_object_order_depends_target_HYPRE
  DEFINES = -D_CRT_SECURE_NO_WARNINGS
  DEP_FILE = CMakeFiles\HYPRE.dir\blas\dasum.c.obj.d
  FLAGS = /DWIN32 /D_WINDOWS /Zi /Ob0 /Od /RTC1 -MDd
  INCLUDES = [etc....]

------------------

Note the -MDd flag (which is an -MD flag if I ask for a non-debug build). It's odd that it isn't an /MD flag, but I've done some quick checks on the oneapi command line and it looks like -MD is still interpreted as valid input by icx (e.g., >> icx /MD and >> icx -MD both only complain about lack of input files, while >> icx /MDblahblahnotarealflag and >> icx -MDblahblahnotarealflag both give "unknown argument" errors).

Thus it really seems like my Hypre library has been built with the MD option, although MT is the default (
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/md.html
and
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/mt.html),
as suggested by Satish. There seems to be no hint of the MD flag in any of the cmake files provided with hypre, so it looks like the -MD flag is being added by the cmake Ninja generator as a kind of default (and maybe in a confused way - why the - instead of /?).

So the task now seems to be to convince cmake to stop doing this (one possible workaround is sketched below).

On Mon, Jul 24, 2023 at 11:21 AM Daniel Stone wrote:

> On the hypre versioning - aha. For this project I locked the petsc version
> a little while ago (3.19.1), but I've been using a fresh clone of hypre, so
> clearly
> it's too modern a version. Using the appropriate version of hypre (2.28.0,
> according to hypre.py) might fix some things.
> I may have other problems in the form of the conflicting compiler options
> as Satish mentioned, which I'll have to figure out too.
>
> Thanks,
>
> Daniel
>
> On Fri, Jul 21, 2023 at 9:32 PM Barry Smith wrote:
>
>>
>>
>> hypre_Error was changed from an integer to a struct in hypre. PETSc code
>> was changed in main to work with that struct and not work if hypre_Error is
>> an int. This means main cannot work with previous hypre versions (where
>> hypre_Error is an int). Sure, instead of changing the minimum version one
>> could potentially change PETSc's main version source code to work with both
>> old and new hypre, but that would need to be done and has not been done.
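A possible CMake-side workaround for the -MDd/-MD default discussed at the top of this message, untested against hypre's CMakeLists: since CMake 3.15 the MSVC runtime selection is controlled by the CMAKE_MSVC_RUNTIME_LIBRARY variable under policy CMP0091, so forcing that policy to NEW and setting the variable should make the Ninja generator emit /MT (or /MTd in Debug) instead of the DLL runtime. The hypre source path below is a placeholder.

-------------------
rem untested sketch, run from an empty build dir on the oneapi command prompt;
rem "path\to\hypre\src" is a placeholder for the actual hypre source location
cmake -G Ninja ^
  -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx ^
  -DCMAKE_POLICY_DEFAULT_CMP0091=NEW ^
  -DCMAKE_MSVC_RUNTIME_LIBRARY="MultiThreaded$<$<CONFIG:Debug>:Debug>" ^
  path\to\hypre\src
-------------------

Whichever runtime ends up being used, it generally needs to match what PETSc itself and the MPI library are built with, so /MD vs /MT should be kept consistent across all three.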
>> main is simply broken, it allows configure to succeed with an older version >> of hypre but PETSc main cannot compile with this, thus confusing developers >> and users. This does need to be fixed, one way or another. >> >> >> >> > On Jul 21, 2023, at 1:32 PM, Satish Balay wrote: >> > >> > We don't have a good way to verify if the changes continue to work >> > with the minver of the pkg [and if minor fixes can get this working - >> > without increasing this requirement - sure without access to the new >> > features provided by the new version] >> > >> > Having a single version dependency between pkgs makes the whole >> > ecosystem [where many pkgs with all their dependencies mixed in] >> > brittle.. >> > >> > Satish >> > >> > On Thu, 20 Jul 2023, Barry Smith wrote: >> > >> >> >> >> Satish, >> >> >> >> I think hypre.py in main needs minversion = 2.29 instead of 2.14 >> >> >> >> >> >> >> >>> On Jul 20, 2023, at 11:05 AM, Satish Balay wrote: >> >>> >> >>> Can check config/BuildSystem/config/packages/hypre.py >> >>> >> >>> petsc-3.19 (or release branch) is compatible with hypre 2.28.0, petsc >> 'main' branch with 2.29.0 >> >>> >> >>> Satish >> >>> >> >>> On Thu, 20 Jul 2023, Barry Smith via petsc-users wrote: >> >>> >> >>>> >> >>>> You cannot use this version of PETSc, 3.19, with the version of >> hypre you installed. In hypre they recently changed hypre_Error from an >> integer to a struct which completely breaks compatibility with previous >> versions of hypre (and hence previous versions of PETSc). You must use the >> main git branch of PETSc with the version of hypre you installed. >> >>>> >> >>>> Barry >> >>>> >> >>>> >> >>>>> On Jul 20, 2023, at 5:10 AM, Daniel Stone < >> daniel.stone at opengosim.com> wrote: >> >>>>> >> >>>>> Hi All, >> >>>>> >> >>>>> Many thanks for the detailed explainations and ideas! >> >>>>> >> >>>>> I tried skipping the test. When it came time to do the build itself >> (make $PETSC_DIR... all) I get some failures, unsurprisingly: >> >>>>> >> >>>>> -------------------------------- >> >>>>> >> >>>>> FC arch-mswin-c-opt/obj/dm/f90-mod/petscdmplexmod.o >> >>>>> CC >> arch-mswin-c-opt/obj/ksp/pc/impls/hypre/ftn-custom/zhypref.o >> >>>>> CC arch-mswin-c-opt/obj/ksp/pc/impls/hypre/ftn-auto/hypref.o >> >>>>> CC arch-mswin-c-opt/obj/ksp/pc/impls/hypre/hypre.o >> >>>>> >> C:\cygwin64\home\DANIEL~1\PETSC_~1.1\src\ksp\pc\impls\hypre\hypre.c(444,29): >> error: assigning to 'hypre_Error' from incompatible type 'int' >> >>>>> hypre__global_error = 0; >> >>>>> ^ ~ >> >>>>> C:\cygwin64\home\DANIEL~1\PETSC_~1.1\include\petscerror.h(1752,7): >> note: expanded from macro 'PetscStackCallExternalVoid' >> >>>>> __VA_ARGS__; \ >> >>>>> ^~~~~~~~~~~ >> >>>>> >> C:\cygwin64\home\DANIEL~1\PETSC_~1.1\src\ksp\pc\impls\hypre\hypre.c(634,29): >> error: assigning to 'hypre_Error' from incompatible type 'int' >> >>>>> hypre__global_error = 0; >> >>>>> ^ ~ >> >>>>> C:\cygwin64\home\DANIEL~1\PETSC_~1.1\include\petscerror.h(1752,7): >> note: expanded from macro 'PetscStackCallExternalVoid' >> >>>>> __VA_ARGS__; \ >> >>>>> ^~~~~~~~~~~ >> >>>>> 2 errors generated. >> >>>>> make[3]: *** [gmakefile:195: >> arch-mswin-c-opt/obj/ksp/pc/impls/hypre/hypre.o] Error 1 >> >>>>> make[3]: *** Waiting for unfinished jobs.... 
>> >>>>> FC arch-mswin-c-opt/obj/ksp/f90-mod/petsckspdefmod.o >> >>>>> CC arch-mswin-c-opt/obj/dm/impls/da/hypre/mhyp.o >> >>>>> CC arch-mswin-c-opt/obj/mat/impls/hypre/mhypre.o >> >>>>> make[3]: Leaving directory '/home/DanielOGS/petsc_ogs_3.19.1' >> >>>>> make[2]: *** >> [/home/DanielOGS/petsc_ogs_3.19.1/lib/petsc/conf/rules.doc:28: libs] Error 2 >> >>>>> make[2]: Leaving directory '/home/DanielOGS/petsc_ogs_3.19.1' >> >>>>> **************************ERROR************************************* >> >>>>> Error during compile, check arch-mswin-c-opt/lib/petsc/conf/make.log >> >>>>> Send it and arch-mswin-c-opt/lib/petsc/conf/configure.log to >> petsc-maint at mcs.anl.gov >> >>>>> ******************************************************************** >> >>>>> Finishing make run at Wed, 19 Jul 2023 17:07:00 +0100 >> >>>>> >> >>>>> ---------------------------------------- >> >>>>> >> >>>>> But wait - isn't this the compile stage, not the linking stage? >> This seems to imply that I've made a hash of providing include file such >> that a definition of "hypre_Error" >> >>>>> cannot be seen - unless I'm misinterpreting. Interesting note about >> Hypre and include files - if built using configure and make, all the >> include files are conviniently copied >> >>>>> into hypre/src/hypre/include. This is not done for a cmake build - >> I had to do the copying myself. Maybe I missed one. >> >>>>> >> >>>>> >> >>>>> On shared vs. static - if there a clear way of telling which I've >> ended up with? I've checked the cmakelists for hypre and this seems to >> imply that not-shared is the default, >> >>>>> which I didn't change: >> >>>>> >> >>>>> # Configuration options >> >>>>> option(HYPRE_ENABLE_SHARED "Build a shared library" OFF) >> >>>>> option(HYPRE_ENABLE_BIGINT "Use long long int for >> HYPRE_Int" OFF) >> >>>>> option(HYPRE_ENABLE_MIXEDINT "Use long long int for >> HYPRE_BigInt, int for HYPRE_INT" OFF) >> >>>>> [....] >> >>>>> >> >>>>> >> >>>>> checking again, I've noticed that the way that the stub-test fails >> is different depending on whether it's called from the config script or >> used in isolation - more details on that soon. >> >>>>> >> >>>>> >> >>>>> >> >>>>> Thanks again, >> >>>>> >> >>>>> Daniel >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Wed, Jul 19, 2023 at 4:58?PM Satish Balay via petsc-users < >> petsc-users at mcs.anl.gov > wrote: >> >>>>>> I think it should work with static libraries and 64bit compilers. >> >>>>>> >> >>>>>> That's how I think --download-f2cblaslapack [etc] work. >> >>>>>> >> >>>>>> Also it works with MS-MPI - even-though its a dll install, the >> library stub provides this symbol somehow.. >> >>>>>> >> >>>>>> balay at ps5 /cygdrive/c/Program Files (x86)/Microsoft >> SDKs/MPI/Lib/x64 >> >>>>>> $ nm -Ao msmpi.lib |grep " MPI_Init" >> >>>>>> msmpi.lib:msmpi.dll:0000000000000000 T MPI_Init >> >>>>>> msmpi.lib:msmpi.dll:0000000000000000 T MPI_Init_thread >> >>>>>> msmpi.lib:msmpi.dll:0000000000000000 T MPI_Initialized >> >>>>>> >> >>>>>> However - if the library symbol is somehow mangled - this >> configure mode of checking library functions will fail. 
>> >>>>>> >> >>>>>> Checking PETSc dll build: >> >>>>>> >> >>>>>> balay at ps5 ~/petsc/arch-ci-mswin-uni/lib >> >>>>>> $ nm -Ao libpetsc.lib |grep MatCreateSeqAIJWithArrays >> >>>>>> libpetsc.lib:libpetsc.dll:0000000000000000 I >> __imp_MatCreateSeqAIJWithArrays >> >>>>>> libpetsc.lib:libpetsc.dll:0000000000000000 T >> MatCreateSeqAIJWithArrays >> >>>>>> >> >>>>>> It also has the unmangled symbol - so I guess this mode can work >> generally with dlls. >> >>>>>> >> >>>>>> Satish >> >>>>>> >> >>>>>> >> >>>>>> On Wed, 19 Jul 2023, Barry Smith wrote: >> >>>>>> >> >>>>>>> >> >>>>>>> Satish, >> >>>>>>> >> >>>>>>> So it will always fail on Windows with Windows compilers (both >> with static and shared libraries)? Is this true for all PETSc external >> packages? If so, why does the installation documentation say that some >> external packages can work with Windows compilers? (Presumably PETSc cannot >> since the configure tests will fail). >> >>>>>>> >> >>>>>>> Barry >> >>>>>>> >> >>>>>>> >> >>>>>>>> On Jul 19, 2023, at 11:40 AM, Satish Balay > > wrote: >> >>>>>>>> >> >>>>>>>> BTW: Some explanation of configure: >> >>>>>>>> >> >>>>>>>> It attempts the following on linux: >> >>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>> Source: >> >>>>>>>> #include "confdefs.h" >> >>>>>>>> #include "conffix.h" >> >>>>>>>> /* Override any gcc2 internal prototype to avoid an error. */ >> >>>>>>>> char HYPRE_IJMatrixCreate(); >> >>>>>>>> static void _check_HYPRE_IJMatrixCreate() { >> HYPRE_IJMatrixCreate(); } >> >>>>>>>> >> >>>>>>>> int main(void) { >> >>>>>>>> _check_HYPRE_IJMatrixCreate(); >> >>>>>>>> return 0; >> >>>>>>>> } >> >>>>>>>> <<<<<<< >> >>>>>>>> >> >>>>>>>> Note - it does not include 'HYPRE.h' here - but redefines the >> prototype as 'char HYPRE_IJMatrixCreate(); >> >>>>>>>> >> >>>>>>>> Compiling it manually: >> >>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>> [balay at pj01 petsc]$ cat conftest.c >> >>>>>>>> char HYPRE_IJMatrixCreate(); >> >>>>>>>> static void _check_HYPRE_IJMatrixCreate() { >> HYPRE_IJMatrixCreate(); } >> >>>>>>>> >> >>>>>>>> int main(void) { >> >>>>>>>> _check_HYPRE_IJMatrixCreate(); >> >>>>>>>> return 0; >> >>>>>>>> } >> >>>>>>>> [balay at pj01 petsc]$ gcc -c conftest.c >> >>>>>>>> [balay at pj01 petsc]$ nm -Ao conftest.o |grep HYPRE_IJMatrixCreate >> >>>>>>>> conftest.o:0000000000000000 t _check_HYPRE_IJMatrixCreate >> >>>>>>>> conftest.o: U HYPRE_IJMatrixCreate >> >>>>>>>> [balay at pj01 petsc]$ nm -Ao arch-linux-c-debug/lib/libHYPRE.so >> |grep HYPRE_IJMatrixCreate >> >>>>>>>> arch-linux-c-debug/lib/libHYPRE.so:000000000007f2c2 T >> HYPRE_IJMatrixCreate >> >>>>>>>> [balay at pj01 petsc]$ >> >>>>>>>> <<<< >> >>>>>>>> >> >>>>>>>> Here the "U HYPRE_IJMatrixCreate" in conftest.o matches "T >> HYPRE_IJMatrixCreate" in libHYPRE.so - so the "link" test in configure >> succeeds! >> >>>>>>>> >> >>>>>>>>>>>>>> >> >>>>>>>> [balay at pj01 petsc]$ gcc -o conftest conftest.o >> arch-linux-c-debug/lib/libHYPRE.so >> >>>>>>>> [balay at pj01 petsc]$ echo $? >> >>>>>>>> 0 >> >>>>>>>> <<<<< >> >>>>>>>> >> >>>>>>>> On windows - [due to name mangling by cdecl/stdcall, (/MT vs >> /MD) etc..] - this might not match - resulting in link failures. >> >>>>>>>> >> >>>>>>>> Satish >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Wed, 19 Jul 2023, Satish Balay via petsc-users wrote: >> >>>>>>>> >> >>>>>>>>> You could try skipping this test [and assume >> --with-hypre-include and --with-hypre-lib options are correct] - and see if >> this works. 
>> >>>>>>>>> >> >>>>>>>>> diff --git a/config/BuildSystem/config/packages/hypre.py >> b/config/BuildSystem/config/packages/hypre.py >> >>>>>>>>> index 5bc88322aa2..2d6c7932e17 100644 >> >>>>>>>>> --- a/config/BuildSystem/config/packages/hypre.py >> >>>>>>>>> +++ b/config/BuildSystem/config/packages/hypre.py >> >>>>>>>>> @@ -11,7 +11,7 @@ class Configure(config.package.GNUPackage): >> >>>>>>>>> self.requiresversion = 1 >> >>>>>>>>> self.gitcommit = 'v'+self.version >> >>>>>>>>> self.download = ['git:// >> https://github.com/hypre-space/hypre',' >> https://github.com/hypre-space/hypre/archive/'+self.gitcommit+'.tar.gz'] >> >>>>>>>>> - self.functions = ['HYPRE_IJMatrixCreate'] >> >>>>>>>>> + self.functions = [] >> >>>>>>>>> self.includes = ['HYPRE.h'] >> >>>>>>>>> self.liblist = [['libHYPRE.a']] >> >>>>>>>>> self.buildLanguages = ['C','Cxx'] >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Satish >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Wed, 19 Jul 2023, Barry Smith wrote: >> >>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> You don't indicate what type of libraries you built hypre >> with; static or shared. My guess is you ended up with shared >> >>>>>>>>>> >> >>>>>>>>>> I think the answer to your difficulty is hidden in __cdecl >> (Satish will know much better than me). When you are looking for symbols in >> Windows shared libraries you have to prepend something to the function >> prototype to have it successfully found. For example the PETSc include >> files have these things __declspec(dllimport) The configure test fails >> because it does not provide the needed prototype. Likely you built PTScotch >> with static libraries so no problem. >> >>>>>>>>>> >> >>>>>>>>>> The simplest fix would be to build static hypre libraries. I >> think it is a major project to get PETSc configure and macro system to work >> properly with external packages that are in Windows shared libraries since >> more use of __declspec would be needed. >> >>>>>>>>>> >> >>>>>>>>>> Barry >> >>>>>>>>>> >> >>>>>>>>>> The PETSc installation instructions should probably say >> something about external packages with Windows shared libraries. >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>> On Jul 19, 2023, at 10:52 AM, Daniel Stone < >> daniel.stone at opengosim.com > wrote: >> >>>>>>>>>>> >> >>>>>>>>>>> Hello, >> >>>>>>>>>>> >> >>>>>>>>>>> I'm working on getting a petsc build running on windows. One >> necessary package to include is Hypre. I've been able to build Hypre >> seperately using cmake, and confirmed that the library works >> >>>>>>>>>>> by setting up a VS project to run some of the example >> programs. >> >>>>>>>>>>> >> >>>>>>>>>>> My attempted petsc build is being done through cygwin. I've >> been able to (with varying degrees of difficulty), build a fairly plain >> petsc, and one that downloads and builds ptscotch (after some modifications >> >>>>>>>>>>> to both ptscotch and the config script). I am now attempting >> to include Hypre (using the --hypre-iclude and --hypre-lib flags, etc). >> Note that the same compilers are being used for both Hypre and for petsc >> >>>>>>>>>>> through cygwin - the new intel oneapi compilers (icx and ifx, >> after again varying amounts of pain to work around their awkwardness with >> the config script). >> >>>>>>>>>>> >> >>>>>>>>>>> I'm seeing a problem when the config script does some tests >> on the included hypre lib. 
The source code looks like: >> >>>>>>>>>>> >> >>>>>>>>>>> #include "confdefs.h" >> >>>>>>>>>>> #include "conffix.h" >> >>>>>>>>>>> /* Override any gcc2 internal prototype to avoid an error. */ >> >>>>>>>>>>> >> >>>>>>>>>>> #include "HYPRE.h" >> >>>>>>>>>>> >> >>>>>>>>>>> char HYPRE_IJMatrixCreate(); >> >>>>>>>>>>> static void _check_HYPRE_IJMatrixCreate() { >> HYPRE_IJMatrixCreate(); } >> >>>>>>>>>>> >> >>>>>>>>>>> int main() { >> >>>>>>>>>>> _check_HYPRE_IJMatrixCreate();; >> >>>>>>>>>>> return 0; >> >>>>>>>>>>> } >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> As I understand this is a fairly standard type of stub >> program used by the config script to check that it is able to link to >> certain symbols in given libraries. Tests like this have succeeded in my >> builds that >> >>>>>>>>>>> include PTScotch. >> >>>>>>>>>>> >> >>>>>>>>>>> I keep getting a linker error with the above test, including >> if I seperate it out and try to build it seperately: >> >>>>>>>>>>> >> >>>>>>>>>>> unresolved external symbol "char __cdel >> HYPRE_IJMatrixCreate(void)" .... >> >>>>>>>>>>> >> >>>>>>>>>>> Ok, it looks like a problem with either the library or linker >> commands. But here's the interesting thing - If I transplant this code into >> VS, with the same project setting that allows it to build the much more >> >>>>>>>>>>> nontrivial Hypre example programs, I get the same error: >> >>>>>>>>>>> >> >>>>>>>>>>> Error LNK2001 unresolved external symbol "char __cdecl >> HYPRE_IJMatrixCreate(void)" (?HYPRE_IJMatrixCreate@@YADXZ) hypretry1 >> C:\Users\DanielOGS\source\repos\hypretry1\hypretry1\Source.obj 1 >> >>>>>>>>>>> >> >>>>>>>>>>> So it seems like there might be something about this type of >> stub program that is not working with my Hypre library. I don't fully >> understand this program - it's able to call the function with no arguments, >> but >> >>>>>>>>>>> it also needs to be linked against a library containing the >> function, apparently by wrapping it in a static void function? Not >> something I've seen before. >> >>>>>>>>>>> >> >>>>>>>>>>> Does anyone have any insight into what might be going wrong - >> or really just any explaination of how the stub program works so I can >> figure out why it isn't in this case? >> >>>>>>>>>>> >> >>>>>>>>>>> Many thanks, >> >>>>>>>>>>> >> >>>>>>>>>>> Daniel >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>> >> >>>>>>>> >> >>>>>>> >> >>>>>> >> >>>> >> >>>> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Aug 1 09:23:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 1 Aug 2023 10:23:05 -0400 Subject: [petsc-users] PETSc :: FEM Help In-Reply-To: References: Message-ID: Sorry about this. I signed up for a conference without the work done, with predictable results. I have just returned home. There were just a few small problems. First, the labels were attached to dmSurface, but you wanted them on dm. They got destroyed with dmSurface before setting the BC. Second, the declarations of the point function were missing the constant arguments. Third, the PetscFEDestroy() was missing and extra DM creations were there. I have fixed these and am attaching the new source. It runs for me but I have not checked the answer. Thanks, Matt On Wed, Jun 7, 2023 at 11:05?AM Brandon Denton via petsc-users < petsc-users at mcs.anl.gov> wrote: > Good Morning, > > I'm trying to verify that the CAD -> PETSc/DMPlex methods I've developed > can be used for FEM analyses using PETSc. 
Attached is my current attempt > where I import a CAD STEP file to create a volumetric tetrahedral > discretization (DMPlex), designate boundary condition points using > DMLabels, and solve the Laplace problem (heat) with Dirichlet conditions on > each end. At command line I indicate the STEP file with the -filename > option and the dual space degree with -petscspace_degree 2. The run ends > with either a SEGV Fault or a General MPI Communication Error. > > Could you please look over the attached file to tell me if what I'm doing > to set up the FEM problem is wrong? > > Thank you in advance for your time and help. > -Brandon > > TYPICAL ERROR MESSAGE > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: General MPI error > [0]PETSC ERROR: MPI error 605109765 Invalid communicator, error stack: > PMPI_Comm_get_attr(344): MPI_Comm_get_attr(comm=0x0, > comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) > failed > MPII_Comm_get_attr(257): MPIR_Comm_get_attr(comm=0x0, > comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) > failed > MPII_Comm_get_attr(53).: Invalid communicator > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-dm_plex_refine_without_snap_to_geom > value: 0 source: command line > [0]PETSC ERROR: Option left: name:-dm_refine value: 1 source: command > line > [0]PETSC ERROR: Option left: name:-snes_monitor (no value) source: > command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.5-1817-gd2497b8de4c > GIT Date: 2023-05-22 18:44:03 +0000 > [0]PETSC ERROR: ./thermal on a named XPS. 
by bdenton Wed Jun 7 11:03:43 > 2023 > [0]PETSC ERROR: Configure options --with-make-np=16 > --prefix=/mnt/c/Users/Brandon/software/libs/petsc/3.19.1-gitlab/gcc/11.2.0/mpich/3.4.2/openblas/0.3.17/opt > --with-debugging=false --COPTFLAGS="-O3 -mavx" --CXXOPTFLAGS="-O3 -mavx" > --FOPTFLAGS=-O3 --with-shared-libraries=1 > --with-mpi-dir=/mnt/c/Users/Brandon/software/libs/mpich/3.4.2/gcc/11.2.0 > --with-mumps=true --download-mumps=1 --with-metis=true --download-metis=1 > --with-parmetis=true --download-parmetis=1 --with-superlu=true > --download-superlu=1 --with-superludir=true --download-superlu_dist=1 > --with-blacs=true --download-blacs=1 --with-scalapack=true > --download-scalapack=1 --with-hypre=true --download-hypre=1 > --with-hdf5-dir=/mnt/c/Users/Brandon/software/libs/hdf5/1.12.1/gcc/11.2.0 > --with-valgrind-dir=/mnt/c/Users/Brandon/software/apps/valgrind/3.14.0 > --with-blas-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" > --with-lapack-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" > --LDFLAGS= --with-tetgen=true --download-tetgen=1 --download-ctetgen=1 > --download-opencascade=1 --download-egads > [0]PETSC ERROR: #1 PetscObjectName() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pname.c:119 > [0]PETSC ERROR: #2 PetscObjectGetName() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pgname.c:27 > [0]PETSC ERROR: #3 PetscDSAddBoundary() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/dt/interface/dtds.c:3404 > [0]PETSC ERROR: #4 DMAddBoundary() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/interface/dm.c:7828 > [0]PETSC ERROR: #5 main() at > /mnt/c/Users/Brandon/Documents/School/Dissertation/Software/EGADS-dev/thermal_v319/thermal_nozzle.c:173 > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_plex_geom_print_model 1 (source: command line) > [0]PETSC ERROR: -dm_plex_geom_shape_opt 0 (source: command line) > [0]PETSC ERROR: -dm_plex_refine_without_snap_to_geom 0 (source: command > line) > [0]PETSC ERROR: -dm_refine 1 (source: command line) > [0]PETSC ERROR: -filename ./examples/Nozzle_example.stp (source: command > line) > [0]PETSC ERROR: -petscspace_degree 2 (source: command line) > [0]PETSC ERROR: -snes_monitor (source: command line) > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 98) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=98 > : > system msg for write_line failure : Bad file descriptor > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: thermal_nozzle_new.c Type: application/octet-stream Size: 9484 bytes Desc: not available URL: From maitri.ksh at gmail.com Tue Aug 1 13:30:11 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Tue, 1 Aug 2023 21:30:11 +0300 Subject: [petsc-users] compiler related error (configuring Petsc) Message-ID: I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' *configure.log'* is attached herewith) . 
Initially I got an error related to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and 'CXXFLAGS' and could surpass the non-compliant error related to c++ compiler but now it gives me another error 'cannot find a C preprocessor'. How to fix this? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 105772 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Aug 1 13:42:52 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 1 Aug 2023 13:42:52 -0500 (CDT) Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: Message-ID: > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) Is it possible for you to use a newer version GNU compilers? If not - your alternative is to build PETSc with --with-cxx=0 option But then - you can't use --download-superlu_dist or any pkgs that need c++ [you could try building them separately though] Satish On Tue, 1 Aug 2023, maitri ksh wrote: > I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' > *configure.log'* is attached herewith) . Initially I got an error related > to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and > 'CXXFLAGS' and could surpass the non-compliant error related to c++ compiler > but now it gives me another error 'cannot find a C preprocessor'. How to > fix this? > From jacob.fai at gmail.com Tue Aug 1 14:36:36 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Tue, 1 Aug 2023 15:36:36 -0400 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: Message-ID: <88B4B30A-1778-4681-97F2-59E167C390C4@gmail.com> >> Initially I got an error related >> to 'C++11' flag, Can you send the configure.log for this as well Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 1, 2023, at 14:42, Satish Balay via petsc-users wrote: > >> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) > > Is it possible for you to use a newer version GNU compilers? > > If not - your alternative is to build PETSc with --with-cxx=0 option > > But then - you can't use --download-superlu_dist or any pkgs that need > c++ [you could try building them separately though] > > Satish > > > On Tue, 1 Aug 2023, maitri ksh wrote: > >> I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' >> *configure.log'* is attached herewith) . Initially I got an error related >> to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and >> 'CXXFLAGS' and could surpass the non-compliant error related to c++ compiler >> but now it gives me another error 'cannot find a C preprocessor'. How to >> fix this? >> > From david at coreform.com Tue Aug 1 21:24:32 2023 From: david at coreform.com (David Kamensky) Date: Tue, 1 Aug 2023 19:24:32 -0700 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper Message-ID: Hi, My understanding is that the second-order generalized-alpha time stepper in PETSc uses a same-displacement predictor as the initial guess for the nonlinear solver that executes in each time step. I'd like to be able to set this to something else, to improve convergence. However, my (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` haven't worked out. Is there any way to set a custom predictor? Thanks, David Kamensky -------------- next part -------------- An HTML attachment was scrubbed... 
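An untested sketch of one possible route, in case it helps: TSGetSNES() exposes the nonlinear solver used inside each step, and a callback installed with SNESSetComputeInitialGuess() is invoked by SNESSolve() to (re)set the initial guess vector before the Newton iteration starts. Whether this interacts cleanly with TSALPHA2's built-in same-displacement predictor is not verified here; MyInitialGuess and PredictorCtx are made-up names and the extrapolation itself is a placeholder.

#include <petscts.h>

typedef struct {
  Vec U_prev; /* user-maintained state from the previous step (placeholder context) */
} PredictorCtx;

/* Called by SNESSolve() before the nonlinear iteration; X holds whatever
   initial guess the stepper set up and can be overwritten here. */
static PetscErrorCode MyInitialGuess(SNES snes, Vec X, void *ctx)
{
  PredictorCtx *p = (PredictorCtx *)ctx;

  PetscFunctionBeginUser;
  PetscCall(VecCopy(p->U_prev, X)); /* placeholder: substitute a real extrapolation */
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* registration, after TSSetType(ts, TSALPHA2) and before TSSolve():
     SNES snes;
     PetscCall(TSGetSNES(ts, &snes));
     PetscCall(SNESSetComputeInitialGuess(snes, MyInitialGuess, &ctx));
*/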
URL: From maitri.ksh at gmail.com Tue Aug 1 23:42:38 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Wed, 2 Aug 2023 07:42:38 +0300 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: <88B4B30A-1778-4681-97F2-59E167C390C4@gmail.com> References: <88B4B30A-1778-4681-97F2-59E167C390C4@gmail.com> Message-ID: sure, attached. On Tue, Aug 1, 2023 at 10:36?PM Jacob Faibussowitsch wrote: > >> Initially I got an error related > >> to 'C++11' flag, > > Can you send the configure.log for this as well > > Best regards, > > Jacob Faibussowitsch > (Jacob Fai - booss - oh - vitch) > > > On Aug 1, 2023, at 14:42, Satish Balay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > >> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) > > > > Is it possible for you to use a newer version GNU compilers? > > > > If not - your alternative is to build PETSc with --with-cxx=0 option > > > > But then - you can't use --download-superlu_dist or any pkgs that need > > c++ [you could try building them separately though] > > > > Satish > > > > > > On Tue, 1 Aug 2023, maitri ksh wrote: > > > >> I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' > >> *configure.log'* is attached herewith) . Initially I got an error > related > >> to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and > >> 'CXXFLAGS' and could surpass the non-compliant error related to c++ > compiler > >> but now it gives me another error 'cannot find a C preprocessor'. How to > >> fix this? > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 137284 bytes Desc: not available URL: From pierre.jolivet at lip6.fr Wed Aug 2 00:27:14 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Wed, 2 Aug 2023 07:27:14 +0200 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: <88B4B30A-1778-4681-97F2-59E167C390C4@gmail.com> Message-ID: Right, so configure did the proper job and told you that your compiler does not (fully) work with C++11, there is no point in trying to add extra flags to bypass this limitation. As Satish suggested: either use a newer g++ or configure --with-cxx=0 Thanks, Pierre > On 2 Aug 2023, at 6:42 AM, maitri ksh wrote: > > sure, attached. > > On Tue, Aug 1, 2023 at 10:36?PM Jacob Faibussowitsch > wrote: >> >> Initially I got an error related >> >> to 'C++11' flag, >> >> Can you send the configure.log for this as well >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> > On Aug 1, 2023, at 14:42, Satish Balay via petsc-users > wrote: >> > >> >> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) >> > >> > Is it possible for you to use a newer version GNU compilers? >> > >> > If not - your alternative is to build PETSc with --with-cxx=0 option >> > >> > But then - you can't use --download-superlu_dist or any pkgs that need >> > c++ [you could try building them separately though] >> > >> > Satish >> > >> > >> > On Tue, 1 Aug 2023, maitri ksh wrote: >> > >> >> I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' >> >> *configure.log'* is attached herewith) . Initially I got an error related >> >> to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and >> >> 'CXXFLAGS' and could surpass the non-compliant error related to c++ compiler >> >> but now it gives me another error 'cannot find a C preprocessor'. 
How to >> >> fix this? >> >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Wed Aug 2 03:06:49 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Wed, 2 Aug 2023 11:06:49 +0300 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: <88B4B30A-1778-4681-97F2-59E167C390C4@gmail.com> Message-ID: Ok, thanks for the clarification. On Wed, Aug 2, 2023 at 8:27?AM Pierre Jolivet wrote: > Right, so configure did the proper job and told you that your compiler > does not (fully) work with C++11, there is no point in trying to add extra > flags to bypass this limitation. > As Satish suggested: either use a newer g++ or configure --with-cxx=0 > > Thanks, > Pierre > > On 2 Aug 2023, at 6:42 AM, maitri ksh wrote: > > sure, attached. > > On Tue, Aug 1, 2023 at 10:36?PM Jacob Faibussowitsch > wrote: > >> >> Initially I got an error related >> >> to 'C++11' flag, >> >> Can you send the configure.log for this as well >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> >> > On Aug 1, 2023, at 14:42, Satish Balay via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> >> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) >> > >> > Is it possible for you to use a newer version GNU compilers? >> > >> > If not - your alternative is to build PETSc with --with-cxx=0 option >> > >> > But then - you can't use --download-superlu_dist or any pkgs that need >> > c++ [you could try building them separately though] >> > >> > Satish >> > >> > >> > On Tue, 1 Aug 2023, maitri ksh wrote: >> > >> >> I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' >> >> *configure.log'* is attached herewith) . Initially I got an error >> related >> >> to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and >> >> 'CXXFLAGS' and could surpass the non-compliant error related to c++ >> compiler >> >> but now it gives me another error 'cannot find a C preprocessor'. How >> to >> >> fix this? >> >> >> > >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Wed Aug 2 03:07:13 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Wed, 2 Aug 2023 11:07:13 +0300 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: Message-ID: Okay, thank you. On Tue, Aug 1, 2023 at 9:43?PM Satish Balay wrote: > > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) > > Is it possible for you to use a newer version GNU compilers? > > If not - your alternative is to build PETSc with --with-cxx=0 option > > But then - you can't use --download-superlu_dist or any pkgs that need > c++ [you could try building them separately though] > > Satish > > > On Tue, 1 Aug 2023, maitri ksh wrote: > > > I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' > > *configure.log'* is attached herewith) . Initially I got an error > related > > to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and > > 'CXXFLAGS' and could surpass the non-compliant error related to c++ > compiler > > but now it gives me another error 'cannot find a C preprocessor'. How to > > fix this? > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
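For concreteness, a sketch of the two configure routes suggested above (a newer GNU toolchain, or dropping C++ entirely) - the newer-GCC install prefix is a placeholder and the option lists are illustrative only; note that with --with-cxx=0, packages that need C++ (e.g. --download-superlu_dist) cannot be built through configure:

# option 1: keep the system gcc but disable C++
./configure --with-cc=gcc --with-cxx=0 --with-fc=gfortran --with-debugging=0

# option 2: point configure at a newer GCC toolchain (prefix is a placeholder)
./configure --with-cc=/path/to/newer-gcc/bin/gcc \
            --with-cxx=/path/to/newer-gcc/bin/g++ \
            --with-fc=/path/to/newer-gcc/bin/gfortran \
            --with-debugging=0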
URL: From maitri.ksh at gmail.com Wed Aug 2 08:22:44 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Wed, 2 Aug 2023 16:22:44 +0300 Subject: [petsc-users] compiler related error (configuring Petsc) In-Reply-To: References: Message-ID: I could compile petsc using the newer version of gnu compiler. However, there is some unusual error when I tried to test petsc using a print 'hello' file. I could not interpret what the error ('*error.txt*') is, but it says something related to MATLAB (which is not used in the ' *hello.c*' script). Any comments/suggestions? On Wed, Aug 2, 2023 at 11:07?AM maitri ksh wrote: > Okay, thank you. > > On Tue, Aug 1, 2023 at 9:43?PM Satish Balay wrote: > >> > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44) >> >> Is it possible for you to use a newer version GNU compilers? >> >> If not - your alternative is to build PETSc with --with-cxx=0 option >> >> But then - you can't use --download-superlu_dist or any pkgs that need >> c++ [you could try building them separately though] >> >> Satish >> >> >> On Tue, 1 Aug 2023, maitri ksh wrote: >> >> > I am trying to compile petsc on a cluster ( x86_64-redhat-linux, ' >> > *configure.log'* is attached herewith) . Initially I got an error >> related >> > to 'C++11' flag, to troubleshoot this issue, I used 'CPPFLAGS' and >> > 'CXXFLAGS' and could surpass the non-compliant error related to c++ >> compiler >> > but now it gives me another error 'cannot find a C preprocessor'. How >> to >> > fix this? >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- /usr/bin/ld: warning: libut.so, needed by //usr/local/matlab/bin/glnxa64/libeng.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwengine_api.so, needed by //usr/local/matlab/bin/glnxa64/libeng.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfl.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwservices.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfoundation_matlabdata_matlab.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfoundation_matlabdata.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwmvm.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwm_dispatcher_interfaces.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwm_dispatcher.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwi18n.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfoundation_filesystem.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwmlutil.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfoundation_usm.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwgenerate_diag_message.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try 
using -rpath or -rpath-link) /usr/bin/ld: warning: libmwmcos.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwfoundation_log.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwcpp11compat.so, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwboost_chrono.so.1.75.0, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwboost_filesystem.so.1.75.0, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwboost_log.so.1.75.0, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwboost_thread.so.1.75.0, needed by //usr/local/matlab/bin/glnxa64/libmex.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwmfl_permute.so, needed by //usr/local/matlab/bin/glnxa64/libmx.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwindexingapimethods.so, needed by //usr/local/matlab/bin/glnxa64/libmx.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libicuuc.so.69, needed by //usr/local/matlab/bin/glnxa64/libmx.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwflstorageutility.so, needed by //usr/local/matlab/bin/glnxa64/libmat.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwflstoragevfs.so, needed by //usr/local/matlab/bin/glnxa64/libmat.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libmwcppmicroservices.so, needed by //usr/local/matlab/bin/glnxa64/libmat.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libCppMicroServices.so.3.7.0, needed by //usr/local/matlab/bin/glnxa64/libmat.so, not found (try using -rpath or -rpath-link) /usr/bin/ld: warning: libhdf5-1.8.so.8, needed by //usr/local/matlab/bin/glnxa64/libmat.so, not found (try using -rpath or -rpath-link) //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `MathWorks::Dispatcher::HasBuiltinName(foundation::core::except::IUserException const&)' //usr/local/matlab/bin/glnxa64/libeng.so: undefined reference to `utFree' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `fl::i18n::codecvt_ustring_to_wstring::do_get_destination_size(char16_t const*, unsigned long) const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mfh_file::isPackageFunction() const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `MathWorks::System::Level::~Level()' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mfh_file::getClaimingComponent() const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::MwLocale::global()' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `utTwoByteUIntConvert' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `cppmicroservices::ServiceReferenceBase::SetInterfaceId(std::__cxx11::basic_string, std::allocator > const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::core::diag::terminate(char const*, char const*, int, char const*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::matlabdata::Array::Array(foundation::matlabdata::Array&&)' //usr/local/matlab/bin/glnxa64/libmex.so: 
undefined reference to `mlutil::cmddistributor::EDIDataWrapper::EDIDataWrapper(mlutil::cmddistributor::EDIDataWrapper const&)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5T_NATIVE_DOUBLE_g at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `foundation::core::diag::terminate(std::__cxx11::basic_stringstream, std::allocator > const&, char const*, int, char const*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mlm_file::findWithinFileFunction(char const*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::filesystem::status(fl::filesystem::basic_path > const&)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Fflush at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `enc_to_utf16_n' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mlm_file::~Mlm_file()' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5T_NATIVE_USHORT_g at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `omDoesClassImplementSubsref(mxArray_tag*, bool&)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Aget_space at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::matlabdata::TypedArray, std::allocator > >::type> foundation::matlabdata::ArrayFactory::createScalar, std::allocator > >(std::__cxx11::basic_string, std::allocator >)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mfh_file::dispatch_mf(int, mxArray_tag**, int, mxArray_tag**)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Pset_deflate at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mlm_file::get_aspects(int&, int&, bool&, bool&, bool&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `mwboost::log::v2_mt_posix::aux::stream_provider::allocate_compound(mwboost::log::v2_mt_posix::record&)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Sget_simple_extent_dims at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `cmddistributor::RuntimeException::getExecutionStatus() const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mfh_MATLAB_fn_impl::add_rebind_event_subscriber(mwboost::signals2::slot > const&)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `cppmicroservices::ServiceReferenceBase::IsConvertibleTo(std::__cxx11::basic_string, std::allocator > const&) const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `utInterruptMode::utInterruptMode(InterruptMode)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::MessageCatalog::GetFormattedMessage(std::__cxx11::basic_string, std::allocator > const&, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::log::basic_diagnostic_logger::~basic_diagnostic_logger()' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `utMalloc' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `cppmicroservices::ServiceObjectsBase::~ServiceObjectsBase()' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Aclose at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::matlabdata::TypedArray 
foundation::matlabdata::ArrayFactory::createArray(std::vector >)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `typeinfo for cmddistributor::PromiseDelegate' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `utInterruptMode::~utInterruptMode()' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mfh_MATLAB_fn_impl::isBreakPointFreezed() const' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `utDoubleConvert' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `typeinfo for cmddistributor::AlreadyCancelledException' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::matlabdata::matlab::ServerArrayFactory::createMxArray(foundation::matlabdata::Array const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::log::basic_diagnostic_logger::basic_diagnostic_logger(std::__cxx11::basic_string, std::allocator > const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::to_string(std::__cxx11::basic_string, std::allocator > const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::codecvt_string_to_ustring::do_convert[abi:cxx11](char const*, unsigned long) const' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `std::__cxx11::basic_string, std::allocator >::_M_construct(unsigned long, char16_t)' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `cppmicroservices::ServiceObjectsBase::ServiceObjectsBase(std::shared_ptr const&, cppmicroservices::ServiceReferenceBase const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::filesystem::codecvt_ustring_to_narrow_string::codecvt_ustring_to_narrow_string(bool)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `typeinfo for LoadLibraryExceptionBase' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `mfl_permute::TransposeInPlace(void*, unsigned long, unsigned long)' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `foundation::log::detail::terminate(std::__cxx11::basic_ostringstream, std::allocator > const&, char const*, int, char const*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::MessageCatalog::get_message[abi:cxx11](resource_core::BaseMsgID const&, fl::i18n::MwLocale const&)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::filesystem::codecvt_ustring_to_narrow_string::~codecvt_ustring_to_narrow_string()' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::core::except::IMsgIDException::get_error_id(std::__cxx11::basic_string, std::allocator > const&) const' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `foundation::matlabdata::Array::operator[](unsigned long)' //usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `mfl_permute::TransposeSparse(unsigned long, unsigned long, unsigned long const*, unsigned long const*, std::complex const*, unsigned long*, unsigned long*, std::complex*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `fl::i18n::to_ustring[abi:cxx11](char const*)' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `mwboost::detail::get_current_thread_data()' //usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Dread at MWHDF51.8.12' //usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `std::__cxx11::basic_string, std::allocator >::copy(char16_t*, unsigned long, unsigned long) const' 
//usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `cmddistributor::RuntimeException::~RuntimeException()'
//usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `Mlm_file::Mlm_file(Mdispatcher*, char16_t const*)'
//usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Tget_member_type at MWHDF51.8.12'
//usr/local/matlab/bin/glnxa64/libmat.so: undefined reference to `H5Gcreate2 at MWHDF51.8.12'
//usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `foundation::core::mem::aligned_heap::instance'
//usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `mwboost::detail::get_tss_data(void const*)'
//usr/local/matlab/bin/glnxa64/libmx.so: undefined reference to `ucnv_getNextUChar_69'
//usr/local/matlab/bin/glnxa64/libmex.so: undefined reference to `mvm::MVM::createRequestID()'
//usr/local/matlab/bin/glnxa64/libeng.so: undefined reference to `utCalloc'
//usr/local/matlab/bin/glnxa64/libeng.so: undefined reference to `sendEngineDduxInfo'
[... many additional undefined-reference errors of the same kind omitted; all are against MATLAB's own shared libraries (libmex.so, libmat.so, libmx.so, libeng.so) and refer to MathWorks-internal C++ symbols, mwboost, ICU (ucnv_*_69), and HDF5 symbols versioned "at MWHDF51.8.12" ...]
collect2: error: ld returned 1 exit status
-------------- next part --------------
#include <petsc.h>

int main(int argc, char* argv[])
{
  PetscInitialize(&argc, &argv, NULL, NULL);
  PetscPrintf(PETSC_COMM_WORLD, "Hello, PETSc!\n");
  PetscFinalize();
  return 0;
}

From knepley at gmail.com Wed Aug 2 09:16:16 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 2 Aug 2023 10:16:16 -0400
Subject: [petsc-users] compiler related error (configuring Petsc)
In-Reply-To:
References:
Message-ID:

On Wed, Aug 2, 2023 at 9:32 AM maitri ksh wrote:

> I could compile petsc using the newer version of the GNU compiler.
> However, there is some unusual error when I tried to test petsc using a
> print 'hello' file. I could not interpret what the error ('*error.txt*') is,
> but it says something related to MATLAB (which is not used in the
> '*hello.c*' script). Any comments/suggestions?

1) Did you link PETSc against MATLAB? If not, did you link your test against MATLAB?

2) If you linked PETSc against it, likely you were in a shell with LD_LIBRARY_PATH defined so that MATLAB would run. Without that, it cannot find the dynamic libraries it needs.

  Thanks,

     Matt

> On Wed, Aug 2, 2023 at 11:07 AM maitri ksh wrote:
>
>> Okay, thank you.
>>
>> On Tue, Aug 1, 2023 at 9:43 PM Satish Balay wrote:
>>
>>> > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
>>>
>>> Is it possible for you to use a newer version of the GNU compilers?
>>>
>>> If not - your alternative is to build PETSc with the --with-cxx=0 option.
>>>
>>> But then - you can't use --download-superlu_dist or any pkgs that need
>>> c++ [you could try building them separately though]
>>>
>>> Satish
>>>
>>> On Tue, 1 Aug 2023, maitri ksh wrote:
>>>
>>> > I am trying to compile petsc on a cluster (x86_64-redhat-linux;
>>> > '*configure.log*' is attached herewith). Initially I got an error related
>>> > to the 'C++11' flag; to troubleshoot this issue, I used 'CPPFLAGS' and
>>> > 'CXXFLAGS' and could get past the non-compliance error related to the C++
>>> > compiler, but now it gives me another error: 'cannot find a C preprocessor'.
>>> > How to fix this?

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

From jed at jedbrown.org Thu Aug 3 22:13:49 2023
From: jed at jedbrown.org (Jed Brown)
Date: Thu, 03 Aug 2023 20:13:49 -0700
Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper
In-Reply-To:
References:
Message-ID: <87msz7xzwy.fsf at jedbrown.org>

I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to modify the initial guess for SNES. Would that serve your needs? Is there anything else you can say about how you'd like to compute this initial guess? Is there a paper or something?

David Kamensky writes:

> Hi,
>
> My understanding is that the second-order generalized-alpha time stepper in
> PETSc uses a same-displacement predictor as the initial guess for the
> nonlinear solver that executes in each time step. I'd like to be able to
> set this to something else, to improve convergence. However, my
> (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` haven't
> worked out. Is there any way to set a custom predictor?
>
> Thanks,
> David Kamensky

From bldenton at buffalo.edu Fri Aug 4 09:31:50 2023
From: bldenton at buffalo.edu (Brandon Denton)
Date: Fri, 4 Aug 2023 14:31:50 +0000
Subject: [petsc-users] PETSc :: FEM Help
In-Reply-To:
References:
Message-ID:

Good Morning Prof. Knepley,

Thank you for the update. I am now able to run the code. However, it does not appear to solve the problem correctly. The only results available are the initial conditions (temp = 100). In the problem, one face is set to 1400 and another face is set to 100.
Since the faces are at opposite ends of the geometry, we would expect a roughly linear temperature profile from 1400 to 100. What am I missing to get the output to show this proper result?

Thank you.
Brandon
________________________________
From: Matthew Knepley
Sent: Tuesday, August 1, 2023 10:23 AM
To: Brandon Denton
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] PETSc :: FEM Help

Sorry about this. I signed up for a conference without the work done, with predictable results. I have just returned home.

There were just a few small problems. First, the labels were attached to dmSurface, but you wanted them on dm. They got destroyed with dmSurface before setting the BC. Second, the declarations of the point function were missing the constant arguments. Third, the PetscFEDestroy() was missing and extra DM creations were there. I have fixed these and am attaching the new source. It runs for me but I have not checked the answer.

  Thanks,

     Matt

On Wed, Jun 7, 2023 at 11:05 AM Brandon Denton via petsc-users wrote:

Good Morning,

I'm trying to verify that the CAD -> PETSc/DMPlex methods I've developed can be used for FEM analyses using PETSc. Attached is my current attempt, where I import a CAD STEP file to create a volumetric tetrahedral discretization (DMPlex), designate boundary condition points using DMLabels, and solve the Laplace problem (heat) with Dirichlet conditions on each end. At the command line I indicate the STEP file with the -filename option and the dual space degree with -petscspace_degree 2. The run ends with either a SEGV Fault or a General MPI Communication Error.

Could you please look over the attached file to tell me if what I'm doing to set up the FEM problem is wrong? Thank you in advance for your time and help.

-Brandon

TYPICAL ERROR MESSAGE
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: General MPI error
[0]PETSC ERROR: MPI error 605109765 Invalid communicator, error stack:
PMPI_Comm_get_attr(344): MPI_Comm_get_attr(comm=0x0, comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) failed
MPII_Comm_get_attr(257): MPIR_Comm_get_attr(comm=0x0, comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) failed
MPII_Comm_get_attr(53).: Invalid communicator
[0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!
[0]PETSC ERROR: Option left: name:-dm_plex_refine_without_snap_to_geom value: 0 source: command line
[0]PETSC ERROR: Option left: name:-dm_refine value: 1 source: command line
[0]PETSC ERROR: Option left: name:-snes_monitor (no value) source: command line
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.18.5-1817-gd2497b8de4c GIT Date: 2023-05-22 18:44:03 +0000
[0]PETSC ERROR: ./thermal on a named XPS.
by bdenton Wed Jun 7 11:03:43 2023 [0]PETSC ERROR: Configure options --with-make-np=16 --prefix=/mnt/c/Users/Brandon/software/libs/petsc/3.19.1-gitlab/gcc/11.2.0/mpich/3.4.2/openblas/0.3.17/opt --with-debugging=false --COPTFLAGS="-O3 -mavx" --CXXOPTFLAGS="-O3 -mavx" --FOPTFLAGS=-O3 --with-shared-libraries=1 --with-mpi-dir=/mnt/c/Users/Brandon/software/libs/mpich/3.4.2/gcc/11.2.0 --with-mumps=true --download-mumps=1 --with-metis=true --download-metis=1 --with-parmetis=true --download-parmetis=1 --with-superlu=true --download-superlu=1 --with-superludir=true --download-superlu_dist=1 --with-blacs=true --download-blacs=1 --with-scalapack=true --download-scalapack=1 --with-hypre=true --download-hypre=1 --with-hdf5-dir=/mnt/c/Users/Brandon/software/libs/hdf5/1.12.1/gcc/11.2.0 --with-valgrind-dir=/mnt/c/Users/Brandon/software/apps/valgrind/3.14.0 --with-blas-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" --with-lapack-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" --LDFLAGS= --with-tetgen=true --download-tetgen=1 --download-ctetgen=1 --download-opencascade=1 --download-egads [0]PETSC ERROR: #1 PetscObjectName() at /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pname.c:119 [0]PETSC ERROR: #2 PetscObjectGetName() at /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pgname.c:27 [0]PETSC ERROR: #3 PetscDSAddBoundary() at /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/dt/interface/dtds.c:3404 [0]PETSC ERROR: #4 DMAddBoundary() at /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/interface/dm.c:7828 [0]PETSC ERROR: #5 main() at /mnt/c/Users/Brandon/Documents/School/Dissertation/Software/EGADS-dev/thermal_v319/thermal_nozzle.c:173 [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -dm_plex_geom_print_model 1 (source: command line) [0]PETSC ERROR: -dm_plex_geom_shape_opt 0 (source: command line) [0]PETSC ERROR: -dm_plex_refine_without_snap_to_geom 0 (source: command line) [0]PETSC ERROR: -dm_refine 1 (source: command line) [0]PETSC ERROR: -filename ./examples/Nozzle_example.stp (source: command line) [0]PETSC ERROR: -petscspace_degree 2 (source: command line) [0]PETSC ERROR: -snes_monitor (source: command line) [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_SELF, 98) - process 0 [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=98 : system msg for write_line failure : Bad file descriptor -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at coreform.com Fri Aug 4 09:50:14 2023 From: david at coreform.com (David Kamensky) Date: Fri, 4 Aug 2023 07:50:14 -0700 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper In-Reply-To: <87msz7xzwy.fsf@jedbrown.org> References: <87msz7xzwy.fsf@jedbrown.org> Message-ID: Hi Jed, What I'm trying to compute is basically a standard same-velocity or same-acceleration predictor (although slightly more complicated, since I'm restricting it to a sub-system). 
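(For reference, the registration Jed is suggesting would look roughly like the sketch below. This is a minimal sketch only, assuming the predictor vector itself is computed and stored on the application side; `AppCtx`, `FormPredictor`, and `X_predicted` are illustrative names, not part of the PETSc API.)

--------------------------------
#include <petscts.h>

/* Illustrative application context holding the predictor that the
   application computes before each nonlinear solve.                */
typedef struct {
  Vec X_predicted;
} AppCtx;

/* Callback with the signature expected by SNESSetComputeInitialGuess():
   copy the application's predictor into the SNES initial-guess vector. */
static PetscErrorCode FormPredictor(SNES snes, Vec x, void *ctx)
{
  AppCtx *user = (AppCtx *)ctx;

  PetscFunctionBeginUser;
  PetscCall(VecCopy(user->X_predicted, x));
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* Registration: pull the SNES out of the TS and attach the callback. */
static PetscErrorCode RegisterPredictor(TS ts, AppCtx *user)
{
  SNES snes;

  PetscFunctionBeginUser;
  PetscCall(TSGetSNES(ts, &snes));
  PetscCall(SNESSetComputeInitialGuess(snes, FormPredictor, user));
  PetscFunctionReturn(PETSC_SUCCESS);
}
--------------------------------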
I hadn't looked into `SNESSetComputeInitialGuess` yet, although one difficulty is that it would need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct, which is only defined in `alpha2.c`, and thus not available through the API. For now, we just worked around this by patching PETSc to move the definition of `TS_Alpha` up into a header to make it accessible. (Modifying the library obviously introduces a maintenance headache; I also considered just casting the `ts->data` pointer to `(char*)`, calculating memory offsets based on `sizeof` the struct members, and casting back to `Vec`, but that relies on compiler-specific assumptions, and could also break if the PETSc source code was updated.) We also shuffled the order of some calls to `VecCopy` and `TSPreStage` in the routine `TSAlpha_Restart`, so that `TSPreStage` can set the initial guess, although that sounds like it would be unnecessary if we instead used a callback in `SNESSetComputeInitialGuess` that had access to the internals of `TS_Alpha`. Thanks, David On Thu, Aug 3, 2023 at 11:28?PM Jed Brown wrote: > I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to modify > the initial guess for SNES. Would that serve your needs? Is there anything > else you can say about how you'd like to compute this initial guess? Is > there a paper or something? > > David Kamensky writes: > > > Hi, > > > > My understanding is that the second-order generalized-alpha time stepper > in > > PETSc uses a same-displacement predictor as the initial guess for the > > nonlinear solver that executes in each time step. I'd like to be able to > > set this to something else, to improve convergence. However, my > > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` > haven't > > worked out. Is there any way to set a custom predictor? > > > > Thanks, > > David Kamensky > -------------- next part -------------- An HTML attachment was scrubbed... URL: From onur.notonur at proton.me Fri Aug 4 11:05:48 2023 From: onur.notonur at proton.me (onur.notonur) Date: Fri, 04 Aug 2023 16:05:48 +0000 Subject: [petsc-users] DMPlex edge/vertex orientation Message-ID: Hi, I'm currently working with 3D DMPlex and performing crucial calculations involving face normals and edge tangents. I've noticed that face normals are directed from support[0] to support[1]. However, I'm uncertain about the conventions for edges and vertices in relation to faces. Specifically, I need to determine the order of vertices that create a surface and whether they are stored in a counter-clockwise (CCW) or clockwise (CW) manner. As DMPlex follows a hierarchy of cell-face-edge-vertex, my main question becomes about the orientation of edges. Any clarification on this aspect would be immensely helpful! Additionally, I'm unfamiliar with most of the terms used in DMPlex. For example "orientation" in DMPlexGetConeOrientation. If you could suggest some readings or resources that explain these concepts, I would greatly appreciate it. Thx, Onur Sent with [Proton Mail](https://proton.me/) secure email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Aug 4 11:23:34 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 04 Aug 2023 10:23:34 -0600 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper In-Reply-To: References: <87msz7xzwy.fsf@jedbrown.org> Message-ID: <87wmyawzcp.fsf@jedbrown.org> Some other TS implementations have a concept of extrapolation as an initial guess. 
Such method-specific initial guesses sound like they fit that pattern and would be welcome to be included in alpha2.c. Would you be willing to make a merge request to bring your work upstream? David Kamensky writes: > Hi Jed, > > What I'm trying to compute is basically a standard same-velocity or > same-acceleration predictor (although slightly more complicated, since I'm > restricting it to a sub-system). I hadn't looked into > `SNESSetComputeInitialGuess` yet, although one difficulty is that it would > need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct, > which is only defined in `alpha2.c`, and thus not available through the > API. > > For now, we just worked around this by patching PETSc to move the > definition of `TS_Alpha` up into a header to make it accessible. > (Modifying the library obviously introduces a maintenance headache; I also > considered just casting the `ts->data` pointer to `(char*)`, calculating > memory offsets based on `sizeof` the struct members, and casting back to > `Vec`, but that relies on compiler-specific assumptions, and could also > break if the PETSc source code was updated.) We also shuffled the order of > some calls to `VecCopy` and `TSPreStage` in the routine `TSAlpha_Restart`, > so that `TSPreStage` can set the initial guess, although that sounds like > it would be unnecessary if we instead used a callback in > `SNESSetComputeInitialGuess` that had access to the internals of > `TS_Alpha`. > > Thanks, David > > On Thu, Aug 3, 2023 at 11:28?PM Jed Brown wrote: > >> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to modify >> the initial guess for SNES. Would that serve your needs? Is there anything >> else you can say about how you'd like to compute this initial guess? Is >> there a paper or something? >> >> David Kamensky writes: >> >> > Hi, >> > >> > My understanding is that the second-order generalized-alpha time stepper >> in >> > PETSc uses a same-displacement predictor as the initial guess for the >> > nonlinear solver that executes in each time step. I'd like to be able to >> > set this to something else, to improve convergence. However, my >> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` >> haven't >> > worked out. Is there any way to set a custom predictor? >> > >> > Thanks, >> > David Kamensky >> From knepley at gmail.com Fri Aug 4 11:49:37 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 4 Aug 2023 12:49:37 -0400 Subject: [petsc-users] DMPlex edge/vertex orientation In-Reply-To: References: Message-ID: On Fri, Aug 4, 2023 at 12:06?PM onur.notonur via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I'm currently working with 3D DMPlex and performing crucial calculations > involving face normals and edge tangents. I've noticed that face normals > are directed from support[0] to support[1]. > That is an accident of implementation and not enforced. > However, I'm uncertain about the conventions for edges and vertices in > relation to faces. Specifically, I need to determine the order of vertices > that create a surface and whether they are stored in a counter-clockwise > (CCW) or clockwise (CW) manner. As DMPlex follows a hierarchy of > cell-face-edge-vertex, my main question becomes about the orientation of > edges. Any clarification on this aspect would be immensely helpful! > 1) All computed quantities follow the closure ordering, namely that the order that vertices come out in the DMPlexGetTransitiveClosure() call is the one used for computing. 
2) Closures are always ordered to produce outward normals 3) Since we build k-cells out of k-1 cells, the k-1 cells _already_ have an ordering before I make my k-cell. Thus I have to tell you how to order the k-1 cell, with respect to its closure ordering, when you are building your k-cell. This is what an "orientation" is, namely a representation of the dihedral group for that k-1 cell. Example: A segment has two orientations, which we label 0 and -1. When we build a triangle out of segments, we order them counter-clockwise, so that the normals are all outward. The same thing is done for quads. Triangles have 6 orientations, all the permutations of the edges. We pick one when making tetrahedra such that the normals are outward _and_ the vertices are in the closure order. > Additionally, I'm unfamiliar with most of the terms used in DMPlex. For > example "orientation" in DMPlexGetConeOrientation. If you could suggest > some readings or resources that explain these concepts, I would greatly > appreciate it. > I am finishing up my book on it, which I will post. To start, here is a paper https://arxiv.org/abs/2004.08729 Thanks, Matt > Thx, > Onur > Sent with Proton Mail secure email. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 4 11:51:41 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 4 Aug 2023 12:51:41 -0400 Subject: [petsc-users] PETSc :: FEM Help In-Reply-To: References: Message-ID: On Fri, Aug 4, 2023 at 10:31?AM Brandon Denton wrote: > Good Morning Prof. Knepley, > > Thank you for the update. I am now able to run the code. However, it does > not appear to solve the problem correctly. The only results available are > the initial conditions (temp = 100). In the problem, one face is set to > 1400 and another face is set to 100. Since the faces are at opposite ends > of the geometry, we would expect a roughly linear temperature profile from > 1400 to 100. What am I missing to get the output to show this proper result. > 1) The inlet/outlet labels where being constructed on dmSurface. I fixed this. 2) The 14/7 faceIDs do not appear to be the ones you want. Here is corrected source. Can you look at the labels? Thanks, Matt > Thank you. > Brandon > > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Tuesday, August 1, 2023 10:23 AM > *To:* Brandon Denton > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PETSc :: FEM Help > > Sorry about this. I signed up for a conference without the work done, with > predictable results. I have just returned home. > > There were just a few small problems. First, the labels were attached to > dmSurface, but you wanted them on dm. They got destroyed with dmSurface > before setting the BC. Second, the declarations of the point function were > missing the constant arguments. Third, the PetscFEDestroy() was missing and > extra DM creations were there. I have fixed these and am attaching the new > source. It runs for me but I have not checked the answer. > > Thanks, > > Matt > > On Wed, Jun 7, 2023 at 11:05?AM Brandon Denton via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Good Morning, > > I'm trying to verify that the CAD -> PETSc/DMPlex methods I've developed > can be used for FEM analyses using PETSc. 
Attached is my current attempt > where I import a CAD STEP file to create a volumetric tetrahedral > discretization (DMPlex), designate boundary condition points using > DMLabels, and solve the Laplace problem (heat) with Dirichlet conditions on > each end. At command line I indicate the STEP file with the -filename > option and the dual space degree with -petscspace_degree 2. The run ends > with either a SEGV Fault or a General MPI Communication Error. > > Could you please look over the attached file to tell me if what I'm doing > to set up the FEM problem is wrong? > > Thank you in advance for your time and help. > -Brandon > > TYPICAL ERROR MESSAGE > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: General MPI error > [0]PETSC ERROR: MPI error 605109765 Invalid communicator, error stack: > PMPI_Comm_get_attr(344): MPI_Comm_get_attr(comm=0x0, > comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) > failed > MPII_Comm_get_attr(257): MPIR_Comm_get_attr(comm=0x0, > comm_keyval=-1539309568, attribute_val=0x7ffe75a58848, flag=0x7ffe75a58844) > failed > MPII_Comm_get_attr(53).: Invalid communicator > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-dm_plex_refine_without_snap_to_geom > value: 0 source: command line > [0]PETSC ERROR: Option left: name:-dm_refine value: 1 source: command > line > [0]PETSC ERROR: Option left: name:-snes_monitor (no value) source: > command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.5-1817-gd2497b8de4c > GIT Date: 2023-05-22 18:44:03 +0000 > [0]PETSC ERROR: ./thermal on a named XPS. 
by bdenton Wed Jun 7 11:03:43 > 2023 > [0]PETSC ERROR: Configure options --with-make-np=16 > --prefix=/mnt/c/Users/Brandon/software/libs/petsc/3.19.1-gitlab/gcc/11.2.0/mpich/3.4.2/openblas/0.3.17/opt > --with-debugging=false --COPTFLAGS="-O3 -mavx" --CXXOPTFLAGS="-O3 -mavx" > --FOPTFLAGS=-O3 --with-shared-libraries=1 > --with-mpi-dir=/mnt/c/Users/Brandon/software/libs/mpich/3.4.2/gcc/11.2.0 > --with-mumps=true --download-mumps=1 --with-metis=true --download-metis=1 > --with-parmetis=true --download-parmetis=1 --with-superlu=true > --download-superlu=1 --with-superludir=true --download-superlu_dist=1 > --with-blacs=true --download-blacs=1 --with-scalapack=true > --download-scalapack=1 --with-hypre=true --download-hypre=1 > --with-hdf5-dir=/mnt/c/Users/Brandon/software/libs/hdf5/1.12.1/gcc/11.2.0 > --with-valgrind-dir=/mnt/c/Users/Brandon/software/apps/valgrind/3.14.0 > --with-blas-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" > --with-lapack-lib="[/mnt/c/Users/Brandon/software/libs/openblas/0.3.17/gcc/11.2.0/lib/libopenblas.so]" > --LDFLAGS= --with-tetgen=true --download-tetgen=1 --download-ctetgen=1 > --download-opencascade=1 --download-egads > [0]PETSC ERROR: #1 PetscObjectName() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pname.c:119 > [0]PETSC ERROR: #2 PetscObjectGetName() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/sys/objects/pgname.c:27 > [0]PETSC ERROR: #3 PetscDSAddBoundary() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/dt/interface/dtds.c:3404 > [0]PETSC ERROR: #4 DMAddBoundary() at > /mnt/c/Users/Brandon/software/builddir/petsc-3.19.1-gitlab/src/dm/interface/dm.c:7828 > [0]PETSC ERROR: #5 main() at > /mnt/c/Users/Brandon/Documents/School/Dissertation/Software/EGADS-dev/thermal_v319/thermal_nozzle.c:173 > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_plex_geom_print_model 1 (source: command line) > [0]PETSC ERROR: -dm_plex_geom_shape_opt 0 (source: command line) > [0]PETSC ERROR: -dm_plex_refine_without_snap_to_geom 0 (source: command > line) > [0]PETSC ERROR: -dm_refine 1 (source: command line) > [0]PETSC ERROR: -filename ./examples/Nozzle_example.stp (source: command > line) > [0]PETSC ERROR: -petscspace_degree 2 (source: command line) > [0]PETSC ERROR: -snes_monitor (source: command line) > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 98) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=98 > : > system msg for write_line failure : Bad file descriptor > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: thermal_nozzle_new.c Type: application/octet-stream Size: 9548 bytes Desc: not available URL: From david at coreform.com Fri Aug 4 12:11:19 2023 From: david at coreform.com (David Kamensky) Date: Fri, 4 Aug 2023 10:11:19 -0700 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper In-Reply-To: <87wmyawzcp.fsf@jedbrown.org> References: <87msz7xzwy.fsf@jedbrown.org> <87wmyawzcp.fsf@jedbrown.org> Message-ID: Hi Jed, The current workaround I'm using is very minimal and basically just moves the definition of `TS_Alpha` from `alpha2.c` up to `petsc/private/tsimpl.h` (and renames it to avoid a conflict with `TS_Alpha` in `alpha1.c`), but I gather that we're really still not "supposed to" include that header in applications. So, I don't know whether something like that would be welcomed upstream. The actual computation of the predictor is done on the application side. Having options like `-ts_alpha_same_velocity` and `-ts_alpha_same_acceleration` could probably be implemented by analogy to `-ts_theta_initial_guess_extrapolate`, although they wouldn't quite cover my specific use-case, where I'm only setting the predictor on part of the solution vector. So, maybe something more general, like providing a generalized-$\alpha$-specific option for a custom predictor callback that takes `X0`, `V0`, and `A0` arguments would be the cleanest solution (and some convenient shortcuts for full-solution same-velocity and same-acceleration predictors could subsequently make use of that infrastructure). I've been working quickly over the past week, but I might be able to take some time to implement a more sustainable solution soon. Thanks again, David On Fri, Aug 4, 2023 at 9:23?AM Jed Brown wrote: > Some other TS implementations have a concept of extrapolation as an > initial guess. Such method-specific initial guesses sound like they fit > that pattern and would be welcome to be included in alpha2.c. Would you be > willing to make a merge request to bring your work upstream? > > David Kamensky writes: > > > Hi Jed, > > > > What I'm trying to compute is basically a standard same-velocity or > > same-acceleration predictor (although slightly more complicated, since > I'm > > restricting it to a sub-system). I hadn't looked into > > `SNESSetComputeInitialGuess` yet, although one difficulty is that it > would > > need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct, > > which is only defined in `alpha2.c`, and thus not available through the > > API. > > > > For now, we just worked around this by patching PETSc to move the > > definition of `TS_Alpha` up into a header to make it accessible. > > (Modifying the library obviously introduces a maintenance headache; I > also > > considered just casting the `ts->data` pointer to `(char*)`, calculating > > memory offsets based on `sizeof` the struct members, and casting back to > > `Vec`, but that relies on compiler-specific assumptions, and could also > > break if the PETSc source code was updated.) We also shuffled the order > of > > some calls to `VecCopy` and `TSPreStage` in the routine > `TSAlpha_Restart`, > > so that `TSPreStage` can set the initial guess, although that sounds like > > it would be unnecessary if we instead used a callback in > > `SNESSetComputeInitialGuess` that had access to the internals of > > `TS_Alpha`. 
> > > > Thanks, David > > > > On Thu, Aug 3, 2023 at 11:28?PM Jed Brown wrote: > > > >> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to > modify > >> the initial guess for SNES. Would that serve your needs? Is there > anything > >> else you can say about how you'd like to compute this initial guess? Is > >> there a paper or something? > >> > >> David Kamensky writes: > >> > >> > Hi, > >> > > >> > My understanding is that the second-order generalized-alpha time > stepper > >> in > >> > PETSc uses a same-displacement predictor as the initial guess for the > >> > nonlinear solver that executes in each time step. I'd like to be > able to > >> > set this to something else, to improve convergence. However, my > >> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` > >> haven't > >> > worked out. Is there any way to set a custom predictor? > >> > > >> > Thanks, > >> > David Kamensky > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Aug 4 13:56:25 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 04 Aug 2023 12:56:25 -0600 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper In-Reply-To: References: <87msz7xzwy.fsf@jedbrown.org> <87wmyawzcp.fsf@jedbrown.org> Message-ID: <87fs4yws9y.fsf@jedbrown.org> Yeah, we'd like the implementation to stay in alpha2.c. There could be a new interface TSAlpha2SetPredictorType (with -ts_alpha2_predictor_type [none,same_velocity,...]) or TSAlpha2SetPredictorFunction. David Kamensky writes: > Hi Jed, > > The current workaround I'm using is very minimal and basically just moves > the definition of `TS_Alpha` from `alpha2.c` up to `petsc/private/tsimpl.h` > (and renames it to avoid a conflict with `TS_Alpha` in `alpha1.c`), but I > gather that we're really still not "supposed to" include that header in > applications. So, I don't know whether something like that would be > welcomed upstream. The actual computation of the predictor is done on the > application side. > > Having options like `-ts_alpha_same_velocity` and > `-ts_alpha_same_acceleration` could probably be implemented by analogy to > `-ts_theta_initial_guess_extrapolate`, although they wouldn't quite cover > my specific use-case, where I'm only setting the predictor on part of the > solution vector. So, maybe something more general, like providing a > generalized-$\alpha$-specific option for a custom predictor callback that > takes `X0`, `V0`, and `A0` arguments would be the cleanest solution (and > some convenient shortcuts for full-solution same-velocity and > same-acceleration predictors could subsequently make use of that > infrastructure). I've been working quickly over the past week, but I might > be able to take some time to implement a more sustainable solution soon. > > Thanks again, > David > > On Fri, Aug 4, 2023 at 9:23?AM Jed Brown wrote: > >> Some other TS implementations have a concept of extrapolation as an >> initial guess. Such method-specific initial guesses sound like they fit >> that pattern and would be welcome to be included in alpha2.c. Would you be >> willing to make a merge request to bring your work upstream? >> >> David Kamensky writes: >> >> > Hi Jed, >> > >> > What I'm trying to compute is basically a standard same-velocity or >> > same-acceleration predictor (although slightly more complicated, since >> I'm >> > restricting it to a sub-system). 
I hadn't looked into >> > `SNESSetComputeInitialGuess` yet, although one difficulty is that it >> would >> > need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct, >> > which is only defined in `alpha2.c`, and thus not available through the >> > API. >> > >> > For now, we just worked around this by patching PETSc to move the >> > definition of `TS_Alpha` up into a header to make it accessible. >> > (Modifying the library obviously introduces a maintenance headache; I >> also >> > considered just casting the `ts->data` pointer to `(char*)`, calculating >> > memory offsets based on `sizeof` the struct members, and casting back to >> > `Vec`, but that relies on compiler-specific assumptions, and could also >> > break if the PETSc source code was updated.) We also shuffled the order >> of >> > some calls to `VecCopy` and `TSPreStage` in the routine >> `TSAlpha_Restart`, >> > so that `TSPreStage` can set the initial guess, although that sounds like >> > it would be unnecessary if we instead used a callback in >> > `SNESSetComputeInitialGuess` that had access to the internals of >> > `TS_Alpha`. >> > >> > Thanks, David >> > >> > On Thu, Aug 3, 2023 at 11:28?PM Jed Brown wrote: >> > >> >> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to >> modify >> >> the initial guess for SNES. Would that serve your needs? Is there >> anything >> >> else you can say about how you'd like to compute this initial guess? Is >> >> there a paper or something? >> >> >> >> David Kamensky writes: >> >> >> >> > Hi, >> >> > >> >> > My understanding is that the second-order generalized-alpha time >> stepper >> >> in >> >> > PETSc uses a same-displacement predictor as the initial guess for the >> >> > nonlinear solver that executes in each time step. I'd like to be >> able to >> >> > set this to something else, to improve convergence. However, my >> >> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` >> >> haven't >> >> > worked out. Is there any way to set a custom predictor? >> >> > >> >> > Thanks, >> >> > David Kamensky >> >> >> From knepley at gmail.com Fri Aug 4 14:00:44 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 4 Aug 2023 15:00:44 -0400 Subject: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper In-Reply-To: <87fs4yws9y.fsf@jedbrown.org> References: <87msz7xzwy.fsf@jedbrown.org> <87wmyawzcp.fsf@jedbrown.org> <87fs4yws9y.fsf@jedbrown.org> Message-ID: If you want to make a PR with your hack, we can help build out the infrastructure for what Jed is recommending. Thanks, Matt On Fri, Aug 4, 2023 at 2:56?PM Jed Brown wrote: > Yeah, we'd like the implementation to stay in alpha2.c. There could be a > new interface TSAlpha2SetPredictorType (with -ts_alpha2_predictor_type > [none,same_velocity,...]) or TSAlpha2SetPredictorFunction. > > David Kamensky writes: > > > Hi Jed, > > > > The current workaround I'm using is very minimal and basically just moves > > the definition of `TS_Alpha` from `alpha2.c` up to > `petsc/private/tsimpl.h` > > (and renames it to avoid a conflict with `TS_Alpha` in `alpha1.c`), but I > > gather that we're really still not "supposed to" include that header in > > applications. So, I don't know whether something like that would be > > welcomed upstream. The actual computation of the predictor is done on > the > > application side. 
> > > > Having options like `-ts_alpha_same_velocity` and > > `-ts_alpha_same_acceleration` could probably be implemented by analogy to > > `-ts_theta_initial_guess_extrapolate`, although they wouldn't quite cover > > my specific use-case, where I'm only setting the predictor on part of the > > solution vector. So, maybe something more general, like providing a > > generalized-$\alpha$-specific option for a custom predictor callback that > > takes `X0`, `V0`, and `A0` arguments would be the cleanest solution (and > > some convenient shortcuts for full-solution same-velocity and > > same-acceleration predictors could subsequently make use of that > > infrastructure). I've been working quickly over the past week, but I > might > > be able to take some time to implement a more sustainable solution soon. > > > > Thanks again, > > David > > > > On Fri, Aug 4, 2023 at 9:23?AM Jed Brown wrote: > > > >> Some other TS implementations have a concept of extrapolation as an > >> initial guess. Such method-specific initial guesses sound like they fit > >> that pattern and would be welcome to be included in alpha2.c. Would you > be > >> willing to make a merge request to bring your work upstream? > >> > >> David Kamensky writes: > >> > >> > Hi Jed, > >> > > >> > What I'm trying to compute is basically a standard same-velocity or > >> > same-acceleration predictor (although slightly more complicated, since > >> I'm > >> > restricting it to a sub-system). I hadn't looked into > >> > `SNESSetComputeInitialGuess` yet, although one difficulty is that it > >> would > >> > need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` > struct, > >> > which is only defined in `alpha2.c`, and thus not available through > the > >> > API. > >> > > >> > For now, we just worked around this by patching PETSc to move the > >> > definition of `TS_Alpha` up into a header to make it accessible. > >> > (Modifying the library obviously introduces a maintenance headache; I > >> also > >> > considered just casting the `ts->data` pointer to `(char*)`, > calculating > >> > memory offsets based on `sizeof` the struct members, and casting back > to > >> > `Vec`, but that relies on compiler-specific assumptions, and could > also > >> > break if the PETSc source code was updated.) We also shuffled the > order > >> of > >> > some calls to `VecCopy` and `TSPreStage` in the routine > >> `TSAlpha_Restart`, > >> > so that `TSPreStage` can set the initial guess, although that sounds > like > >> > it would be unnecessary if we instead used a callback in > >> > `SNESSetComputeInitialGuess` that had access to the internals of > >> > `TS_Alpha`. > >> > > >> > Thanks, David > >> > > >> > On Thu, Aug 3, 2023 at 11:28?PM Jed Brown wrote: > >> > > >> >> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to > >> modify > >> >> the initial guess for SNES. Would that serve your needs? Is there > >> anything > >> >> else you can say about how you'd like to compute this initial guess? > Is > >> >> there a paper or something? > >> >> > >> >> David Kamensky writes: > >> >> > >> >> > Hi, > >> >> > > >> >> > My understanding is that the second-order generalized-alpha time > >> stepper > >> >> in > >> >> > PETSc uses a same-displacement predictor as the initial guess for > the > >> >> > nonlinear solver that executes in each time step. I'd like to be > >> able to > >> >> > set this to something else, to improve convergence. 
However, my > >> >> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` > >> >> haven't > >> >> > worked out. Is there any way to set a custom predictor? > >> >> > > >> >> > Thanks, > >> >> > David Kamensky > >> >> > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuktsinghua at gmail.com Sat Aug 5 02:21:43 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Sat, 5 Aug 2023 09:21:43 +0200 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView Message-ID: Hi all, Good day! After installing ParaView on my desktop, PETSc starts to work anomalously even after reconfiguration: 1. If I use mpirun (frequently used before), it seems that now all the processors will run the program independently without communication. While mpiexec seems to work properly. 2. The Makefile (as attached) which works fine before starts to complain: make: *** No rule to make target 'chkopts', needed by 'test'. Stop. Thanks for your kind help! Best regards, Kai PETSC_DIR=~/petsc PETSC_ARCH=arch-linux-c-debug CFLAGS = -I. FFLAGS= CPPFLAGS=-I. FPPFLAGS= LOCDIR= EXAMPLESC= EXAMPLESF= MANSEC= CLEANFILES= NP= include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules include ${PETSC_DIR}/lib/petsc/conf/test test: ex2-2.o chkopts rm -rf topopt -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} ${RM} ex2-2.o rm -rf *.o -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 5 07:22:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 5 Aug 2023 08:22:58 -0400 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView In-Reply-To: References: Message-ID: On Sat, Aug 5, 2023 at 3:22?AM K. Wu wrote: > Hi all, > > Good day! > > After installing ParaView on my desktop, PETSc starts to work anomalously > even after reconfiguration: > 1. If I use mpirun (frequently used before), it seems that now all the > processors will run the program independently without communication. While > mpiexec seems to work properly. > Yes, this indicates that the "mpirun" is from a different installation of MPI than "mpiexec", which seems to be from the MPI installation used to build PETSc. Probably the Paraview package installed its own MPI package. > > 2. The Makefile (as attached) which works fine before starts to complain: > make: *** No rule to make target 'chkopts', needed by 'test'. Stop. > The 'chkopts' target is defined in the toplevel PETSc makefile, but it is deprecated. You can take it out. Thanks, Matt > Thanks for your kind help! > > Best regards, > Kai > > > PETSC_DIR=~/petsc > PETSC_ARCH=arch-linux-c-debug > CFLAGS = -I. > FFLAGS= > CPPFLAGS=-I. > FPPFLAGS= > LOCDIR= > EXAMPLESC= > EXAMPLESF= > MANSEC= > CLEANFILES= > NP= > > > include ${PETSC_DIR}/lib/petsc/conf/variables > include ${PETSC_DIR}/lib/petsc/conf/rules > include ${PETSC_DIR}/lib/petsc/conf/test > > test: ex2-2.o chkopts > rm -rf topopt > -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} > ${RM} ex2-2.o > rm -rf *.o > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuktsinghua at gmail.com Sat Aug 5 09:10:34 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Sat, 5 Aug 2023 16:10:34 +0200 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView In-Reply-To: References: Message-ID: Dear Matthew, Thanks for your reply! Is there any way that I can choose to use the previous MPI installation used to build PETSc? Regards, Kai Matthew Knepley ?2023?8?5??? 14:23??? > On Sat, Aug 5, 2023 at 3:22?AM K. Wu wrote: > >> Hi all, >> >> Good day! >> >> After installing ParaView on my desktop, PETSc starts to work anomalously >> even after reconfiguration: >> 1. If I use mpirun (frequently used before), it seems that now all the >> processors will run the program independently without communication. While >> mpiexec seems to work properly. >> > > Yes, this indicates that the "mpirun" is from a different installation of > MPI than "mpiexec", which seems to be from the MPI installation used to > build PETSc. Probably the Paraview package installed its own MPI package. > > >> >> 2. The Makefile (as attached) which works fine before starts to complain: >> make: *** No rule to make target 'chkopts', needed by 'test'. Stop. >> > > The 'chkopts' target is defined in the toplevel PETSc makefile, but it is > deprecated. You can take it out. > > Thanks, > > Matt > > >> Thanks for your kind help! >> >> Best regards, >> Kai >> >> >> PETSC_DIR=~/petsc >> PETSC_ARCH=arch-linux-c-debug >> CFLAGS = -I. >> FFLAGS= >> CPPFLAGS=-I. >> FPPFLAGS= >> LOCDIR= >> EXAMPLESC= >> EXAMPLESF= >> MANSEC= >> CLEANFILES= >> NP= >> >> >> include ${PETSC_DIR}/lib/petsc/conf/variables >> include ${PETSC_DIR}/lib/petsc/conf/rules >> include ${PETSC_DIR}/lib/petsc/conf/test >> >> test: ex2-2.o chkopts >> rm -rf topopt >> -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} >> ${RM} ex2-2.o >> rm -rf *.o >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 5 09:18:53 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 5 Aug 2023 10:18:53 -0400 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView In-Reply-To: References: Message-ID: On Sat, Aug 5, 2023 at 10:10?AM K. Wu wrote: > Dear Matthew, > > Thanks for your reply! > > Is there any way that I can choose to use the previous MPI installation > used to build PETSc? > First we would need to know what that was. You can send configure.log, which will have that information in it. Thanks, Matt > Regards, > Kai > > Matthew Knepley ?2023?8?5??? 14:23??? > >> On Sat, Aug 5, 2023 at 3:22?AM K. Wu wrote: >> >>> Hi all, >>> >>> Good day! >>> >>> After installing ParaView on my desktop, PETSc starts to work >>> anomalously even after reconfiguration: >>> 1. If I use mpirun (frequently used before), it seems that now all the >>> processors will run the program independently without communication. While >>> mpiexec seems to work properly. >>> >> >> Yes, this indicates that the "mpirun" is from a different installation of >> MPI than "mpiexec", which seems to be from the MPI installation used to >> build PETSc. Probably the Paraview package installed its own MPI package. 
>> >> >>> >>> 2. The Makefile (as attached) which works fine before starts to complain: >>> make: *** No rule to make target 'chkopts', needed by 'test'. Stop. >>> >> >> The 'chkopts' target is defined in the toplevel PETSc makefile, but it is >> deprecated. You can take it out. >> >> Thanks, >> >> Matt >> >> >>> Thanks for your kind help! >>> >>> Best regards, >>> Kai >>> >>> >>> PETSC_DIR=~/petsc >>> PETSC_ARCH=arch-linux-c-debug >>> CFLAGS = -I. >>> FFLAGS= >>> CPPFLAGS=-I. >>> FPPFLAGS= >>> LOCDIR= >>> EXAMPLESC= >>> EXAMPLESF= >>> MANSEC= >>> CLEANFILES= >>> NP= >>> >>> >>> include ${PETSC_DIR}/lib/petsc/conf/variables >>> include ${PETSC_DIR}/lib/petsc/conf/rules >>> include ${PETSC_DIR}/lib/petsc/conf/test >>> >>> test: ex2-2.o chkopts >>> rm -rf topopt >>> -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} >>> ${RM} ex2-2.o >>> rm -rf *.o >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuktsinghua at gmail.com Sun Aug 6 02:03:46 2023 From: wuktsinghua at gmail.com (K. Wu) Date: Sun, 6 Aug 2023 09:03:46 +0200 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView In-Reply-To: References: Message-ID: Dear Matthew, Thanks for your kind help, please see in attachment the configure.log file for PETSc. Regards, Kai Matthew Knepley ?2023?8?5??? 16:19??? > On Sat, Aug 5, 2023 at 10:10?AM K. Wu wrote: > >> Dear Matthew, >> >> Thanks for your reply! >> >> Is there any way that I can choose to use the previous MPI installation >> used to build PETSc? >> > > First we would need to know what that was. You can send configure.log, > which will have that information in it. > > Thanks, > > Matt > > >> Regards, >> Kai >> >> Matthew Knepley ?2023?8?5??? 14:23??? >> >>> On Sat, Aug 5, 2023 at 3:22?AM K. Wu wrote: >>> >>>> Hi all, >>>> >>>> Good day! >>>> >>>> After installing ParaView on my desktop, PETSc starts to work >>>> anomalously even after reconfiguration: >>>> 1. If I use mpirun (frequently used before), it seems that now all the >>>> processors will run the program independently without communication. While >>>> mpiexec seems to work properly. >>>> >>> >>> Yes, this indicates that the "mpirun" is from a different installation >>> of MPI than "mpiexec", which seems to be from the MPI installation used to >>> build PETSc. Probably the Paraview package installed its own MPI package. >>> >>> >>>> >>>> 2. The Makefile (as attached) which works fine before starts to >>>> complain: >>>> make: *** No rule to make target 'chkopts', needed by 'test'. Stop. >>>> >>> >>> The 'chkopts' target is defined in the toplevel PETSc makefile, but it >>> is deprecated. You can take it out. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks for your kind help! >>>> >>>> Best regards, >>>> Kai >>>> >>>> >>>> PETSC_DIR=~/petsc >>>> PETSC_ARCH=arch-linux-c-debug >>>> CFLAGS = -I. >>>> FFLAGS= >>>> CPPFLAGS=-I. 
>>>> FPPFLAGS= >>>> LOCDIR= >>>> EXAMPLESC= >>>> EXAMPLESF= >>>> MANSEC= >>>> CLEANFILES= >>>> NP= >>>> >>>> >>>> include ${PETSC_DIR}/lib/petsc/conf/variables >>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>> >>>> test: ex2-2.o chkopts >>>> rm -rf topopt >>>> -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} >>>> ${RM} ex2-2.o >>>> rm -rf *.o >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 2264598 bytes Desc: not available URL: From knepley at gmail.com Sun Aug 6 06:13:39 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 6 Aug 2023 07:13:39 -0400 Subject: [petsc-users] Something went wrong with PETSc after installing ParaView In-Reply-To: References: Message-ID: On Sun, Aug 6, 2023 at 3:03?AM K. Wu wrote: > Dear Matthew, > > Thanks for your kind help, please see in attachment the configure.log file > for PETSc. > Okay, you told PETSc to build MPICH, so you should use /lhome/kai/Documents/petsc/linux-c-debug-directsolver/bin/mpiexec -n ./myprog It shows this is the report at the end of the log. Thanks, Matt Regards, > Kai > > Matthew Knepley ?2023?8?5??? 16:19??? > >> On Sat, Aug 5, 2023 at 10:10?AM K. Wu wrote: >> >>> Dear Matthew, >>> >>> Thanks for your reply! >>> >>> Is there any way that I can choose to use the previous MPI installation >>> used to build PETSc? >>> >> >> First we would need to know what that was. You can send configure.log, >> which will have that information in it. >> >> Thanks, >> >> Matt >> >> >>> Regards, >>> Kai >>> >>> Matthew Knepley ?2023?8?5??? 14:23??? >>> >>>> On Sat, Aug 5, 2023 at 3:22?AM K. Wu wrote: >>>> >>>>> Hi all, >>>>> >>>>> Good day! >>>>> >>>>> After installing ParaView on my desktop, PETSc starts to work >>>>> anomalously even after reconfiguration: >>>>> 1. If I use mpirun (frequently used before), it seems that now all the >>>>> processors will run the program independently without communication. While >>>>> mpiexec seems to work properly. >>>>> >>>> >>>> Yes, this indicates that the "mpirun" is from a different installation >>>> of MPI than "mpiexec", which seems to be from the MPI installation used to >>>> build PETSc. Probably the Paraview package installed its own MPI package. >>>> >>>> >>>>> >>>>> 2. The Makefile (as attached) which works fine before starts to >>>>> complain: >>>>> make: *** No rule to make target 'chkopts', needed by 'test'. Stop. >>>>> >>>> >>>> The 'chkopts' target is defined in the toplevel PETSc makefile, but it >>>> is deprecated. You can take it out. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks for your kind help! >>>>> >>>>> Best regards, >>>>> Kai >>>>> >>>>> >>>>> PETSC_DIR=~/petsc >>>>> PETSC_ARCH=arch-linux-c-debug >>>>> CFLAGS = -I. >>>>> FFLAGS= >>>>> CPPFLAGS=-I. 
>>>>> FPPFLAGS= >>>>> LOCDIR= >>>>> EXAMPLESC= >>>>> EXAMPLESF= >>>>> MANSEC= >>>>> CLEANFILES= >>>>> NP= >>>>> >>>>> >>>>> include ${PETSC_DIR}/lib/petsc/conf/variables >>>>> include ${PETSC_DIR}/lib/petsc/conf/rules >>>>> include ${PETSC_DIR}/lib/petsc/conf/test >>>>> >>>>> test: ex2-2.o chkopts >>>>> rm -rf topopt >>>>> -${CLINKER} -o test ex2-2.o ${PETSC_SYS_LIB} >>>>> ${RM} ex2-2.o >>>>> rm -rf *.o >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From superjimbo at proton.me Mon Aug 7 18:37:15 2023 From: superjimbo at proton.me (superjimbo) Date: Mon, 07 Aug 2023 23:37:15 +0000 Subject: [petsc-users] Advice on setting up Schur complement based field-split preconditioner Message-ID: Hello PETSc team, I am trying to solve the coupled incompressible navier-stokes equations, discretizedby a finite volume method. The discretization includes a 'stabilized-pressure' equation. After reading through the mailing list archives (and the mentioned literature), the advice was to use the Schur complement based block preconditioner for the saddle point problem. My setup looks like this: (lid-driven cavity at Re=3200), "-ksp_type fgmres -ksp_gmres -ksp_rtol 1e-04 -ksp_pc_side right"\ " -pc_type fieldsplit"\ " -pc_fieldsplit_0_fields 1,2,3"\ " -pc_fieldsplit_1_fields 0"\ " -pc_fieldsplit_type schur"\ " -pc_fieldsplit_schur_fact_type diag"\ " -pc_fieldsplit_schur_precondition a11"\ " -fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right"\ " -fieldsplit_0_ksp_rtol 1e-02"\ " -fieldsplit_0_pc_type hypre"\ " -fieldsplit_0_pc_hypre_type boomeramg"\ " -fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.50"\ " -fiedlsplit_0_pc_hypre_boomeramg_agg_nl 4"\ " -fieldsplit_0_pc_hypre_boomeramg_max_levels 6"\ " -fieldsplit_0_pc_hypre_boomeramg_relax_weight_all 0.8"\ " -fieldsplit_0_pc_hypre_boomeramg_max_iter 1"\ " -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right"\ " -fieldsplit_1_ksp_rtol 1e-02"\ " -fieldsplit_1_pc_type hypre"\ " -fieldsplit_1_pc_hypre_type boomeramg"\ " -fieldsplit_1_pc_hypre_boomeramg_strong_threshold 0.5"\ " -fieldsplit_1_pc_hypre_boomeramg_smooth_type euclid"\ " -fiedlsplit_1_pc_hypre_boomeramg_agg_nl 4"\ " -fieldsplit_1_pc_hypre_boomeramg_max_levels 6"\ " -fieldsplit_1_pc_hypre_boomeramg_relax_weight_all 0.7"\ " -fieldsplit_1_pc_hypre_boomeramg_max_iter 4"\ " -fieldsplit_ksp_converged_reason"\ " -ksp_converged_reason" Here, the pressure variable index is 0, and the velocity components are 1,2 and 3. This setup technically runs. However, as the simulation progresses, the number of linear iterations for pc_fieldsplit_1 (pressure variable) increases. Am I setting the preconditioner correctly, Could you please advise? Thanks and best regards, Jim. 
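For reference, a minimal sketch of how the same split could be driven from code, assuming the assembled system matrix is a blocked AIJ with block size 4 as reported in the -ksp_view output below; the routine name SetUpSchurFieldSplit and the calling context are illustrative, not taken from the setup above:

--------------------------------
#include <petscksp.h>

/* Minimal sketch: velocity/pressure Schur field split for a blocked AIJ
   system with block size 4, where field 0 is pressure and fields 1,2,3
   are the velocity components, mirroring the options listed above.    */
static PetscErrorCode SetUpSchurFieldSplit(KSP ksp)
{
  PC             pc;
  const PetscInt vel[]  = {1, 2, 3};
  const PetscInt pres[] = {0};

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCFIELDSPLIT));
  PetscCall(PCFieldSplitSetBlockSize(pc, 4));
  PetscCall(PCFieldSplitSetFields(pc, "0", 3, vel, vel));   /* split 0: velocity */
  PetscCall(PCFieldSplitSetFields(pc, "1", 1, pres, pres)); /* split 1: pressure */
  PetscCall(PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR));
  PetscCall(PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_DIAG));
  PetscCall(PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_A11, NULL));
  /* The inner solver settings (gmres plus hypre/BoomerAMG on each split)
     are still taken from the options database.                          */
  PetscCall(KSPSetFromOptions(ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}
--------------------------------

With -pc_fieldsplit_schur_precondition a11, the Schur complement is preconditioned from the stabilized pressure block A11 itself, which corresponds to the "Preconditioner for the Schur complement formed from A11" line in the output below.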
----- KSP Object: 1 MPI process type: fgmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.0001, absolute=1e-50, divergence=10000. right preconditioning using DEFAULT norm type for convergence test PC Object: 1 MPI process type: fieldsplit FieldSplit with Schur preconditioner, blocksize = 4, factorization DIAG Preconditioner for the Schur complement formed from A11 Split info: Split number 0 Fields 1, 2, 3 Split number 1 Fields 0 KSP solver for A00 block KSP Object: (fieldsplit_0_) 1 MPI process type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.01, absolute=1e-50, divergence=10000. right preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI process type: hypre PC has not been set up so information may be incomplete HYPRE BoomerAMG preconditioning Cycle type V Maximum number of levels 6 Maximum number of iterations PER hypre call 1 Convergence tolerance PER hypre call 0. Threshold for strong coupling 0.5 Interpolation truncation factor 0. Interpolation: max elements per row 0 Number of levels of aggressive coarsening 0 Number of paths for aggressive coarsening 1 Maximum row sums 0.9 Sweeps down 1 Sweeps up 1 Sweeps on coarse 1 Relax down symmetric-SOR/Jacobi Relax up symmetric-SOR/Jacobi Relax on coarse Gaussian-elimination Relax weight (all) 0.8 Outer relax weight (all) 1. Using CF-relaxation Not using more complex smoothers. Measure type local Coarsen type Falgout Interpolation type classical SpGEMM type cusparse linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI process type: seqaij rows=196608, cols=196608, bs=3 total: nonzeros=2939904, allocated nonzeros=2939904 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 65536 nodes, limit used is 5 KSP solver for S = A11 - A10 inv(A00) A01 KSP Object: (fieldsplit_1_) 1 MPI process type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.01, absolute=1e-50, divergence=10000. right preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI process type: hypre PC has not been set up so information may be incomplete HYPRE BoomerAMG preconditioning Cycle type V Maximum number of levels 6 Maximum number of iterations PER hypre call 1 Convergence tolerance PER hypre call 0. Threshold for strong coupling 0.5 Interpolation truncation factor 0. Interpolation: max elements per row 0 Number of levels of aggressive coarsening 0 Number of paths for aggressive coarsening 1 Maximum row sums 0.9 Sweeps down 1 Sweeps up 1 Sweeps on coarse 1 Relax down symmetric-SOR/Jacobi Relax up symmetric-SOR/Jacobi Relax on coarse Gaussian-elimination Relax weight (all) 0.3 Outer relax weight (all) 1. Using CF-relaxation Smooth type Euclid Smooth num levels 25 Euclid ILU(k) levels 0 Euclid ILU(k) drop tolerance 0. Euclid ILU use Block-Jacobi? 
0 Measure type local Coarsen type Falgout Interpolation type classical SpGEMM type cusparse linear system matrix followed by preconditioner matrix: Mat Object: (fieldsplit_1_) 1 MPI process type: schurcomplement rows=65536, cols=65536 Schur complement A11 - A10 inv(A00) A01 A11 Mat Object: (fieldsplit_1_) 1 MPI process type: seqaij rows=65536, cols=65536 total: nonzeros=326656, allocated nonzeros=326656 total number of mallocs used during MatSetValues calls=0 not using I-node routines A10 Mat Object: 1 MPI process type: seqaij rows=65536, cols=196608 total: nonzeros=979968, allocated nonzeros=979968 total number of mallocs used during MatSetValues calls=0 not using I-node routines KSP of A00 KSP Object: (fieldsplit_0_) 1 MPI process type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=0.01, absolute=1e-50, divergence=10000. right preconditioning using DEFAULT norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI process type: hypre PC has not been set up so information may be incomplete HYPRE BoomerAMG preconditioning Cycle type V Maximum number of levels 6 Maximum number of iterations PER hypre call 1 Convergence tolerance PER hypre call 0. Threshold for strong coupling 0.5 Interpolation truncation factor 0. Interpolation: max elements per row 0 Number of levels of aggressive coarsening 0 Number of paths for aggressive coarsening 1 Maximum row sums 0.9 Sweeps down 1 Sweeps up 1 Sweeps on coarse 1 Relax down symmetric-SOR/Jacobi Relax up symmetric-SOR/Jacobi Relax on coarse Gaussian-elimination Relax weight (all) 0.8 Outer relax weight (all) 1. Using CF-relaxation Not using more complex smoothers. 
          Measure type        local
          Coarsen type        Falgout
          Interpolation type  classical
          SpGEMM type         cusparse
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 1 MPI process
          type: seqaij
          rows=196608, cols=196608, bs=3
          total: nonzeros=2939904, allocated nonzeros=2939904
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 65536 nodes, limit used is 5
      A01
        Mat Object: 1 MPI process
          type: seqaij
          rows=196608, cols=65536, rbs=3, cbs=1
          total: nonzeros=979968, allocated nonzeros=979968
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 65536 nodes, limit used is 5
      Mat Object: (fieldsplit_1_) 1 MPI process
        type: seqaij
        rows=65536, cols=65536
        total: nonzeros=326656, allocated nonzeros=326656
        total number of mallocs used during MatSetValues calls=0
          not using I-node routines
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaij
    rows=262144, cols=262144, bs=4
    total: nonzeros=5226496, allocated nonzeros=5226496
    total number of mallocs used during MatSetValues calls=0
      using I-node routines: found 65536 nodes, limit used is 5

[0] Iteration 0: 1 1 1 1
  [repeated "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 1-2"
   and "Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 13-16"
   messages omitted]
Linear solve converged due to CONVERGED_RTOL iterations 10
[0] Iteration 1: 0.260452 1.79947 1.97071 0.338806
  [repeated "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2"
   and "Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 21-22"
   messages omitted]
Linear solve converged due to CONVERGED_RTOL iterations 10
[0] Iteration 2: 0.0623281 0.221591 0.233678 0.0802229
  [repeated "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2"
   and "Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 24-25"
   messages omitted]
Linear solve converged due to CONVERGED_RTOL iterations 10
[0] Iteration 3: 0.0435829 0.0267908 0.0120759 0.017118
  [repeated "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2-3"
   and "Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 36-46"
   messages omitted; the quoted log is truncated at this point in the archive]
2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 44 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 47 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 46 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 
2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 45 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 47 Linear solve converged due to CONVERGED_RTOL iterations 10 [0] Iteration 4: 0.0134479 0.0127978 0.00651386 0.0035045 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 
Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 57 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to 
CONVERGED_RTOL iterations 54 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 
4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 59 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 62 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 56 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 
4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 59 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 59 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 
4
Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4
[... many repeated "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL" lines (3-5 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 74
[... repeated fieldsplit_0_ lines (3-5 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 63
[... repeated fieldsplit_0_ lines (3-5 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 59
Linear solve converged due to CONVERGED_RTOL iterations 10
[0] Iteration 5: 0.00385315 0.00686313 0.00592293 0.000710284
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 91
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 62
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 57
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 93
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 79
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 86
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 72
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 102
[... repeated fieldsplit_0_ lines (4-7 iterations each) snipped ...]
Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 70
[... repeated fieldsplit_0_ lines (5-7 iterations each) snipped ...]
Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations
6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 71 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 71 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 
6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 88 Linear solve converged due to CONVERGED_RTOL iterations 12[0] Iteration 6: 0.00131378 0.00417556 0.00563892 0.000143316 Best, Jim Sent with [Proton Mail](https://proton.me/) secure email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 7 20:17:05 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 7 Aug 2023 21:17:05 -0400 Subject: [petsc-users] Advice on setting up Schur complement based field-split preconditioner In-Reply-To: References: Message-ID: On Mon, Aug 7, 2023 at 7:41?PM superjimbo via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello PETSc team, > > I am trying to solve the coupled incompressible navier-stokes equations, > discretizedby a > finite volume method. The discretization includes a 'stabilized-pressure' > equation. After reading through the mailing list archives (and the > mentioned literature), the advice > was to use the Schur complement based block preconditioner for the saddle > point problem. 
>
> My setup looks like this: (lid-driven cavity at Re=3200),
>
> "-ksp_type fgmres -ksp_gmres -ksp_rtol 1e-04 -ksp_pc_side right"\
> " -pc_type fieldsplit"\
> " -pc_fieldsplit_0_fields 1,2,3"\
> " -pc_fieldsplit_1_fields 0"\
> " -pc_fieldsplit_type schur"\
> " -pc_fieldsplit_schur_fact_type diag"\
> " -pc_fieldsplit_schur_precondition a11"\
>

We need to know what equations you actually have. I don't know if A11 is a
good preconditioner for the Schur complement in your problem. It looks like
you are taking a million iterates in the Schur complement solve, so I am
guessing that it is not. The first step is getting a good Schur solve, no
matter how expensive it is. I would use LU factorization on it. If it is
not < 10 iterates, it is not a great preconditioner.

I do not know much about stabilized formulations. For stable finite
elements, the pressure mass matrix is a good preconditioner. That is what
I use in SNES ex62 and ex69. I recommend finding a good PC from papers on
your formulation and then reproducing it in PETSc (hopefully fairly easy).

  Thanks,

     Matt

> " -fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right"\
> " -fieldsplit_0_ksp_rtol 1e-02"\
> " -fieldsplit_0_pc_type hypre"\
> " -fieldsplit_0_pc_hypre_type boomeramg"\
> " -fieldsplit_0_pc_hypre_boomeramg_strong_threshold 0.50"\
> " -fiedlsplit_0_pc_hypre_boomeramg_agg_nl 4"\
> " -fieldsplit_0_pc_hypre_boomeramg_max_levels 6"\
> " -fieldsplit_0_pc_hypre_boomeramg_relax_weight_all 0.8"\
> " -fieldsplit_0_pc_hypre_boomeramg_max_iter 1"\
> " -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right"\
> " -fieldsplit_1_ksp_rtol 1e-02"\
> " -fieldsplit_1_pc_type hypre"\
> " -fieldsplit_1_pc_hypre_type boomeramg"\
> " -fieldsplit_1_pc_hypre_boomeramg_strong_threshold 0.5"\
> " -fieldsplit_1_pc_hypre_boomeramg_smooth_type euclid"\
> " -fiedlsplit_1_pc_hypre_boomeramg_agg_nl 4"\
> " -fieldsplit_1_pc_hypre_boomeramg_max_levels 6"\
> " -fieldsplit_1_pc_hypre_boomeramg_relax_weight_all 0.7"\
> " -fieldsplit_1_pc_hypre_boomeramg_max_iter 4"\
> " -fieldsplit_ksp_converged_reason"\
> " -ksp_converged_reason"
>
> Here, the pressure variable index is 0, and the velocity components are
> 1,2 and 3. This setup technically runs.
> However, as the simulation progresses, the number of linear iterations
> for pc_fieldsplit_1 (pressure variable)
> increases. Am I setting the preconditioner correctly? Could you please
> advise?
>
> Thanks and best regards,
> Jim.
> -----
> KSP Object: 1 MPI process
> type: fgmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
> right preconditioning
> using DEFAULT norm type for convergence test
> PC Object: 1 MPI process
> type: fieldsplit
> FieldSplit with Schur preconditioner, blocksize = 4, factorization DIAG
> Preconditioner for the Schur complement formed from A11
> Split info:
> Split number 0 Fields 1, 2, 3
> Split number 1 Fields 0
> KSP solver for A00 block
> KSP Object: (fieldsplit_0_) 1 MPI process
> type: gmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=0.01, absolute=1e-50, divergence=10000.
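As a concrete illustration of the advice above to first get a good Schur
solve (this sketch is not part of the original exchange; the prefixes simply
follow the -ksp_view output quoted here, and the tolerance is a placeholder),
one could rerun a step or two in serial with an exact inner A00 solve and an
exact factorization of the Schur preconditioning block:

  -pc_type fieldsplit -pc_fieldsplit_type schur
  -pc_fieldsplit_schur_fact_type full
  -pc_fieldsplit_schur_precondition a11
  -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_rtol 1e-8
  -fieldsplit_1_pc_type lu
  -fieldsplit_1_ksp_converged_reason -ksp_converged_reason

Because -pc_fieldsplit_schur_precondition a11 makes A11 the preconditioning
matrix for the Schur solve, the LU in fieldsplit_1_ factors A11, and the
reported fieldsplit_1_ iteration counts then measure directly how well A11
preconditions S = A11 - A10 inv(A00) A01. If those counts are not small (the
rule of thumb above: < 10), a better Schur preconditioner, for example a
pressure mass matrix supplied through PCFieldSplitSetSchurPre(), is worth
trying before tuning any hypre parameters.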
> right preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI process > type: hypre > PC has not been set up so information may be incomplete > HYPRE BoomerAMG preconditioning > Cycle type V > Maximum number of levels 6 > Maximum number of iterations PER hypre call 1 > Convergence tolerance PER hypre call 0. > Threshold for strong coupling 0.5 > Interpolation truncation factor 0. > Interpolation: max elements per row 0 > Number of levels of aggressive coarsening 0 > Number of paths for aggressive coarsening 1 > Maximum row sums 0.9 > Sweeps down 1 > Sweeps up 1 > Sweeps on coarse 1 > Relax down symmetric-SOR/Jacobi > Relax up symmetric-SOR/Jacobi > Relax on coarse Gaussian-elimination > Relax weight (all) 0.8 > Outer relax weight (all) 1. > Using CF-relaxation > Not using more complex smoothers. > Measure type local > Coarsen type Falgout > Interpolation type classical > SpGEMM type cusparse > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI process > type: seqaij > rows=196608, cols=196608, bs=3 > total: nonzeros=2939904, allocated nonzeros=2939904 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 65536 nodes, limit used is 5 > KSP solver for S = A11 - A10 inv(A00) A01 > KSP Object: (fieldsplit_1_) 1 MPI process > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.01, absolute=1e-50, divergence=10000. > right preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI process > type: hypre > PC has not been set up so information may be incomplete > HYPRE BoomerAMG preconditioning > Cycle type V > Maximum number of levels 6 > Maximum number of iterations PER hypre call 1 > Convergence tolerance PER hypre call 0. > Threshold for strong coupling 0.5 > Interpolation truncation factor 0. > Interpolation: max elements per row 0 > Number of levels of aggressive coarsening 0 > Number of paths for aggressive coarsening 1 > Maximum row sums 0.9 > Sweeps down 1 > Sweeps up 1 > Sweeps on coarse 1 > Relax down symmetric-SOR/Jacobi > Relax up symmetric-SOR/Jacobi > Relax on coarse Gaussian-elimination > Relax weight (all) 0.3 > Outer relax weight (all) 1. > Using CF-relaxation > Smooth type Euclid > Smooth num levels 25 > Euclid ILU(k) levels 0 > Euclid ILU(k) drop tolerance 0. > Euclid ILU use Block-Jacobi? 
0 > Measure type local > Coarsen type Falgout > Interpolation type classical > SpGEMM type cusparse > linear system matrix followed by preconditioner matrix: > Mat Object: (fieldsplit_1_) 1 MPI process > type: schurcomplement > rows=65536, cols=65536 > Schur complement A11 - A10 inv(A00) A01 > A11 > Mat Object: (fieldsplit_1_) 1 MPI process > type: seqaij > rows=65536, cols=65536 > total: nonzeros=326656, allocated nonzeros=326656 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > A10 > Mat Object: 1 MPI process > type: seqaij > rows=65536, cols=196608 > total: nonzeros=979968, allocated nonzeros=979968 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > KSP of A00 > KSP Object: (fieldsplit_0_) 1 MPI process > type: gmres > restart=30, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=0.01, absolute=1e-50, > divergence=10000. > right preconditioning > using DEFAULT norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI process > type: hypre > PC has not been set up so information may be incomplete > HYPRE BoomerAMG preconditioning > Cycle type V > Maximum number of levels 6 > Maximum number of iterations PER hypre call 1 > Convergence tolerance PER hypre call 0. > Threshold for strong coupling 0.5 > Interpolation truncation factor 0. > Interpolation: max elements per row 0 > Number of levels of aggressive coarsening 0 > Number of paths for aggressive coarsening 1 > Maximum row sums 0.9 > Sweeps down 1 > Sweeps up 1 > Sweeps on coarse 1 > Relax down symmetric-SOR/Jacobi > Relax up symmetric-SOR/Jacobi > Relax on coarse Gaussian-elimination > Relax weight (all) 0.8 > Outer relax weight (all) 1. > Using CF-relaxation > Not using more complex smoothers. 
> Measure type local > Coarsen type Falgout > Interpolation type classical > SpGEMM type cusparse > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI process > type: seqaij > rows=196608, cols=196608, bs=3 > total: nonzeros=2939904, allocated nonzeros=2939904 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 65536 nodes, limit used > is 5 > A01 > Mat Object: 1 MPI process > type: seqaij > rows=196608, cols=65536, rbs=3, cbs=1 > total: nonzeros=979968, allocated nonzeros=979968 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 65536 nodes, limit used is 5 > Mat Object: (fieldsplit_1_) 1 MPI process > type: seqaij > rows=65536, cols=65536 > total: nonzeros=326656, allocated nonzeros=326656 > total number of mallocs used during MatSetValues calls=0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI process > type: seqaij > rows=262144, cols=262144, bs=4 > total: nonzeros=5226496, allocated nonzeros=5226496 > total number of mallocs used during MatSetValues calls=0 > using I-node routines: found 65536 nodes, limit used is 5 > > [0] Iteration 0: 1 1 1 1 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 1 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 16 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 2
> [... the pattern continues: each fieldsplit_0_ solve converged due to CONVERGED_RTOL in 2 iterations; the remaining fieldsplit_1_ solves for this outer solve converged in 13-16 iterations ...]
> Linear solve converged due to CONVERGED_RTOL iterations 10
> [0] Iteration 1: 0.260452 1.79947 1.97071 0.338806
> [... same pattern: fieldsplit_0_ solves converged in 2 iterations each, fieldsplit_1_ solves in 21-22 iterations ...]
> Linear solve converged due to CONVERGED_RTOL iterations 10
> [0] Iteration 2: 0.0623281 0.221591 0.233678 0.0802229
> [... the next outer solve continues in the same way, with the fieldsplit_1_ solves now at 24 iterations ...]
> Linear fieldsplit_0_ solve
converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 24 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve 
converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 24 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve 
converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 24 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve 
converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25 > Linear solve converged due to CONVERGED_RTOL iterations 10 > [0] Iteration 3: 0.0435829 0.0267908 0.0120759 0.017118 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 39 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 44 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 36 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 46 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 46 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 44 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 47 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 46 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 45 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to 
CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 2 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 47 > Linear solve converged due to CONVERGED_RTOL iterations 10 > [0] Iteration 4: 0.0134479 0.0127978 0.00651386 0.0035045 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 4 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 3 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 
> [ several hundred quoted lines of sub-solve convergence output elided; every "Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL" line reports 3-5 iterations, every "Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL" line reports 54-74 iterations ]
> Linear solve converged due to CONVERGED_RTOL iterations 10
> [0] Iteration 5: 0.00385315 0.00686313 0.00592293 0.000710284
> [ further sub-solve convergence output elided; fieldsplit_0_ solves converged in 4-7 iterations, fieldsplit_1_ solves in 57-93 iterations ]
> Linear fieldsplit_0_
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 79 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 86 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 72 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 102 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 70 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 71 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 71 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ 
solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 7 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 6 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ solve converged due to CONVERGED_RTOL iterations 5 > Linear fieldsplit_0_ 
> Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 88
> Linear solve converged due to CONVERGED_RTOL iterations 12
> [0] Iteration 6: 0.00131378 0.00417556 0.00563892 0.000143316
>
> Best,
> Jim
>
> Sent with Proton Mail secure email.

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kabdelaz at purdue.edu Tue Aug 8 10:24:41 2023
From: kabdelaz at purdue.edu (Khaled Nabil Shar Abdelaziz)
Date: Tue, 8 Aug 2023 15:24:41 +0000
Subject: [petsc-users] IMPI with Hypre
Message-ID:

Hello,

I am running into trouble configuring PETSc with hypre and Intel MPI.

Configure command:
```
./configure PETSC_ARCH=linux-intel-dbg --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-blaslapack-dir=$MKL_HOME --download-metis --download-parmetis --download-hypre --with-make-np=10
```

Terminal output:
```
*********************************************************************************************
            UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
---------------------------------------------------------------------------------------------
Using C++ dialect C++11 as lower bound due to package(s):
 - hypre
But C++ compiler (mpiicpc) appears non-compliant with C++11 or didn't accept:
 - -std=c++20
 - -std=c++17
 - -std=c++14
 - -std=c++11
*********************************************************************************************
```

I also attached the configure.log.

Thank you in advance.

Best,
Khaled
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 238164 bytes
Desc: configure.log
URL:

From bsmith at petsc.dev Tue Aug 8 10:38:17 2023
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 8 Aug 2023 11:38:17 -0400
Subject: [petsc-users] IMPI with Hypre
In-Reply-To:
References:
Message-ID: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev>

   We get these reports regularly. Intel is selecting [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] as the base compiler; this ancient version does not provide the language support needed by hypre. You need a more recent GNU compiler available for the Intel compilers to base themselves on.
   Barry

> On Aug 8, 2023, at 11:24 AM, Khaled Nabil Shar Abdelaziz wrote:
>
> Hello,
> I am running into trouble configuring PETSc with hypre and Intel MPI.
>
> Configure command:
> ```
> ./configure PETSC_ARCH=linux-intel-dbg --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-blaslapack-dir=$MKL_HOME --download-metis --download-parmetis --download-hypre --with-make-np=10
> ```
>
> Terminal output:
> ```
> *********************************************************************************************
>             UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
> ---------------------------------------------------------------------------------------------
> Using C++ dialect C++11 as lower bound due to package(s):
>  - hypre
> But C++ compiler (mpiicpc) appears non-compliant with C++11 or didn't accept:
>  - -std=c++20
>  - -std=c++17
>  - -std=c++14
>  - -std=c++11
> *********************************************************************************************
> ```
>
> I also attached the configure.log.
>
> Thank you in advance.
> Best,
> Khaled

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eijkhout at tacc.utexas.edu Tue Aug 8 10:53:35 2023
From: eijkhout at tacc.utexas.edu (Victor Eijkhout)
Date: Tue, 8 Aug 2023 15:53:35 +0000
Subject: [petsc-users] IMPI with Hypre
In-Reply-To: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev>
References: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev>
Message-ID:

Maybe an option for specifying the explicit location of the gcc version? The Intel compiler has a "-gcc-toolchain" option for that.

https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/gcc-toolchain.html

Victor.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From balay at mcs.anl.gov Tue Aug 8 11:20:02 2023
From: balay at mcs.anl.gov (Satish Balay)
Date: Tue, 8 Aug 2023 11:20:02 -0500 (CDT)
Subject: [petsc-users] IMPI with Hypre
In-Reply-To:
References: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev>
Message-ID: <55d203d3-74fa-4420-6d88-c11df3f0017d@mcs.anl.gov>

It's easier to just add the newer version of the gcc/g++ compilers to PATH - and icc will pick it up [without requiring the -gcc-toolchain option]:

export PATH=/location/of/newer/g++/bin:$PATH
./configure ...
make ...

Satish

On Tue, 8 Aug 2023, Victor Eijkhout wrote:

> Maybe an option for specifying the explicit location of the gcc version? The Intel compiler has a "-gcc-toolchain" option for that.
>
> https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/gcc-toolchain.html
>
> Victor.
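Taken together, Barry's diagnosis and Satish's PATH suggestion amount to the command-line sketch below. The GCC install path is a placeholder, and the assumption that mpiicpc/icpc honor the GCC-style -E -dM flags for dumping the __GNUC__ compatibility macros should be verified on the target system.

```
# Check which GCC the Intel compiler is currently basing itself on
# (assumes the classic icc/icpc compilers accept GCC-style -E -dM preprocessing):
mpiicpc -E -dM -x c++ /dev/null | grep -E '__GNUC__|__GNUC_MINOR__'

# If this reports 4.x, put a newer GCC first in PATH (placeholder location)
# and rerun configure from a clean state:
export PATH=/opt/gcc-11/bin:$PATH
./configure PETSC_ARCH=linux-intel-dbg --with-cc=mpiicc --with-cxx=mpiicpc \
  --with-fc=mpiifort --with-blaslapack-dir=$MKL_HOME \
  --download-metis --download-parmetis --download-hypre --with-make-np=10

# Alternatively, the -gcc-toolchain option Victor mentions can point the Intel
# compiler at a specific GCC installation; see the Intel documentation linked
# above for the exact syntax supported by your compiler version.
```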
From: Satish Balay Date: Tuesday, August 8, 2023 at 11:20 To: Victor Eijkhout Cc: Barry Smith , Khaled Nabil Shar Abdelaziz , petsc-users at mcs.anl.gov Subject: Re: [petsc-users] IMPI with Hypre Its easier to just add the newer version of gcc/g++ compilers to PATH - and icc will pick it up [without requiring -gcc-toolchain option] export PATH=/location/of/newer/g++/bin:$PATH ./configure ... make ... Satish On Tue, 8 Aug 2023, Victor Eijkhout wrote: > Maybe an option for specifying the explicit location of gcc version? The intel compiler has a ?-gcc-toolchain? option for that. > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.intel.com%2Fcontent%2Fwww%2Fus%2Fen%2Fdocs%2Fdpcpp-cpp-compiler%2Fdeveloper-guide-reference%2F2023-0%2Fgcc-toolchain.html&data=05%7C01%7C%7C1ee068e767e042b91c3708db982b57b1%7C31d7e2a5bdd8414e9e97bea998ebdfe1%7C0%7C0%7C638271084142649215%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0jfUA3EMFkpeKOZdhawJjRyivCQ8yF2C50DP2pukJlk%3D&reserved=0 > > Victor. > > > > >> This message is from an external sender. Learn more about why this << >> matters at https://links.utexas.edu/rtyclf. << -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Aug 8 11:31:14 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 8 Aug 2023 11:31:14 -0500 (CDT) Subject: [petsc-users] IMPI with Hypre In-Reply-To: References: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev> <55d203d3-74fa-4420-6d88-c11df3f0017d@mcs.anl.gov> Message-ID: If using modules - using 'module load gcc icc' [or equivalent] should normally work - but if the modules are setup such that loading icc unloads gcc - then I think that's a bug in this module setup.. [as icc has an (internal) dependency on gcc - so ignoring this dependency to remove a gcc module doesn't look correct to me] Satish On Tue, 8 Aug 2023, Victor Eijkhout wrote: > * Its easier to just add the newer version of gcc/g++ compilers to PATH > > Except that I do my path loading through environment modules (lmod version) and they do not allow multiple compilers to be loaded at the same time. > > But yes, that would work. > > V. > > From: Satish Balay > Date: Tuesday, August 8, 2023 at 11:20 > To: Victor Eijkhout > Cc: Barry Smith , Khaled Nabil Shar Abdelaziz , petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] IMPI with Hypre > Its easier to just add the newer version of gcc/g++ compilers to PATH - and icc will pick it up [without requiring -gcc-toolchain option] > > export PATH=/location/of/newer/g++/bin:$PATH > ./configure ... > make ... > > Satish > > On Tue, 8 Aug 2023, Victor Eijkhout wrote: > > > Maybe an option for specifying the explicit location of gcc version? The intel compiler has a ?-gcc-toolchain? option for that. > > > > https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/gcc-toolchain.html > > > > Victor. > > > > > > > > > >> This message is from an external sender. Learn more about why this << > >> matters at https://links.utexas.edu/rtyclf. << > From eijkhout at tacc.utexas.edu Tue Aug 8 12:27:53 2023 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Tue, 8 Aug 2023 17:27:53 +0000 Subject: [petsc-users] IMPI with Hypre In-Reply-To: References: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev> <55d203d3-74fa-4420-6d88-c11df3f0017d@mcs.anl.gov> Message-ID: You say bug I say feature. Lmod has a way to mark modules as mutually exclusive. 
That?s a decision of the way the site is set up. For most users that?s a good idea. For instance, if you load two compilers, and both have an MPI, how do you decide which one is loaded by ?load mpich?? Etc. I?m sure thought has gone into this. Victor. << -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Aug 8 12:31:13 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 8 Aug 2023 12:31:13 -0500 (CDT) Subject: [petsc-users] IMPI with Hypre In-Reply-To: References: <7FC11317-C951-4C1B-BC18-19E5D549F770@petsc.dev> <55d203d3-74fa-4420-6d88-c11df3f0017d@mcs.anl.gov> Message-ID: <5a027a27-8e43-9e6c-e68a-1453269e57e7@mcs.anl.gov> Sure - but if 'module load icc' has gcc-4*' in path - that's a bug in the icc module spec [as that version is incompatible with its c++ support] . It should also load a compatible gcc version [via PATH - or via module dependencies] if its implemented this way - then you won't have a broken icc - that requires a swap of gcc. Satish On Tue, 8 Aug 2023, Victor Eijkhout wrote: > You say bug I say feature. Lmod has a way to mark modules as mutually exclusive. That?s a decision of the way the site is set up. For most users that?s a good idea. > > For instance, if you load two compilers, and both have an MPI, how do you decide which one is loaded by ?load mpich?? > > Etc. I?m sure thought has gone into this. > > Victor. > > > << > From ilya.foursov.7bd at gmail.com Wed Aug 9 07:49:58 2023 From: ilya.foursov.7bd at gmail.com (Ilya Fursov) Date: Wed, 9 Aug 2023 19:49:58 +0700 Subject: [petsc-users] DMPlex bug in src/snes/tutorials/ex17.c Message-ID: Hello, I have a problem running src/snes/tutorials/ex17.c in parallel, given the specific runtime options (these options are actually taken from the test example ex17_3d_q3_trig_elas). *The serial version works fine:* ./ex17 -dm_plex_box_faces 1,1,1 -sol_type elas_trig -dm_plex_dim 3 -dm_plex_simplex 0 -displacement_petscspace_degree 3 -dm_refine 0 -convest_num_refine 1 -snes_convergence_estimate -snes_monitor *The parallel version fails:* mpirun -n 2 ./ex17 -dm_plex_box_faces 1,1,1 -sol_type elas_trig -dm_plex_dim 3 -dm_plex_simplex 0 -displacement_petscspace_degree 3 -dm_refine 0 -convest_num_refine 1 -snes_convergence_estimate -snes_monitor *with the error message (--with-debugging=1):* [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: Cell 0 has not been assigned a cell type [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-convest_num_refine value: 1 source: command line [0]PETSC ERROR: Option left: name:-displacement_petscspace_degree value: 3 source: command line [0]PETSC ERROR: Option left: name:-snes_convergence_estimate (no value) source: command line [0]PETSC ERROR: Option left: name:-snes_monitor (no value) source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown [0]PETSC ERROR: ./ex17 on a DEBUG named ilya-HP-Notebook by ilya Wed Aug 9 19:23:02 2023 [0]PETSC ERROR: Configure options PETSC_ARCH=DEBUG PETSC_DIR=/home/ilya/build/petsc-09aug-debug/petsc --with-blaslapack-dir=/home/ilya/progs/OpenBLAS-0.3.21 --with-mpi-dir=/home/ilya/progs/openmpi-4.1.4 --with-debugging=1 --download-hdf5 --download-hypre --download-chaco --download-metis --download-parmetis --download-suitesparse --download-moab --download-mumps --download-scalapack --download-superlu --download-superlu_dist --download-triangle --download-ml --download-giflib --download-libjpeg --download-libpng --download-zlib --download-spai --download-tchem --download-party --download-cmake --download-hwloc --download-ptscotch --download-revolve --download-cams --download-spai [0]PETSC ERROR: #1 DMPlexGetCellType() at /home/ilya/build/petsc-09aug-debug/petsc/src/dm/impls/plex/plex.c:5169 [0]PETSC ERROR: #2 SetupFE() at ex17.c:621 [0]PETSC ERROR: #3 main() at ex17.c:654 [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -convest_num_refine 1 (source: command line) [0]PETSC ERROR: -displacement_petscspace_degree 3 (source: command line) [0]PETSC ERROR: -dm_plex_box_faces 1,1,1 (source: command line) [0]PETSC ERROR: -dm_plex_dim 3 (source: command line) [0]PETSC ERROR: -dm_plex_simplex 0 (source: command line) [0]PETSC ERROR: -dm_refine 0 (source: command line) [0]PETSC ERROR: -snes_convergence_estimate (source: command line) [0]PETSC ERROR: -snes_monitor (source: command line) [0]PETSC ERROR: -sol_type elas_trig (source: command line) [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- *And when --with-debugging=0, it fails with segmentation violation:* [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. Regards, Ilya -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngoetting at itp.uni-bremen.de Wed Aug 9 08:40:04 2023 From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=) Date: Wed, 9 Aug 2023 15:40:04 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE Message-ID: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Hi all, I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse and therefore I thought, the problem will be well suited for PETSc. 
Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files, if needed, but they are just some trace-preserving, very sparse matrices):

import numpy as np
import scipy.sparse
import scipy.integrate

from tqdm import tqdm


l = np.load("../liouvillian.npy")
pump = np.load("../pump_operator.npy")
state = np.load("../initial_state.npy")

l = scipy.sparse.csr_array(l)
pump = scipy.sparse.csr_array(pump)

def f(t, y, *args):
    return (l + 0.5 * (5 < t < 10) * pump) @ y
    #return l @ y  # Uncomment for f(t) = 0

dt = 0.1
NUM_STEPS = 200
res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
times = []
for i in tqdm(range(NUM_STEPS)):
    res[i, :] = solver.integrate(solver.t + dt)
    times.append(solver.t)

Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330it/s on my machine. When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases):

import sys
import petsc4py
petsc4py.init(args=sys.argv)
import numpy as np
import scipy.sparse

from tqdm import tqdm
from petsc4py import PETSc

comm = PETSc.COMM_WORLD


def mat_to_real(arr):
    return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)

def mat_to_petsc_aij(arr):
    arr_sc_sp = scipy.sparse.csr_array(arr)
    mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
    rstart, rend = mat.getOwnershipRange()
    print(rstart, rend)
    print(arr.shape[0])
    print(mat.sizes)
    I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
    J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
    V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]

    print(I.shape, J.shape, V.shape)
    mat.setValuesCSR(I, J, V)
    mat.assemble()
    return mat


l = np.load("../liouvillian.npy")
l = mat_to_real(l)
pump = np.load("../pump_operator.npy")
pump = mat_to_real(pump)
state = np.load("../initial_state.npy")
state = np.hstack([state.real, state.imag]).astype(np.float64)

l = mat_to_petsc_aij(l)
pump = mat_to_petsc_aij(pump)


jac = l.duplicate()
for i in range(8192):
    jac.setValue(i, i, 0)
jac.assemble()
jac += l

vec = l.createVecRight()
vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
vec.assemble()


dt = 0.1

ts = PETSc.TS().create(comm=comm)
ts.setFromOptions()
ts.setProblemType(ts.ProblemType.LINEAR)
ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
ts.setType(ts.Type.RK)
ts.setRKType(ts.RKType.RK3BS)
ts.setTime(0)
print("KSP:", ts.getKSP().getType())
print("KSP PC:", ts.getKSP().getPC().getType())
print("SNES :", ts.getSNES().getType())

def jacobian(ts, t, u, Amat, Pmat):
    Amat.zeroEntries()
    Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
    Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)

ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
#ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0
ts.setRHSJacobian(jacobian, jac)

NUM_STEPS = 200
res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
times = []
rstart, rend = vec.getOwnershipRange()
for i in tqdm(range(NUM_STEPS)):
    time = ts.getTime()
    ts.setMaxTime(time + dt)
    ts.solve(vec)
    res[i, rstart:rend] = vec.getArray()[:]
    times.append(time)

I decomposed the complex ODE into a larger real ODE, so that I can easily switch maybe to GPU computation later on.
Now, the solutions of both scripts are very much identical, but PETSc runs about 3 times slower at 120it/s on my machine. I don't use MPI for PETSc yet. I strongly suppose that the problem lies within the jacobian definition, as PETSc is about 3 times *faster* than scipy with f(t) = 0 and therefore a constant jacobian. Thank you in advance. All the best, Niclas From bsmith at petsc.dev Wed Aug 9 10:16:27 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 9 Aug 2023 11:16:27 -0400 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: <3F07E496-AF8A-4BD4-A7A2-6CDA79DFC58D@petsc.dev> Was PETSc built with debugging turned off; so ./configure --with-debugging=0 ? Can you run with the equivalent of -log_view to get information about the time spent in the various operations and send that information. The data generated is the best starting point for determining where the code is spending the time. Thanks Barry > On Aug 9, 2023, at 9:40 AM, Niclas G?tting wrote: > > Hi all, > > I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse and therefore I thought, the problem will be well suited for PETSc. Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files, if needed, but they are just some trace-preserving, very sparse matrices): > > import numpy as np > import scipy.sparse > import scipy.integrate > > from tqdm import tqdm > > > l = np.load("../liouvillian.npy") > pump = np.load("../pump_operator.npy") > state = np.load("../initial_state.npy") > > l = scipy.sparse.csr_array(l) > pump = scipy.sparse.csr_array(pump) > > def f(t, y, *args): > return (l + 0.5 * (5 < t < 10) * pump) @ y > #return l @ y # Uncomment for f(t) = 0 > > dt = 0.1 > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) > solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) > times = [] > for i in tqdm(range(NUM_STEPS)): > res[i, :] = solver.integrate(solver.t + dt) > times.append(solver.t) > > Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330it/s on my machine. 
When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases) > > import sys > import petsc4py > petsc4py.init(args=sys.argv) > import numpy as np > import scipy.sparse > > from tqdm import tqdm > from petsc4py import PETSc > > comm = PETSc.COMM_WORLD > > > def mat_to_real(arr): > return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64) > > def mat_to_petsc_aij(arr): > arr_sc_sp = scipy.sparse.csr_array(arr) > mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) > rstart, rend = mat.getOwnershipRange() > print(rstart, rend) > print(arr.shape[0]) > print(mat.sizes) > I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] > J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] > V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] > > print(I.shape, J.shape, V.shape) > mat.setValuesCSR(I, J, V) > mat.assemble() > return mat > > > l = np.load("../liouvillian.npy") > l = mat_to_real(l) > pump = np.load("../pump_operator.npy") > pump = mat_to_real(pump) > state = np.load("../initial_state.npy") > state = np.hstack([state.real, state.imag]).astype(np.float64) > > l = mat_to_petsc_aij(l) > pump = mat_to_petsc_aij(pump) > > > jac = l.duplicate() > for i in range(8192): > jac.setValue(i, i, 0) > jac.assemble() > jac += l > > vec = l.createVecRight() > vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) > vec.assemble() > > > dt = 0.1 > > ts = PETSc.TS().create(comm=comm) > ts.setFromOptions() > ts.setProblemType(ts.ProblemType.LINEAR) > ts.setEquationType(ts.EquationType.ODE_EXPLICIT) > ts.setType(ts.Type.RK) > ts.setRKType(ts.RKType.RK3BS) > ts.setTime(0) > print("KSP:", ts.getKSP().getType()) > print("KSP PC:",ts.getKSP().getPC().getType()) > print("SNES :", ts.getSNES().getType()) > > def jacobian(ts, t, u, Amat, Pmat): > Amat.zeroEntries() > Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > > ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) > #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # Uncomment for f(t) = 0 > ts.setRHSJacobian(jacobian, jac) > > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 8192), dtype=np.float64) > times = [] > rstart, rend = vec.getOwnershipRange() > for i in tqdm(range(NUM_STEPS)): > time = ts.getTime() > ts.setMaxTime(time + dt) > ts.solve(vec) > res[i, rstart:rend] = vec.getArray()[:] > times.append(time) > > I decomposed the complex ODE into a larger real ODE, so that I can easily switch maybe to GPU computation later on. Now, the solutions of both scripts are very much identical, but PETSc runs about 3 times slower at 120it/s on my machine. I don't use MPI for PETSc yet. > > I strongly suppose that the problem lies within the jacobian definition, as PETSc is about 3 times *faster* than scipy with f(t) = 0 and therefore a constant jacobian. > > Thank you in advance. > > All the best, > Niclas > > From stefano.zampini at gmail.com Wed Aug 9 10:51:16 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 9 Aug 2023 17:51:16 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: TSRK is an explicit solver. 
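With an explicit RK method only the right-hand-side evaluation is needed, so one option is to drop the Jacobian callback entirely and apply the two constant matrices directly inside the RHS. A rough, untested petsc4py sketch of that idea, reusing the l, pump and ts objects from the script quoted below (the scratch vector "work" is something you would add yourself):

work = l.createVecRight()   # scratch vector, allocated once outside the callback

def rhs(ts, t, u, F):
    # F = (A_const + f(t) * B_const) * u, without touching any matrix entries
    l.mult(u, F)             # F = l * u
    if 5 < t < 10:           # the square pulse f(t)
        pump.mult(u, work)   # work = pump * u
        F.axpy(0.5, work)    # F += 0.5 * work

ts.setRHSFunction(rhs)
# no ts.setRHSJacobian(...) needed for a purely explicit run

Each step then costs a few sparse mat-vecs plus an axpy, with no per-step matrix assembly.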
Unless you are changing the ts type from command line, the explicit jacobian should not be needed. On top of Barry's suggestion, I would suggest you to write the explicit RHS instead of assembly a throw away matrix every time that function needs to be sampled. On Wed, Aug 9, 2023, 17:09 Niclas G?tting wrote: > Hi all, > > I'm currently trying to convert a quantum simulation from scipy to > PETSc. The problem itself is extremely simple and of the form \dot{u}(t) > = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is > a square function. The matrices A_const and B_const are extremely sparse > and therefore I thought, the problem will be well suited for PETSc. > Currently, I solve the ODE with the following procedure in scipy (I can > provide the necessary data files, if needed, but they are just some > trace-preserving, very sparse matrices): > > import numpy as np > import scipy.sparse > import scipy.integrate > > from tqdm import tqdm > > > l = np.load("../liouvillian.npy") > pump = np.load("../pump_operator.npy") > state = np.load("../initial_state.npy") > > l = scipy.sparse.csr_array(l) > pump = scipy.sparse.csr_array(pump) > > def f(t, y, *args): > return (l + 0.5 * (5 < t < 10) * pump) @ y > #return l @ y # Uncomment for f(t) = 0 > > dt = 0.1 > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) > solver = > scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) > times = [] > for i in tqdm(range(NUM_STEPS)): > res[i, :] = solver.integrate(solver.t + dt) > times.append(solver.t) > > Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports > about 330it/s on my machine. When converting the code to PETSc, I came > to the following result (according to the chapter > https://petsc.org/main/manual/ts/#special-cases) > > import sys > import petsc4py > petsc4py.init(args=sys.argv) > import numpy as np > import scipy.sparse > > from tqdm import tqdm > from petsc4py import PETSc > > comm = PETSc.COMM_WORLD > > > def mat_to_real(arr): > return np.block([[arr.real, -arr.imag], [arr.imag, > arr.real]]).astype(np.float64) > > def mat_to_petsc_aij(arr): > arr_sc_sp = scipy.sparse.csr_array(arr) > mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) > rstart, rend = mat.getOwnershipRange() > print(rstart, rend) > print(arr.shape[0]) > print(mat.sizes) > I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] > J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : > arr_sc_sp.indptr[rend]] > V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] > > print(I.shape, J.shape, V.shape) > mat.setValuesCSR(I, J, V) > mat.assemble() > return mat > > > l = np.load("../liouvillian.npy") > l = mat_to_real(l) > pump = np.load("../pump_operator.npy") > pump = mat_to_real(pump) > state = np.load("../initial_state.npy") > state = np.hstack([state.real, state.imag]).astype(np.float64) > > l = mat_to_petsc_aij(l) > pump = mat_to_petsc_aij(pump) > > > jac = l.duplicate() > for i in range(8192): > jac.setValue(i, i, 0) > jac.assemble() > jac += l > > vec = l.createVecRight() > vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) > vec.assemble() > > > dt = 0.1 > > ts = PETSc.TS().create(comm=comm) > ts.setFromOptions() > ts.setProblemType(ts.ProblemType.LINEAR) > ts.setEquationType(ts.EquationType.ODE_EXPLICIT) > ts.setType(ts.Type.RK) > ts.setRKType(ts.RKType.RK3BS) > ts.setTime(0) > print("KSP:", ts.getKSP().getType()) > print("KSP PC:",ts.getKSP().getPC().getType()) > print("SNES :", ts.getSNES().getType()) > > def 
jacobian(ts, t, u, Amat, Pmat): > Amat.zeroEntries() > Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > Amat.axpy(0.5 * (5 < t < 10), pump, > structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > > ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) > #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # > Uncomment for f(t) = 0 > ts.setRHSJacobian(jacobian, jac) > > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 8192), dtype=np.float64) > times = [] > rstart, rend = vec.getOwnershipRange() > for i in tqdm(range(NUM_STEPS)): > time = ts.getTime() > ts.setMaxTime(time + dt) > ts.solve(vec) > res[i, rstart:rend] = vec.getArray()[:] > times.append(time) > > I decomposed the complex ODE into a larger real ODE, so that I can > easily switch maybe to GPU computation later on. Now, the solutions of > both scripts are very much identical, but PETSc runs about 3 times > slower at 120it/s on my machine. I don't use MPI for PETSc yet. > > I strongly suppose that the problem lies within the jacobian definition, > as PETSc is about 3 times *faster* than scipy with f(t) = 0 and > therefore a constant jacobian. > > Thank you in advance. > > All the best, > Niclas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 9 10:58:22 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 9 Aug 2023 11:58:22 -0400 Subject: [petsc-users] DMPlex bug in src/snes/tutorials/ex17.c In-Reply-To: References: Message-ID: On Wed, Aug 9, 2023 at 11:09?AM Ilya Fursov wrote: > Hello, > > I have a problem running src/snes/tutorials/ex17.c in parallel, > given the specific runtime options (these options are actually taken from > the test example ex17_3d_q3_trig_elas). > > *The serial version works fine:* > ./ex17 -dm_plex_box_faces 1,1,1 -sol_type elas_trig -dm_plex_dim 3 > -dm_plex_simplex 0 -displacement_petscspace_degree 3 -dm_refine 0 > -convest_num_refine 1 -snes_convergence_estimate -snes_monitor > > *The parallel version fails:* > mpirun -n 2 ./ex17 -dm_plex_box_faces 1,1,1 -sol_type elas_trig > -dm_plex_dim 3 -dm_plex_simplex 0 -displacement_petscspace_degree 3 > -dm_refine 0 -convest_num_refine 1 -snes_convergence_estimate -snes_monitor > *with the error message (--with-debugging=1):* > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: Cell 0 has not been assigned a cell type > Yes, I need to put some more checks for corner cases. You have a 1 element mesh on 2 processes, so there is no cell 0, but I am expecting at least one cell during setup. Give -dm_refine 1 or -dm_plex_box_faces 2,2,2 or something like that. Thanks, Matt > [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could > be the program crashed before they were used or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-convest_num_refine value: 1 source: > command line > [0]PETSC ERROR: Option left: name:-displacement_petscspace_degree value: > 3 source: command line > [0]PETSC ERROR: Option left: name:-snes_convergence_estimate (no value) > source: command line > [0]PETSC ERROR: Option left: name:-snes_monitor (no value) source: > command line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown > [0]PETSC ERROR: ./ex17 on a DEBUG named ilya-HP-Notebook by ilya Wed Aug > 9 19:23:02 2023 > [0]PETSC ERROR: Configure options PETSC_ARCH=DEBUG > PETSC_DIR=/home/ilya/build/petsc-09aug-debug/petsc > --with-blaslapack-dir=/home/ilya/progs/OpenBLAS-0.3.21 > --with-mpi-dir=/home/ilya/progs/openmpi-4.1.4 --with-debugging=1 > --download-hdf5 --download-hypre --download-chaco --download-metis > --download-parmetis --download-suitesparse --download-moab --download-mumps > --download-scalapack --download-superlu --download-superlu_dist > --download-triangle --download-ml --download-giflib --download-libjpeg > --download-libpng --download-zlib --download-spai --download-tchem > --download-party --download-cmake --download-hwloc --download-ptscotch > --download-revolve --download-cams --download-spai > [0]PETSC ERROR: #1 DMPlexGetCellType() at > /home/ilya/build/petsc-09aug-debug/petsc/src/dm/impls/plex/plex.c:5169 > [0]PETSC ERROR: #2 SetupFE() at ex17.c:621 > [0]PETSC ERROR: #3 main() at ex17.c:654 > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -convest_num_refine 1 (source: command line) > [0]PETSC ERROR: -displacement_petscspace_degree 3 (source: command line) > [0]PETSC ERROR: -dm_plex_box_faces 1,1,1 (source: command line) > [0]PETSC ERROR: -dm_plex_dim 3 (source: command line) > [0]PETSC ERROR: -dm_plex_simplex 0 (source: command line) > [0]PETSC ERROR: -dm_refine 0 (source: command line) > [0]PETSC ERROR: -snes_convergence_estimate (source: command line) > [0]PETSC ERROR: -snes_monitor (source: command line) > [0]PETSC ERROR: -sol_type elas_trig (source: command line) > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > *And when --with-debugging=0, it fails with segmentation violation:* > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. > > Regards, > Ilya > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.kramer at imperial.ac.uk Wed Aug 9 12:06:23 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Wed, 9 Aug 2023 18:06:23 +0100 Subject: [petsc-users] performance regression with GAMG Message-ID: Dear petsc devs We have noticed a performance regression using GAMG as the preconditioner to solve the velocity block in a Stokes equations saddle point system with variable viscosity solved on a 3D hexahedral mesh of a spherical shell using Q2-Q1 elements. This is comparing performance from the beginning of last year (petsc 3.16.4) and a more recent petsc master (from around May this year). 
This is the weak scaling analysis we published in https://doi.org/10.5194/gmd-15-5127-2022

Previously the number of iterations for the velocity block (inner solve of the Schur complement) started at 40 iterations (https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png) and only went up slowly for larger problems (+ more cores). Now the number of iterations starts at 60 (https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png), with the same tolerances, again going up slowly with increasing size, and with the cost per iteration also up (slightly) - resulting in an increased runtime of > 50%.

The main change we can see is that the coarsening seems to have become a lot less aggressive at the first coarsening stage (finest -> one-but-finest) - presumably after the MIS(A^T A) -> MIS(MIS(A)) change? The performance issues might be similar to https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html ?

As an example at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the older petsc version we had:

          rows=126, cols=126, bs=6
          total: nonzeros=15876, allocated nonzeros=15876
--
          rows=3072, cols=3072, bs=6
          total: nonzeros=3344688, allocated nonzeros=3344688
--
          rows=91152, cols=91152, bs=6
          total: nonzeros=109729584, allocated nonzeros=109729584
--
          rows=2655378, cols=2655378, bs=6
          total: nonzeros=1468980252, allocated nonzeros=1468980252
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

Whereas with the newer version we get:

          rows=420, cols=420, bs=6
          total: nonzeros=176400, allocated nonzeros=176400
--
          rows=6462, cols=6462, bs=6
          total: nonzeros=10891908, allocated nonzeros=10891908
--
          rows=91716, cols=91716, bs=6
          total: nonzeros=81687384, allocated nonzeros=81687384
--
          rows=5419362, cols=5419362, bs=6
          total: nonzeros=3668190588, allocated nonzeros=3668190588
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to 2.6e6 DOFs. Note that we are providing the rigid body near nullspace, hence the bs=3 to bs=6. We have tried different values for the gamg_threshold but it doesn't really seem to significantly alter the coarsening amount in that first step.

Do you have any suggestions for further things we should try/look at?
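To be concrete, the kind of thing we have been varying looks like the sketch below (petsc4py-style, option names shown without our fieldsplit prefixes, values purely illustrative; the aggressive-coarsening option name is our guess from the current manual for what replaced the old square-graph behaviour):

from petsc4py import PETSc

opts = PETSc.Options()
opts["pc_gamg_threshold"] = 0.01                   # illustrative value; we scanned a range with little effect on the first level
opts["pc_gamg_aggressive_coarsening_levels"] = 2   # guess: apply aggressive coarsening on more of the finest levels?

Is something along those lines the intended way to recover the more aggressive first-level coarsening, or is there a different knob we should be looking at?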
Any feedback would be much appreciated Best wishes Stephan Kramer Full logs including log_view timings available from https://github.com/stephankramer/petsc-scaling/ In particular: https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat From cho at slac.stanford.edu Wed Aug 9 16:59:49 2023 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Wed, 9 Aug 2023 21:59:49 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> Message-ID: Barry and Matt, Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? Best, Cho ________________________________ From: Barry Smith Sent: Monday, July 17, 2023 6:58 AM To: Ng, Cho-Kuen Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Using PETSc GPU backend The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: Barry, Thank you so much for the clarification. I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? Cho ________________________________ From: Barry Smith > Sent: Saturday, July 15, 2023 8:36 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Cho, We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda Sorry for the confusion. Barry On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: Matt, After inserting 2 lines in the code: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); "There are no unused options." However, there is no improvement on the GPU performance. 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). 
THanks, Matt Thanks, Cho ________________________________ From: Matthew Knepley > Sent: Friday, July 14, 2023 5:57 PM To: Ng, Cho-Kuen > Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: I managed to pass the following options to PETSc using a GPU node on Perlmutter. -mat_type aijcusparse -vec_type cuda -log_view -options_left Below is a summary of the test using 4 MPI tasks and 1 GPU per task. o #PETSc Option Table entries: ???-log_view ???-mat_type aijcusparse -options_left -vec_type cuda #End of PETSc Option Table entries WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-mat_type value: aijcusparse The -mat_type option has not been used. In the application code, we use ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. Thanks, Matt o The percent flops on the GPU for KSPSolve is 17%. In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Friday, June 30, 2023 7:57 AM To: Barry Smith >; Mark Adams > Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Barry, Mark and Matt, Thank you all for the suggestions. I will modify the code so we can pass runtime options. Cho ________________________________ From: Barry Smith > Sent: Friday, June 30, 2023 7:01 AM To: Mark Adams > Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left Mark On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: Mark, The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. Thanks, Matt Cho ________________________________ From: Ng, Cho-Kuen > Sent: Thursday, June 29, 2023 8:32 PM To: Mark Adams > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Mark, Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? 
Cho ________________________________ From: Mark Adams > Sent: Thursday, June 29, 2023 5:55 PM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. Mark On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? Thanks, Cho -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ ________________________________ From: Barry Smith Sent: Monday, July 17, 2023 6:58 AM To: Ng, Cho-Kuen Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Using PETSc GPU backend The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: Barry, Thank you so much for the clarification. I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? Cho ________________________________ From: Barry Smith > Sent: Saturday, July 15, 2023 8:36 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Cho, We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda Sorry for the confusion. 
Barry On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: Matt, After inserting 2 lines in the code: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); "There are no unused options." However, there is no improvement on the GPU performance. 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). THanks, Matt Thanks, Cho ________________________________ From: Matthew Knepley > Sent: Friday, July 14, 2023 5:57 PM To: Ng, Cho-Kuen > Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: I managed to pass the following options to PETSc using a GPU node on Perlmutter. -mat_type aijcusparse -vec_type cuda -log_view -options_left Below is a summary of the test using 4 MPI tasks and 1 GPU per task. o #PETSc Option Table entries: ???-log_view ???-mat_type aijcusparse -options_left -vec_type cuda #End of PETSc Option Table entries WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-mat_type value: aijcusparse The -mat_type option has not been used. In the application code, we use ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. Thanks, Matt o The percent flops on the GPU for KSPSolve is 17%. In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Friday, June 30, 2023 7:57 AM To: Barry Smith >; Mark Adams > Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Barry, Mark and Matt, Thank you all for the suggestions. I will modify the code so we can pass runtime options. Cho ________________________________ From: Barry Smith > Sent: Friday, June 30, 2023 7:01 AM To: Mark Adams > Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left Mark On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: Mark, The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? 
PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. Thanks, Matt Cho ________________________________ From: Ng, Cho-Kuen > Sent: Thursday, June 29, 2023 8:32 PM To: Mark Adams > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Mark, Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? Cho ________________________________ From: Mark Adams > Sent: Thursday, June 29, 2023 5:55 PM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. Mark On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? Thanks, Cho -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 9 18:09:46 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 9 Aug 2023 19:09:46 -0400 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> Message-ID: <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: > > Barry and Matt, > > Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? 
> > Best, > Cho > > From: Barry Smith > > Sent: Monday, July 17, 2023 6:58 AM > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > > > >> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >> >> Barry, >> >> Thank you so much for the clarification. >> >> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >> >> Cho >> From: Barry Smith > >> Sent: Saturday, July 15, 2023 8:36 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> >> >> Cho, >> >> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >> >> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >> >> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >> >> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >> >> Sorry for the confusion. >> >> Barry >> >> >> >> >>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>> >>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>> Matt, >>> >>> After inserting 2 lines in the code: >>> >>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>> >>> "There are no unused options." However, there is no improvement on the GPU performance. >>> >>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>> >>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>> >>> THanks, >>> >>> Matt >>> >>> Thanks, >>> Cho >>> >>> From: Matthew Knepley > >>> Sent: Friday, July 14, 2023 5:57 PM >>> To: Ng, Cho-Kuen > >>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>> >>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>> >>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>> >>> o #PETSc Option Table entries: >>> ???-log_view >>> ???-mat_type aijcusparse >>> -options_left >>> -vec_type cuda >>> #End of PETSc Option Table entries >>> WARNING! There are options you set that were not used! >>> WARNING! could be spelling mistake, etc! >>> There is one unused database option. It is: >>> Option left: name:-mat_type value: aijcusparse >>> >>> The -mat_type option has not been used. In the application code, we use >>> >>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>> >>> >>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. 
>>> >>> Thanks, >>> >>> Matt >>> >>> o The percent flops on the GPU for KSPSolve is 17%. >>> >>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>> >>> Thanks, >>> Cho >>> From: Ng, Cho-Kuen > >>> Sent: Friday, June 30, 2023 7:57 AM >>> To: Barry Smith >; Mark Adams > >>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Barry, Mark and Matt, >>> >>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>> >>> Cho >>> From: Barry Smith > >>> Sent: Friday, June 30, 2023 7:01 AM >>> To: Mark Adams > >>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> >>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. >>> >>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>> >>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>> >>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Mark >>>> >>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>> Mark, >>>> >>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>> >>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Thursday, June 29, 2023 8:32 PM >>>> To: Mark Adams > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Mark, >>>> >>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>> >>>> Cho >>>> From: Mark Adams > >>>> Sent: Thursday, June 29, 2023 5:55 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>> >>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>> >>>> Mark >>>> >>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? 
>>>> >>>> Thanks, >>>> Cho >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > From: Barry Smith > > Sent: Monday, July 17, 2023 6:58 AM > To: Ng, Cho-Kuen > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] Using PETSc GPU backend > > > The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda > > > >> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: >> >> Barry, >> >> Thank you so much for the clarification. >> >> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >> >> Cho >> From: Barry Smith > >> Sent: Saturday, July 15, 2023 8:36 AM >> To: Ng, Cho-Kuen > >> Cc: petsc-users at mcs.anl.gov > >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> >> >> Cho, >> >> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >> >> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >> >> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >> >> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >> >> Sorry for the confusion. >> >> Barry >> >> >> >> >>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: >>> >>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: >>> Matt, >>> >>> After inserting 2 lines in the code: >>> >>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>> >>> "There are no unused options." However, there is no improvement on the GPU performance. >>> >>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>> >>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>> >>> THanks, >>> >>> Matt >>> >>> Thanks, >>> Cho >>> >>> From: Matthew Knepley > >>> Sent: Friday, July 14, 2023 5:57 PM >>> To: Ng, Cho-Kuen > >>> Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: >>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. 
>>> >>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>> >>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>> >>> o #PETSc Option Table entries: >>> ???-log_view >>> ???-mat_type aijcusparse >>> -options_left >>> -vec_type cuda >>> #End of PETSc Option Table entries >>> WARNING! There are options you set that were not used! >>> WARNING! could be spelling mistake, etc! >>> There is one unused database option. It is: >>> Option left: name:-mat_type value: aijcusparse >>> >>> The -mat_type option has not been used. In the application code, we use >>> >>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>> >>> >>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>> >>> Thanks, >>> >>> Matt >>> >>> o The percent flops on the GPU for KSPSolve is 17%. >>> >>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>> >>> Thanks, >>> Cho >>> From: Ng, Cho-Kuen > >>> Sent: Friday, June 30, 2023 7:57 AM >>> To: Barry Smith >; Mark Adams > >>> Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Barry, Mark and Matt, >>> >>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>> >>> Cho >>> From: Barry Smith > >>> Sent: Friday, June 30, 2023 7:01 AM >>> To: Mark Adams > >>> Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> >>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. >>> >>>> On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: >>>> >>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>> >>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Mark >>>> >>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: >>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: >>>> Mark, >>>> >>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>> >>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Cho >>>> From: Ng, Cho-Kuen > >>>> Sent: Thursday, June 29, 2023 8:32 PM >>>> To: Mark Adams > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Mark, >>>> >>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? 
>>>> >>>> Cho >>>> From: Mark Adams > >>>> Sent: Thursday, June 29, 2023 5:55 PM >>>> To: Ng, Cho-Kuen > >>>> Cc: petsc-users at mcs.anl.gov > >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>> >>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>> >>>> Mark >>>> >>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: >>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>> >>>> Thanks, >>>> Cho >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Thu Aug 10 01:29:39 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Thu, 10 Aug 2023 09:29:39 +0300 Subject: [petsc-users] error related to 'valgrind' when using MatView Message-ID: I am unable to understand what possibly went wrong with my code, I could load a matrix (large sparse matrix) into petsc, write it out and read it back into Matlab but when I tried to use MatView to see the matrix-info, it produces error of some 'corrupt argument, #valgrind'. Can anyone please help? Maitri -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Mat Object: 1 MPI process type: mpiaij rows=480000, cols=480000 total: nonzeros=17795897, allocated nonzeros=17795897 total number of mallocs used during MatSetValues calls=0 not using I-node (on process 0) routines after Loading A matrix... Memory: Total: 0.483803 Max: 0.483803 [0] after MatDestroy... Memory: Total: 0.483803 Max: 0.483803 [0] Done [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Corrupt argument: https://petsc.org/release/faq/#valgrind [0]PETSC ERROR: Object already free: Parameter # 1 [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-889-g617fea4 GIT Date: 2023-08-01 05:10:03 +0000
[0]PETSC ERROR: ./loadMat on a linux-gnu-c-debug named zeus.technion.ac.il by maitri.ksh Thu Aug 10 09:20:43 2023
[0]PETSC ERROR: Configure options --with-cc=/usr/local/gcc11/bin/gcc --with-cxx=/usr/local/gcc11/bin/g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-matlab --with-matlab-dir=/usr/local/matlab --download-superlu --with-superlu --download-superlu_dist --with-superlu_dist --download-hdf5 --with-hdf5=1 --download-mumps --with-mumps --download-scalapack --with-scalapack --download-parmetis --download-metis --download-ptscotch --download-bison --download-cmake
[0]PETSC ERROR: #1 PetscObjectDestroy() at /home/maitri.ksh/Maitri/petsc/src/sys/objects/destroy.c:50
[0]PETSC ERROR: #2 PetscObjectRegisterDestroyAll() at /home/maitri.ksh/Maitri/petsc/src/sys/objects/destroy.c:355
[0]PETSC ERROR: #3 PetscFinalize() at /home/maitri.ksh/Maitri/petsc/src/sys/objects/pinit.c:1452
-------------- next part --------------
/* Load unfactored matrix (A) */
PetscViewer viewerA;
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Jmat.dat", FILE_MODE_READ, &viewerA); CHKERRQ(ierr);
ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
ierr = MatSetType(A, type);
ierr = MatSetFromOptions(A);
PetscPrintf(PETSC_COMM_WORLD, "Loading unfactored matrix\n");
ierr = MatLoad(A, viewerA); CHKERRQ(ierr);
PetscViewerDestroy(&viewerA);

PetscViewer viewer;
viewer = PETSC_VIEWER_STDOUT_WORLD; /* built-in viewer owned by PETSc; destroying it below likely explains the double destroy reported from PetscObjectRegisterDestroyAll() at PetscFinalize() in the trace above */
ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_INFO);
ierr = MatView(A, viewer);
PetscViewerPopFormat(viewer);
PetscViewerDestroy(&viewer);
From ngoetting at itp.uni-bremen.de Thu Aug 10 04:40:04 2023
From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=)
Date: Thu, 10 Aug 2023 11:40:04 +0200
Subject: [petsc-users] Python PETSc performance vs scipy ZVODE
In-Reply-To:
References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de>
Message-ID:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder, why the documentation suggests otherwise, though). I had the following four tries at implementing the RHS:

1. def rhsfunc1(ts, t, u, F):
       scale = 0.5 * (5 < t < 10)
       (l + scale * pump).mult(u, F)
2. def rhsfunc2(ts, t, u, F):
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       (scale * pump).multAdd(u, F, F)
3. def rhsfunc3(ts, t, u, F):
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       if scale != 0:
           pump.scale(scale)
           pump.multAdd(u, F, F)
           pump.scale(1/scale)
4. def rhsfunc4(ts, t, u, F):
       tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
       l.mult(u, F)
       scale = 0.5 * (5 < t < 10)
       tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
       tmp_pump.multAdd(u, F, F)

They all yield the same results, but with 50it/s, 800it/s, 2300it/s and 1900it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on). As the scale function will most likely be a gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. (A consolidated sketch of this option-4 variant follows below.) 
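[Editor's note: a minimal, consolidated sketch of the "option 4" right-hand side described above. It assumes `l` and `pump` are the assembled real-valued AIJ matrices from the earlier script in this thread; the names `tmp_pump`, `envelope`, and `rhs` are illustrative additions, and the envelope is the same square pulse used here (a Gaussian pulse could be substituted later).]

from petsc4py import PETSc

# Assumed to exist already (see the script earlier in this thread):
#   l    -- assembled AIJ matrix for the constant part
#   pump -- assembled AIJ matrix for the driven part

tmp_pump = pump.duplicate()   # same nonzero pattern as pump, allocated once and reused

def envelope(t):
    # square pulse as used in the thread; a Gaussian could be substituted here
    return 0.5 * (5 < t < 10)

def rhs(ts, t, u, F):
    tmp_pump.zeroEntries()
    l.mult(u, F)                                   # F = L u
    tmp_pump.axpy(envelope(t), pump,
                  structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
    tmp_pump.multAdd(u, F, F)                      # F += f(t) * P u

ts = PETSc.TS().create(comm=PETSc.COMM_WORLD)
ts.setType(PETSc.TS.Type.RK)                       # explicit RK, so no Jacobian is set
ts.setRHSFunction(rhs)

[The work matrix is created once outside the callback, so no allocation happens per time step.]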
If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message. One last point: If I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed? Best regards Niclas On 09.08.23 17:51, Stefano Zampini wrote: > TSRK is an explicit solver. Unless you are changing the ts type from > command line,? the explicit? jacobian should not be needed. On top of > Barry's suggestion, I would suggest you to write the explicit RHS > instead of assembly a throw away matrix every time that function needs > to be sampled. > > On Wed, Aug 9, 2023, 17:09 Niclas G?tting > wrote: > > Hi all, > > I'm currently trying to convert a quantum simulation from scipy to > PETSc. The problem itself is extremely simple and of the form > \dot{u}(t) > = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test > case is > a square function. The matrices A_const and B_const are extremely > sparse > and therefore I thought, the problem will be well suited for PETSc. > Currently, I solve the ODE with the following procedure in scipy > (I can > provide the necessary data files, if needed, but they are just some > trace-preserving, very sparse matrices): > > import numpy as np > import scipy.sparse > import scipy.integrate > > from tqdm import tqdm > > > l = np.load("../liouvillian.npy") > pump = np.load("../pump_operator.npy") > state = np.load("../initial_state.npy") > > l = scipy.sparse.csr_array(l) > pump = scipy.sparse.csr_array(pump) > > def f(t, y, *args): > ???? return (l + 0.5 * (5 < t < 10) * pump) @ y > ???? #return l @ y # Uncomment for f(t) = 0 > > dt = 0.1 > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) > solver = > scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) > times = [] > for i in tqdm(range(NUM_STEPS)): > ???? res[i, :] = solver.integrate(solver.t + dt) > ???? times.append(solver.t) > > Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports > about 330it/s on my machine. When converting the code to PETSc, I > came > to the following result (according to the chapter > https://petsc.org/main/manual/ts/#special-cases) > > import sys > import petsc4py > petsc4py.init(args=sys.argv) > import numpy as np > import scipy.sparse > > from tqdm import tqdm > from petsc4py import PETSc > > comm = PETSc.COMM_WORLD > > > def mat_to_real(arr): > ???? return np.block([[arr.real, -arr.imag], [arr.imag, > arr.real]]).astype(np.float64) > > def mat_to_petsc_aij(arr): > ???? arr_sc_sp = scipy.sparse.csr_array(arr) > ???? mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) > ???? rstart, rend = mat.getOwnershipRange() > ???? print(rstart, rend) > ???? print(arr.shape[0]) > ???? print(mat.sizes) > ???? I = arr_sc_sp.indptr[rstart : rend + 1] - > arr_sc_sp.indptr[rstart] > ???? J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : > arr_sc_sp.indptr[rend]] > ???? V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : > arr_sc_sp.indptr[rend]] > > ???? print(I.shape, J.shape, V.shape) > ???? mat.setValuesCSR(I, J, V) > ???? mat.assemble() > ???? return mat > > > l = np.load("../liouvillian.npy") > l = mat_to_real(l) > pump = np.load("../pump_operator.npy") > pump = mat_to_real(pump) > state = np.load("../initial_state.npy") > state = np.hstack([state.real, state.imag]).astype(np.float64) > > l = mat_to_petsc_aij(l) > pump = mat_to_petsc_aij(pump) > > > jac = l.duplicate() > for i in range(8192): > ???? 
jac.setValue(i, i, 0) > jac.assemble() > jac += l > > vec = l.createVecRight() > vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) > vec.assemble() > > > dt = 0.1 > > ts = PETSc.TS().create(comm=comm) > ts.setFromOptions() > ts.setProblemType(ts.ProblemType.LINEAR) > ts.setEquationType(ts.EquationType.ODE_EXPLICIT) > ts.setType(ts.Type.RK) > ts.setRKType(ts.RKType.RK3BS) > ts.setTime(0) > print("KSP:", ts.getKSP().getType()) > print("KSP PC:",ts.getKSP().getPC().getType()) > print("SNES :", ts.getSNES().getType()) > > def jacobian(ts, t, u, Amat, Pmat): > ???? Amat.zeroEntries() > ???? Amat.aypx(1, l, > structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > ???? Amat.axpy(0.5 * (5 < t < 10), pump, > structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) > > ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) > #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # > Uncomment for f(t) = 0 > ts.setRHSJacobian(jacobian, jac) > > NUM_STEPS = 200 > res = np.empty((NUM_STEPS, 8192), dtype=np.float64) > times = [] > rstart, rend = vec.getOwnershipRange() > for i in tqdm(range(NUM_STEPS)): > ???? time = ts.getTime() > ???? ts.setMaxTime(time + dt) > ???? ts.solve(vec) > ???? res[i, rstart:rend] = vec.getArray()[:] > ???? times.append(time) > > I decomposed the complex ODE into a larger real ODE, so that I can > easily switch maybe to GPU computation later on. Now, the > solutions of > both scripts are very much identical, but PETSc runs about 3 times > slower at 120it/s on my machine. I don't use MPI for PETSc yet. > > I strongly suppose that the problem lies within the jacobian > definition, > as PETSc is about 3 times *faster* than scipy with f(t) = 0 and > therefore a constant jacobian. > > Thank you in advance. > > All the best, > Niclas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log.log Type: text/x-log Size: 10183 bytes Desc: not available URL: From stefano.zampini at gmail.com Thu Aug 10 04:47:12 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 10 Aug 2023 11:47:12 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: I would use option 3. Keep a work vector and do a vector summation instead of the multiple multiplication by scale and 1/scale. I agree with you the docs are a little misleading here. On Thu, Aug 10, 2023, 11:40 Niclas G?tting wrote: > Thank you both for the very quick answer! > > So far, I compiled PETSc with debugging turned on, but I think it should > still be faster than standard scipy in both cases. Actually, Stefano's > answer has got me very far already; now I only define the RHS of the ODE > and no Jacobian (I wonder, why the documentation suggests otherwise, > though). I had the following four tries at implementing the RHS: > > 1. def rhsfunc1(ts, t, u, F): > scale = 0.5 * (5 < t < 10) > (l + scale * pump).mult(u, F) > 2. def rhsfunc2(ts, t, u, F): > l.mult(u, F) > scale = 0.5 * (5 < t < 10) > (scale * pump).multAdd(u, F, F) > 3. def rhsfunc3(ts, t, u, F): > l.mult(u, F) > scale = 0.5 * (5 < t < 10) > if scale != 0: > pump.scale(scale) > pump.multAdd(u, F, F) > pump.scale(1/scale) > 4. 
def rhsfunc4(ts, t, u, F): > tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() > l.mult(u, F) > scale = 0.5 * (5 < t < 10) > tmp_pump.axpy(scale, pump, > structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) > tmp_pump.multAdd(u, F, F) > > They all yield the same results, but with 50it/s, 800it/, 2300it/s and > 1900it/s, respectively, which is a huge performance boost (almost 7 times > as fast as scipy, with PETSc debugging still turned on). As the scale > function will most likely be a gaussian in the future, I think that option > 3 will be become numerically unstable and I'll have to go with option 4, > which is already faster than I expected. If you think it is possible to > speed up the RHS calculation even more, I'd be happy to hear your > suggestions; the -log_view is attached to this message. > > One last point: If I didn't misunderstand the documentation at > https://petsc.org/release/manual/ts/#special-cases, should this maybe be > changed? > > Best regards > Niclas > On 09.08.23 17:51, Stefano Zampini wrote: > > TSRK is an explicit solver. Unless you are changing the ts type from > command line, the explicit jacobian should not be needed. On top of > Barry's suggestion, I would suggest you to write the explicit RHS instead > of assembly a throw away matrix every time that function needs to be > sampled. > > On Wed, Aug 9, 2023, 17:09 Niclas G?tting > wrote: > >> Hi all, >> >> I'm currently trying to convert a quantum simulation from scipy to >> PETSc. The problem itself is extremely simple and of the form \dot{u}(t) >> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is >> a square function. The matrices A_const and B_const are extremely sparse >> and therefore I thought, the problem will be well suited for PETSc. >> Currently, I solve the ODE with the following procedure in scipy (I can >> provide the necessary data files, if needed, but they are just some >> trace-preserving, very sparse matrices): >> >> import numpy as np >> import scipy.sparse >> import scipy.integrate >> >> from tqdm import tqdm >> >> >> l = np.load("../liouvillian.npy") >> pump = np.load("../pump_operator.npy") >> state = np.load("../initial_state.npy") >> >> l = scipy.sparse.csr_array(l) >> pump = scipy.sparse.csr_array(pump) >> >> def f(t, y, *args): >> return (l + 0.5 * (5 < t < 10) * pump) @ y >> #return l @ y # Uncomment for f(t) = 0 >> >> dt = 0.1 >> NUM_STEPS = 200 >> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >> solver = >> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >> times = [] >> for i in tqdm(range(NUM_STEPS)): >> res[i, :] = solver.integrate(solver.t + dt) >> times.append(solver.t) >> >> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports >> about 330it/s on my machine. 
When converting the code to PETSc, I came >> to the following result (according to the chapter >> https://petsc.org/main/manual/ts/#special-cases) >> >> import sys >> import petsc4py >> petsc4py.init(args=sys.argv) >> import numpy as np >> import scipy.sparse >> >> from tqdm import tqdm >> from petsc4py import PETSc >> >> comm = PETSc.COMM_WORLD >> >> >> def mat_to_real(arr): >> return np.block([[arr.real, -arr.imag], [arr.imag, >> arr.real]]).astype(np.float64) >> >> def mat_to_petsc_aij(arr): >> arr_sc_sp = scipy.sparse.csr_array(arr) >> mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >> rstart, rend = mat.getOwnershipRange() >> print(rstart, rend) >> print(arr.shape[0]) >> print(mat.sizes) >> I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] >> J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >> arr_sc_sp.indptr[rend]] >> V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] >> >> print(I.shape, J.shape, V.shape) >> mat.setValuesCSR(I, J, V) >> mat.assemble() >> return mat >> >> >> l = np.load("../liouvillian.npy") >> l = mat_to_real(l) >> pump = np.load("../pump_operator.npy") >> pump = mat_to_real(pump) >> state = np.load("../initial_state.npy") >> state = np.hstack([state.real, state.imag]).astype(np.float64) >> >> l = mat_to_petsc_aij(l) >> pump = mat_to_petsc_aij(pump) >> >> >> jac = l.duplicate() >> for i in range(8192): >> jac.setValue(i, i, 0) >> jac.assemble() >> jac += l >> >> vec = l.createVecRight() >> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) >> vec.assemble() >> >> >> dt = 0.1 >> >> ts = PETSc.TS().create(comm=comm) >> ts.setFromOptions() >> ts.setProblemType(ts.ProblemType.LINEAR) >> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >> ts.setType(ts.Type.RK) >> ts.setRKType(ts.RKType.RK3BS) >> ts.setTime(0) >> print("KSP:", ts.getKSP().getType()) >> print("KSP PC:",ts.getKSP().getPC().getType()) >> print("SNES :", ts.getSNES().getType()) >> >> def jacobian(ts, t, u, Amat, Pmat): >> Amat.zeroEntries() >> Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >> Amat.axpy(0.5 * (5 < t < 10), pump, >> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >> >> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # >> Uncomment for f(t) = 0 >> ts.setRHSJacobian(jacobian, jac) >> >> NUM_STEPS = 200 >> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >> times = [] >> rstart, rend = vec.getOwnershipRange() >> for i in tqdm(range(NUM_STEPS)): >> time = ts.getTime() >> ts.setMaxTime(time + dt) >> ts.solve(vec) >> res[i, rstart:rend] = vec.getArray()[:] >> times.append(time) >> >> I decomposed the complex ODE into a larger real ODE, so that I can >> easily switch maybe to GPU computation later on. Now, the solutions of >> both scripts are very much identical, but PETSc runs about 3 times >> slower at 120it/s on my machine. I don't use MPI for PETSc yet. >> >> I strongly suppose that the problem lies within the jacobian definition, >> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and >> therefore a constant jacobian. >> >> Thank you in advance. >> >> All the best, >> Niclas >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
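[Editor's note: a minimal sketch of the work-vector pattern Stefano describes above (compute F = L u and add f(t) * P u through one reusable vector), again assuming `l` and `pump` are the assembled matrices from the earlier script; `tmp_vec` and `rhs_workvec` are illustrative names. The next message in the thread arrives at essentially this form.]

from petsc4py import PETSc

tmp_vec = l.createVecRight()    # work vector, allocated once and reused across calls

def rhs_workvec(ts, t, u, F):
    l.mult(u, F)                          # F = L u
    pump.mult(u, tmp_vec)                 # tmp = P u
    F.axpy(0.5 * (5 < t < 10), tmp_vec)   # F += f(t) * P u

[No matrix is modified inside the callback; each evaluation costs two mat-vec products and one vector axpy.]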
URL: From ngoetting at itp.uni-bremen.de Thu Aug 10 05:12:47 2023 From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=) Date: Thu, 10 Aug 2023 12:12:47 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> If I understood you right, this should be the resulting RHS: def rhsfunc5(ts, t, u, F): ??? l.mult(u, F) ??? pump.mult(u, tmp_vec) ??? scale = 0.5 * (5 < t < 10) ??? F.axpy(scale, tmp_vec) It is a little bit slower than option 3, but with about 2100it/s consistently ~10% faster than option 4. Thank you very much for the suggestion! On 10.08.23 11:47, Stefano Zampini wrote: > I would use option 3. Keep a work vector and do a vector summation > instead of the multiple multiplication by scale and 1/scale. > > I agree with you the docs are a little misleading here. > > On Thu, Aug 10, 2023, 11:40 Niclas G?tting > wrote: > > Thank you both for the very quick answer! > > So far, I compiled PETSc with debugging turned on, but I think it > should still be faster than standard scipy in both cases. > Actually, Stefano's answer has got me very far already; now I only > define the RHS of the ODE and no Jacobian (I wonder, why the > documentation suggests otherwise, though). I had the following > four tries at implementing the RHS: > > 1. def rhsfunc1(ts, t, u, F): > ??? scale = 0.5 * (5 < t < 10) > ??? (l + scale * pump).mult(u, F) > 2. def rhsfunc2(ts, t, u, F): > ??? l.mult(u, F) > ??? scale = 0.5 * (5 < t < 10) > ??? (scale * pump).multAdd(u, F, F) > 3. def rhsfunc3(ts, t, u, F): > ??? l.mult(u, F) > ??? scale = 0.5 * (5 < t < 10) > ??? if scale != 0: > ??????? pump.scale(scale) > ??????? pump.multAdd(u, F, F) > ??????? pump.scale(1/scale) > 4. def rhsfunc4(ts, t, u, F): > ??? tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() > ??? l.mult(u, F) > ??? scale = 0.5 * (5 < t < 10) > ??? tmp_pump.axpy(scale, pump, > structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) > ??? tmp_pump.multAdd(u, F, F) > > They all yield the same results, but with 50it/s, 800it/, 2300it/s > and 1900it/s, respectively, which is a huge performance boost > (almost 7 times as fast as scipy, with PETSc debugging still > turned on). As the scale function will most likely be a gaussian > in the future, I think that option 3 will be become numerically > unstable and I'll have to go with option 4, which is already > faster than I expected. If you think it is possible to speed up > the RHS calculation even more, I'd be happy to hear your > suggestions; the -log_view is attached to this message. > > One last point: If I didn't misunderstand the documentation at > https://petsc.org/release/manual/ts/#special-cases, should this > maybe be changed? > > Best regards > Niclas > > On 09.08.23 17:51, Stefano Zampini wrote: >> TSRK is an explicit solver. Unless you are changing the ts type >> from command line,? the explicit? jacobian should not be needed. >> On top of Barry's suggestion, I would suggest you to write the >> explicit RHS instead of assembly a throw away matrix every time >> that function needs to be sampled. >> >> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >> wrote: >> >> Hi all, >> >> I'm currently trying to convert a quantum simulation from >> scipy to >> PETSc. The problem itself is extremely simple and of the form >> \dot{u}(t) >> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple >> test case is >> a square function. 
The matrices A_const and B_const are >> extremely sparse >> and therefore I thought, the problem will be well suited for >> PETSc. >> Currently, I solve the ODE with the following procedure in >> scipy (I can >> provide the necessary data files, if needed, but they are >> just some >> trace-preserving, very sparse matrices): >> >> import numpy as np >> import scipy.sparse >> import scipy.integrate >> >> from tqdm import tqdm >> >> >> l = np.load("../liouvillian.npy") >> pump = np.load("../pump_operator.npy") >> state = np.load("../initial_state.npy") >> >> l = scipy.sparse.csr_array(l) >> pump = scipy.sparse.csr_array(pump) >> >> def f(t, y, *args): >> ???? return (l + 0.5 * (5 < t < 10) * pump) @ y >> ???? #return l @ y # Uncomment for f(t) = 0 >> >> dt = 0.1 >> NUM_STEPS = 200 >> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >> solver = >> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >> times = [] >> for i in tqdm(range(NUM_STEPS)): >> ???? res[i, :] = solver.integrate(solver.t + dt) >> ???? times.append(solver.t) >> >> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm >> reports >> about 330it/s on my machine. When converting the code to >> PETSc, I came >> to the following result (according to the chapter >> https://petsc.org/main/manual/ts/#special-cases) >> >> import sys >> import petsc4py >> petsc4py.init(args=sys.argv) >> import numpy as np >> import scipy.sparse >> >> from tqdm import tqdm >> from petsc4py import PETSc >> >> comm = PETSc.COMM_WORLD >> >> >> def mat_to_real(arr): >> ???? return np.block([[arr.real, -arr.imag], [arr.imag, >> arr.real]]).astype(np.float64) >> >> def mat_to_petsc_aij(arr): >> ???? arr_sc_sp = scipy.sparse.csr_array(arr) >> ???? mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >> ???? rstart, rend = mat.getOwnershipRange() >> ???? print(rstart, rend) >> ???? print(arr.shape[0]) >> ???? print(mat.sizes) >> ???? I = arr_sc_sp.indptr[rstart : rend + 1] - >> arr_sc_sp.indptr[rstart] >> ???? J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >> arr_sc_sp.indptr[rend]] >> ???? V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >> arr_sc_sp.indptr[rend]] >> >> ???? print(I.shape, J.shape, V.shape) >> ???? mat.setValuesCSR(I, J, V) >> ???? mat.assemble() >> ???? return mat >> >> >> l = np.load("../liouvillian.npy") >> l = mat_to_real(l) >> pump = np.load("../pump_operator.npy") >> pump = mat_to_real(pump) >> state = np.load("../initial_state.npy") >> state = np.hstack([state.real, state.imag]).astype(np.float64) >> >> l = mat_to_petsc_aij(l) >> pump = mat_to_petsc_aij(pump) >> >> >> jac = l.duplicate() >> for i in range(8192): >> ???? jac.setValue(i, i, 0) >> jac.assemble() >> jac += l >> >> vec = l.createVecRight() >> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) >> vec.assemble() >> >> >> dt = 0.1 >> >> ts = PETSc.TS().create(comm=comm) >> ts.setFromOptions() >> ts.setProblemType(ts.ProblemType.LINEAR) >> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >> ts.setType(ts.Type.RK) >> ts.setRKType(ts.RKType.RK3BS) >> ts.setTime(0) >> print("KSP:", ts.getKSP().getType()) >> print("KSP PC:",ts.getKSP().getPC().getType()) >> print("SNES :", ts.getSNES().getType()) >> >> def jacobian(ts, t, u, Amat, Pmat): >> ???? Amat.zeroEntries() >> ???? Amat.aypx(1, l, >> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >> ???? 
Amat.axpy(0.5 * (5 < t < 10), pump, >> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >> >> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # >> Uncomment for f(t) = 0 >> ts.setRHSJacobian(jacobian, jac) >> >> NUM_STEPS = 200 >> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >> times = [] >> rstart, rend = vec.getOwnershipRange() >> for i in tqdm(range(NUM_STEPS)): >> ???? time = ts.getTime() >> ???? ts.setMaxTime(time + dt) >> ???? ts.solve(vec) >> ???? res[i, rstart:rend] = vec.getArray()[:] >> ???? times.append(time) >> >> I decomposed the complex ODE into a larger real ODE, so that >> I can >> easily switch maybe to GPU computation later on. Now, the >> solutions of >> both scripts are very much identical, but PETSc runs about 3 >> times >> slower at 120it/s on my machine. I don't use MPI for PETSc yet. >> >> I strongly suppose that the problem lies within the jacobian >> definition, >> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and >> therefore a constant jacobian. >> >> Thank you in advance. >> >> All the best, >> Niclas >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Aug 10 05:16:08 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 10 Aug 2023 12:16:08 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> Message-ID: If you do the mult of "pump" inside an if it should be faster On Thu, Aug 10, 2023, 12:12 Niclas G?tting wrote: > If I understood you right, this should be the resulting RHS: > > def rhsfunc5(ts, t, u, F): > l.mult(u, F) > pump.mult(u, tmp_vec) > scale = 0.5 * (5 < t < 10) > F.axpy(scale, tmp_vec) > > It is a little bit slower than option 3, but with about 2100it/s > consistently ~10% faster than option 4. > > Thank you very much for the suggestion! > On 10.08.23 11:47, Stefano Zampini wrote: > > I would use option 3. Keep a work vector and do a vector summation instead > of the multiple multiplication by scale and 1/scale. > > I agree with you the docs are a little misleading here. > > On Thu, Aug 10, 2023, 11:40 Niclas G?tting > wrote: > >> Thank you both for the very quick answer! >> >> So far, I compiled PETSc with debugging turned on, but I think it should >> still be faster than standard scipy in both cases. Actually, Stefano's >> answer has got me very far already; now I only define the RHS of the ODE >> and no Jacobian (I wonder, why the documentation suggests otherwise, >> though). I had the following four tries at implementing the RHS: >> >> 1. def rhsfunc1(ts, t, u, F): >> scale = 0.5 * (5 < t < 10) >> (l + scale * pump).mult(u, F) >> 2. def rhsfunc2(ts, t, u, F): >> l.mult(u, F) >> scale = 0.5 * (5 < t < 10) >> (scale * pump).multAdd(u, F, F) >> 3. def rhsfunc3(ts, t, u, F): >> l.mult(u, F) >> scale = 0.5 * (5 < t < 10) >> if scale != 0: >> pump.scale(scale) >> pump.multAdd(u, F, F) >> pump.scale(1/scale) >> 4. 
def rhsfunc4(ts, t, u, F): >> tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() >> l.mult(u, F) >> scale = 0.5 * (5 < t < 10) >> tmp_pump.axpy(scale, pump, >> structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) >> tmp_pump.multAdd(u, F, F) >> >> They all yield the same results, but with 50it/s, 800it/, 2300it/s and >> 1900it/s, respectively, which is a huge performance boost (almost 7 times >> as fast as scipy, with PETSc debugging still turned on). As the scale >> function will most likely be a gaussian in the future, I think that option >> 3 will be become numerically unstable and I'll have to go with option 4, >> which is already faster than I expected. If you think it is possible to >> speed up the RHS calculation even more, I'd be happy to hear your >> suggestions; the -log_view is attached to this message. >> >> One last point: If I didn't misunderstand the documentation at >> https://petsc.org/release/manual/ts/#special-cases, should this maybe be >> changed? >> >> Best regards >> Niclas >> On 09.08.23 17:51, Stefano Zampini wrote: >> >> TSRK is an explicit solver. Unless you are changing the ts type from >> command line, the explicit jacobian should not be needed. On top of >> Barry's suggestion, I would suggest you to write the explicit RHS instead >> of assembly a throw away matrix every time that function needs to be >> sampled. >> >> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >> wrote: >> >>> Hi all, >>> >>> I'm currently trying to convert a quantum simulation from scipy to >>> PETSc. The problem itself is extremely simple and of the form \dot{u}(t) >>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is >>> a square function. The matrices A_const and B_const are extremely sparse >>> and therefore I thought, the problem will be well suited for PETSc. >>> Currently, I solve the ODE with the following procedure in scipy (I can >>> provide the necessary data files, if needed, but they are just some >>> trace-preserving, very sparse matrices): >>> >>> import numpy as np >>> import scipy.sparse >>> import scipy.integrate >>> >>> from tqdm import tqdm >>> >>> >>> l = np.load("../liouvillian.npy") >>> pump = np.load("../pump_operator.npy") >>> state = np.load("../initial_state.npy") >>> >>> l = scipy.sparse.csr_array(l) >>> pump = scipy.sparse.csr_array(pump) >>> >>> def f(t, y, *args): >>> return (l + 0.5 * (5 < t < 10) * pump) @ y >>> #return l @ y # Uncomment for f(t) = 0 >>> >>> dt = 0.1 >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >>> solver = >>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >>> times = [] >>> for i in tqdm(range(NUM_STEPS)): >>> res[i, :] = solver.integrate(solver.t + dt) >>> times.append(solver.t) >>> >>> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports >>> about 330it/s on my machine. 
When converting the code to PETSc, I came >>> to the following result (according to the chapter >>> https://petsc.org/main/manual/ts/#special-cases) >>> >>> import sys >>> import petsc4py >>> petsc4py.init(args=sys.argv) >>> import numpy as np >>> import scipy.sparse >>> >>> from tqdm import tqdm >>> from petsc4py import PETSc >>> >>> comm = PETSc.COMM_WORLD >>> >>> >>> def mat_to_real(arr): >>> return np.block([[arr.real, -arr.imag], [arr.imag, >>> arr.real]]).astype(np.float64) >>> >>> def mat_to_petsc_aij(arr): >>> arr_sc_sp = scipy.sparse.csr_array(arr) >>> mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >>> rstart, rend = mat.getOwnershipRange() >>> print(rstart, rend) >>> print(arr.shape[0]) >>> print(mat.sizes) >>> I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] >>> J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> >>> print(I.shape, J.shape, V.shape) >>> mat.setValuesCSR(I, J, V) >>> mat.assemble() >>> return mat >>> >>> >>> l = np.load("../liouvillian.npy") >>> l = mat_to_real(l) >>> pump = np.load("../pump_operator.npy") >>> pump = mat_to_real(pump) >>> state = np.load("../initial_state.npy") >>> state = np.hstack([state.real, state.imag]).astype(np.float64) >>> >>> l = mat_to_petsc_aij(l) >>> pump = mat_to_petsc_aij(pump) >>> >>> >>> jac = l.duplicate() >>> for i in range(8192): >>> jac.setValue(i, i, 0) >>> jac.assemble() >>> jac += l >>> >>> vec = l.createVecRight() >>> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) >>> vec.assemble() >>> >>> >>> dt = 0.1 >>> >>> ts = PETSc.TS().create(comm=comm) >>> ts.setFromOptions() >>> ts.setProblemType(ts.ProblemType.LINEAR) >>> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >>> ts.setType(ts.Type.RK) >>> ts.setRKType(ts.RKType.RK3BS) >>> ts.setTime(0) >>> print("KSP:", ts.getKSP().getType()) >>> print("KSP PC:",ts.getKSP().getPC().getType()) >>> print("SNES :", ts.getSNES().getType()) >>> >>> def jacobian(ts, t, u, Amat, Pmat): >>> Amat.zeroEntries() >>> Amat.aypx(1, l, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> Amat.axpy(0.5 * (5 < t < 10), pump, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> >>> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >>> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # >>> Uncomment for f(t) = 0 >>> ts.setRHSJacobian(jacobian, jac) >>> >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >>> times = [] >>> rstart, rend = vec.getOwnershipRange() >>> for i in tqdm(range(NUM_STEPS)): >>> time = ts.getTime() >>> ts.setMaxTime(time + dt) >>> ts.solve(vec) >>> res[i, rstart:rend] = vec.getArray()[:] >>> times.append(time) >>> >>> I decomposed the complex ODE into a larger real ODE, so that I can >>> easily switch maybe to GPU computation later on. Now, the solutions of >>> both scripts are very much identical, but PETSc runs about 3 times >>> slower at 120it/s on my machine. I don't use MPI for PETSc yet. >>> >>> I strongly suppose that the problem lies within the jacobian definition, >>> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and >>> therefore a constant jacobian. >>> >>> Thank you in advance. >>> >>> All the best, >>> Niclas >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ngoetting at itp.uni-bremen.de Thu Aug 10 05:25:24 2023 From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=) Date: Thu, 10 Aug 2023 12:25:24 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> Message-ID: <6f407325-948c-4db3-a354-2c860ce252ec@itp.uni-bremen.de> You are absolutely right for this specific case (I get about 2400it/s instead of 2100it/s). However, the single square function will be replaced by a series of gaussian pulses in the future, which will never be zero. Maybe one could do an approximation and skip the second mult, if the gaussians are close to zero. On 10.08.23 12:16, Stefano Zampini wrote: > If you do the mult of "pump" inside an if it should be faster > > On Thu, Aug 10, 2023, 12:12 Niclas G?tting > wrote: > > If I understood you right, this should be the resulting RHS: > > def rhsfunc5(ts, t, u, F): > ??? l.mult(u, F) > ??? pump.mult(u, tmp_vec) > ??? scale = 0.5 * (5 < t < 10) > ??? F.axpy(scale, tmp_vec) > > It is a little bit slower than option 3, but with about 2100it/s > consistently ~10% faster than option 4. > > Thank you very much for the suggestion! > > On 10.08.23 11:47, Stefano Zampini wrote: >> I would use option 3. Keep a work vector and do a vector >> summation instead of the multiple multiplication by scale and >> 1/scale. >> >> I agree with you the docs are a little misleading here. >> >> On Thu, Aug 10, 2023, 11:40 Niclas G?tting >> wrote: >> >> Thank you both for the very quick answer! >> >> So far, I compiled PETSc with debugging turned on, but I >> think it should still be faster than standard scipy in both >> cases. Actually, Stefano's answer has got me very far >> already; now I only define the RHS of the ODE and no Jacobian >> (I wonder, why the documentation suggests otherwise, though). >> I had the following four tries at implementing the RHS: >> >> 1. def rhsfunc1(ts, t, u, F): >> ??? scale = 0.5 * (5 < t < 10) >> ??? (l + scale * pump).mult(u, F) >> 2. def rhsfunc2(ts, t, u, F): >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? (scale * pump).multAdd(u, F, F) >> 3. def rhsfunc3(ts, t, u, F): >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? if scale != 0: >> ??????? pump.scale(scale) >> ??????? pump.multAdd(u, F, F) >> ??????? pump.scale(1/scale) >> 4. def rhsfunc4(ts, t, u, F): >> ??? tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? tmp_pump.axpy(scale, pump, >> structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) >> ??? tmp_pump.multAdd(u, F, F) >> >> They all yield the same results, but with 50it/s, 800it/, >> 2300it/s and 1900it/s, respectively, which is a huge >> performance boost (almost 7 times as fast as scipy, with >> PETSc debugging still turned on). As the scale function will >> most likely be a gaussian in the future, I think that option >> 3 will be become numerically unstable and I'll have to go >> with option 4, which is already faster than I expected. If >> you think it is possible to speed up the RHS calculation even >> more, I'd be happy to hear your suggestions; the -log_view is >> attached to this message. >> >> One last point: If I didn't misunderstand the documentation >> at https://petsc.org/release/manual/ts/#special-cases, should >> this maybe be changed? 
>> >> Best regards >> Niclas >> >> On 09.08.23 17:51, Stefano Zampini wrote: >>> TSRK is an explicit solver. Unless you are changing the ts >>> type from command line,? the explicit? jacobian should not >>> be needed. On top of Barry's suggestion, I would suggest you >>> to write the explicit RHS instead of assembly a throw away >>> matrix every time that function needs to be sampled. >>> >>> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >>> wrote: >>> >>> Hi all, >>> >>> I'm currently trying to convert a quantum simulation >>> from scipy to >>> PETSc. The problem itself is extremely simple and of the >>> form \dot{u}(t) >>> = (A_const + f(t)*B_const)*u(t), where f(t) in this >>> simple test case is >>> a square function. The matrices A_const and B_const are >>> extremely sparse >>> and therefore I thought, the problem will be well suited >>> for PETSc. >>> Currently, I solve the ODE with the following procedure >>> in scipy (I can >>> provide the necessary data files, if needed, but they >>> are just some >>> trace-preserving, very sparse matrices): >>> >>> import numpy as np >>> import scipy.sparse >>> import scipy.integrate >>> >>> from tqdm import tqdm >>> >>> >>> l = np.load("../liouvillian.npy") >>> pump = np.load("../pump_operator.npy") >>> state = np.load("../initial_state.npy") >>> >>> l = scipy.sparse.csr_array(l) >>> pump = scipy.sparse.csr_array(pump) >>> >>> def f(t, y, *args): >>> ???? return (l + 0.5 * (5 < t < 10) * pump) @ y >>> ???? #return l @ y # Uncomment for f(t) = 0 >>> >>> dt = 0.1 >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >>> solver = >>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >>> times = [] >>> for i in tqdm(range(NUM_STEPS)): >>> ???? res[i, :] = solver.integrate(solver.t + dt) >>> ???? times.append(solver.t) >>> >>> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. >>> tqdm reports >>> about 330it/s on my machine. When converting the code to >>> PETSc, I came >>> to the following result (according to the chapter >>> https://petsc.org/main/manual/ts/#special-cases) >>> >>> import sys >>> import petsc4py >>> petsc4py.init(args=sys.argv) >>> import numpy as np >>> import scipy.sparse >>> >>> from tqdm import tqdm >>> from petsc4py import PETSc >>> >>> comm = PETSc.COMM_WORLD >>> >>> >>> def mat_to_real(arr): >>> ???? return np.block([[arr.real, -arr.imag], [arr.imag, >>> arr.real]]).astype(np.float64) >>> >>> def mat_to_petsc_aij(arr): >>> ???? arr_sc_sp = scipy.sparse.csr_array(arr) >>> ???? mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >>> ???? rstart, rend = mat.getOwnershipRange() >>> ???? print(rstart, rend) >>> ???? print(arr.shape[0]) >>> ???? print(mat.sizes) >>> ???? I = arr_sc_sp.indptr[rstart : rend + 1] - >>> arr_sc_sp.indptr[rstart] >>> ???? J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> ???? V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> >>> ???? print(I.shape, J.shape, V.shape) >>> ???? mat.setValuesCSR(I, J, V) >>> ???? mat.assemble() >>> ???? return mat >>> >>> >>> l = np.load("../liouvillian.npy") >>> l = mat_to_real(l) >>> pump = np.load("../pump_operator.npy") >>> pump = mat_to_real(pump) >>> state = np.load("../initial_state.npy") >>> state = np.hstack([state.real, >>> state.imag]).astype(np.float64) >>> >>> l = mat_to_petsc_aij(l) >>> pump = mat_to_petsc_aij(pump) >>> >>> >>> jac = l.duplicate() >>> for i in range(8192): >>> ???? 
jac.setValue(i, i, 0) >>> jac.assemble() >>> jac += l >>> >>> vec = l.createVecRight() >>> vec.setValues(np.arange(state.shape[0], dtype=np.int32), >>> state) >>> vec.assemble() >>> >>> >>> dt = 0.1 >>> >>> ts = PETSc.TS().create(comm=comm) >>> ts.setFromOptions() >>> ts.setProblemType(ts.ProblemType.LINEAR) >>> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >>> ts.setType(ts.Type.RK) >>> ts.setRKType(ts.RKType.RK3BS) >>> ts.setTime(0) >>> print("KSP:", ts.getKSP().getType()) >>> print("KSP PC:",ts.getKSP().getPC().getType()) >>> print("SNES :", ts.getSNES().getType()) >>> >>> def jacobian(ts, t, u, Amat, Pmat): >>> ???? Amat.zeroEntries() >>> ???? Amat.aypx(1, l, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> ???? Amat.axpy(0.5 * (5 < t < 10), pump, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> >>> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >>> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, >>> l, l) # >>> Uncomment for f(t) = 0 >>> ts.setRHSJacobian(jacobian, jac) >>> >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >>> times = [] >>> rstart, rend = vec.getOwnershipRange() >>> for i in tqdm(range(NUM_STEPS)): >>> ???? time = ts.getTime() >>> ???? ts.setMaxTime(time + dt) >>> ???? ts.solve(vec) >>> ???? res[i, rstart:rend] = vec.getArray()[:] >>> ???? times.append(time) >>> >>> I decomposed the complex ODE into a larger real ODE, so >>> that I can >>> easily switch maybe to GPU computation later on. Now, >>> the solutions of >>> both scripts are very much identical, but PETSc runs >>> about 3 times >>> slower at 120it/s on my machine. I don't use MPI for >>> PETSc yet. >>> >>> I strongly suppose that the problem lies within the >>> jacobian definition, >>> as PETSc is about 3 times *faster* than scipy with f(t) >>> = 0 and >>> therefore a constant jacobian. >>> >>> Thank you in advance. >>> >>> All the best, >>> Niclas >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Aug 10 05:27:12 2023 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 10 Aug 2023 12:27:12 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: <6f407325-948c-4db3-a354-2c860ce252ec@itp.uni-bremen.de> References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> <6f407325-948c-4db3-a354-2c860ce252ec@itp.uni-bremen.de> Message-ID: Then just do the multiplications you need. My proposal was for the example function you were showing. On Thu, Aug 10, 2023, 12:25 Niclas G?tting wrote: > You are absolutely right for this specific case (I get about 2400it/s > instead of 2100it/s). However, the single square function will be replaced > by a series of gaussian pulses in the future, which will never be zero. > Maybe one could do an approximation and skip the second mult, if the > gaussians are close to zero. > On 10.08.23 12:16, Stefano Zampini wrote: > > If you do the mult of "pump" inside an if it should be faster > > On Thu, Aug 10, 2023, 12:12 Niclas G?tting > wrote: > >> If I understood you right, this should be the resulting RHS: >> >> def rhsfunc5(ts, t, u, F): >> l.mult(u, F) >> pump.mult(u, tmp_vec) >> scale = 0.5 * (5 < t < 10) >> F.axpy(scale, tmp_vec) >> >> It is a little bit slower than option 3, but with about 2100it/s >> consistently ~10% faster than option 4. >> >> Thank you very much for the suggestion! 
>> On 10.08.23 11:47, Stefano Zampini wrote: >> >> I would use option 3. Keep a work vector and do a vector summation >> instead of the multiple multiplication by scale and 1/scale. >> >> I agree with you the docs are a little misleading here. >> >> On Thu, Aug 10, 2023, 11:40 Niclas G?tting >> wrote: >> >>> Thank you both for the very quick answer! >>> >>> So far, I compiled PETSc with debugging turned on, but I think it should >>> still be faster than standard scipy in both cases. Actually, Stefano's >>> answer has got me very far already; now I only define the RHS of the ODE >>> and no Jacobian (I wonder, why the documentation suggests otherwise, >>> though). I had the following four tries at implementing the RHS: >>> >>> 1. def rhsfunc1(ts, t, u, F): >>> scale = 0.5 * (5 < t < 10) >>> (l + scale * pump).mult(u, F) >>> 2. def rhsfunc2(ts, t, u, F): >>> l.mult(u, F) >>> scale = 0.5 * (5 < t < 10) >>> (scale * pump).multAdd(u, F, F) >>> 3. def rhsfunc3(ts, t, u, F): >>> l.mult(u, F) >>> scale = 0.5 * (5 < t < 10) >>> if scale != 0: >>> pump.scale(scale) >>> pump.multAdd(u, F, F) >>> pump.scale(1/scale) >>> 4. def rhsfunc4(ts, t, u, F): >>> tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() >>> l.mult(u, F) >>> scale = 0.5 * (5 < t < 10) >>> tmp_pump.axpy(scale, pump, >>> structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) >>> tmp_pump.multAdd(u, F, F) >>> >>> They all yield the same results, but with 50it/s, 800it/, 2300it/s and >>> 1900it/s, respectively, which is a huge performance boost (almost 7 times >>> as fast as scipy, with PETSc debugging still turned on). As the scale >>> function will most likely be a gaussian in the future, I think that option >>> 3 will be become numerically unstable and I'll have to go with option 4, >>> which is already faster than I expected. If you think it is possible to >>> speed up the RHS calculation even more, I'd be happy to hear your >>> suggestions; the -log_view is attached to this message. >>> >>> One last point: If I didn't misunderstand the documentation at >>> https://petsc.org/release/manual/ts/#special-cases, should this maybe >>> be changed? >>> >>> Best regards >>> Niclas >>> On 09.08.23 17:51, Stefano Zampini wrote: >>> >>> TSRK is an explicit solver. Unless you are changing the ts type from >>> command line, the explicit jacobian should not be needed. On top of >>> Barry's suggestion, I would suggest you to write the explicit RHS instead >>> of assembly a throw away matrix every time that function needs to be >>> sampled. >>> >>> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >>> wrote: >>> >>>> Hi all, >>>> >>>> I'm currently trying to convert a quantum simulation from scipy to >>>> PETSc. The problem itself is extremely simple and of the form >>>> \dot{u}(t) >>>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is >>>> a square function. The matrices A_const and B_const are extremely >>>> sparse >>>> and therefore I thought, the problem will be well suited for PETSc. 
>>>> Currently, I solve the ODE with the following procedure in scipy (I can >>>> provide the necessary data files, if needed, but they are just some >>>> trace-preserving, very sparse matrices): >>>> >>>> import numpy as np >>>> import scipy.sparse >>>> import scipy.integrate >>>> >>>> from tqdm import tqdm >>>> >>>> >>>> l = np.load("../liouvillian.npy") >>>> pump = np.load("../pump_operator.npy") >>>> state = np.load("../initial_state.npy") >>>> >>>> l = scipy.sparse.csr_array(l) >>>> pump = scipy.sparse.csr_array(pump) >>>> >>>> def f(t, y, *args): >>>> return (l + 0.5 * (5 < t < 10) * pump) @ y >>>> #return l @ y # Uncomment for f(t) = 0 >>>> >>>> dt = 0.1 >>>> NUM_STEPS = 200 >>>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >>>> solver = >>>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >>>> times = [] >>>> for i in tqdm(range(NUM_STEPS)): >>>> res[i, :] = solver.integrate(solver.t + dt) >>>> times.append(solver.t) >>>> >>>> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports >>>> about 330it/s on my machine. When converting the code to PETSc, I came >>>> to the following result (according to the chapter >>>> https://petsc.org/main/manual/ts/#special-cases) >>>> >>>> import sys >>>> import petsc4py >>>> petsc4py.init(args=sys.argv) >>>> import numpy as np >>>> import scipy.sparse >>>> >>>> from tqdm import tqdm >>>> from petsc4py import PETSc >>>> >>>> comm = PETSc.COMM_WORLD >>>> >>>> >>>> def mat_to_real(arr): >>>> return np.block([[arr.real, -arr.imag], [arr.imag, >>>> arr.real]]).astype(np.float64) >>>> >>>> def mat_to_petsc_aij(arr): >>>> arr_sc_sp = scipy.sparse.csr_array(arr) >>>> mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >>>> rstart, rend = mat.getOwnershipRange() >>>> print(rstart, rend) >>>> print(arr.shape[0]) >>>> print(mat.sizes) >>>> I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] >>>> J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >>>> arr_sc_sp.indptr[rend]] >>>> V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >>>> arr_sc_sp.indptr[rend]] >>>> >>>> print(I.shape, J.shape, V.shape) >>>> mat.setValuesCSR(I, J, V) >>>> mat.assemble() >>>> return mat >>>> >>>> >>>> l = np.load("../liouvillian.npy") >>>> l = mat_to_real(l) >>>> pump = np.load("../pump_operator.npy") >>>> pump = mat_to_real(pump) >>>> state = np.load("../initial_state.npy") >>>> state = np.hstack([state.real, state.imag]).astype(np.float64) >>>> >>>> l = mat_to_petsc_aij(l) >>>> pump = mat_to_petsc_aij(pump) >>>> >>>> >>>> jac = l.duplicate() >>>> for i in range(8192): >>>> jac.setValue(i, i, 0) >>>> jac.assemble() >>>> jac += l >>>> >>>> vec = l.createVecRight() >>>> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) >>>> vec.assemble() >>>> >>>> >>>> dt = 0.1 >>>> >>>> ts = PETSc.TS().create(comm=comm) >>>> ts.setFromOptions() >>>> ts.setProblemType(ts.ProblemType.LINEAR) >>>> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >>>> ts.setType(ts.Type.RK) >>>> ts.setRKType(ts.RKType.RK3BS) >>>> ts.setTime(0) >>>> print("KSP:", ts.getKSP().getType()) >>>> print("KSP PC:",ts.getKSP().getPC().getType()) >>>> print("SNES :", ts.getSNES().getType()) >>>> >>>> def jacobian(ts, t, u, Amat, Pmat): >>>> Amat.zeroEntries() >>>> Amat.aypx(1, l, >>>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>>> Amat.axpy(0.5 * (5 < t < 10), pump, >>>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>>> >>>> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >>>> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, 
l, l) # >>>> Uncomment for f(t) = 0 >>>> ts.setRHSJacobian(jacobian, jac) >>>> >>>> NUM_STEPS = 200 >>>> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >>>> times = [] >>>> rstart, rend = vec.getOwnershipRange() >>>> for i in tqdm(range(NUM_STEPS)): >>>> time = ts.getTime() >>>> ts.setMaxTime(time + dt) >>>> ts.solve(vec) >>>> res[i, rstart:rend] = vec.getArray()[:] >>>> times.append(time) >>>> >>>> I decomposed the complex ODE into a larger real ODE, so that I can >>>> easily switch maybe to GPU computation later on. Now, the solutions of >>>> both scripts are very much identical, but PETSc runs about 3 times >>>> slower at 120it/s on my machine. I don't use MPI for PETSc yet. >>>> >>>> I strongly suppose that the problem lies within the jacobian >>>> definition, >>>> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and >>>> therefore a constant jacobian. >>>> >>>> Thank you in advance. >>>> >>>> All the best, >>>> Niclas >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngoetting at itp.uni-bremen.de Thu Aug 10 05:31:31 2023 From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=) Date: Thu, 10 Aug 2023 12:31:31 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> <6aa15dfb-8071-4e2e-8ebd-dcfe3a687709@itp.uni-bremen.de> <6f407325-948c-4db3-a354-2c860ce252ec@itp.uni-bremen.de> Message-ID: <6a8b3ffc-d56b-4ea2-a32b-e4fc223a1534@itp.uni-bremen.de> Alright. Again, thank you very much for taking the time to answer my beginner questions! Still a lot to learn.. Have a good day! On 10.08.23 12:27, Stefano Zampini wrote: > Then just do the multiplications you need. My proposal was for the > example function you were showing. > > On Thu, Aug 10, 2023, 12:25 Niclas G?tting > wrote: > > You are absolutely right for this specific case (I get about > 2400it/s instead of 2100it/s). However, the single square function > will be replaced by a series of gaussian pulses in the future, > which will never be zero. Maybe one could do an approximation and > skip the second mult, if the gaussians are close to zero. > > On 10.08.23 12:16, Stefano Zampini wrote: >> If you do the mult of "pump" inside an if it should be faster >> >> On Thu, Aug 10, 2023, 12:12 Niclas G?tting >> wrote: >> >> If I understood you right, this should be the resulting RHS: >> >> def rhsfunc5(ts, t, u, F): >> ??? l.mult(u, F) >> ??? pump.mult(u, tmp_vec) >> ??? scale = 0.5 * (5 < t < 10) >> ??? F.axpy(scale, tmp_vec) >> >> It is a little bit slower than option 3, but with about >> 2100it/s consistently ~10% faster than option 4. >> >> Thank you very much for the suggestion! >> >> On 10.08.23 11:47, Stefano Zampini wrote: >>> I would use option 3. Keep a work vector and do a vector >>> summation instead of the multiple multiplication by scale >>> and 1/scale. >>> >>> I agree with you the docs are a little misleading here. >>> >>> On Thu, Aug 10, 2023, 11:40 Niclas G?tting >>> wrote: >>> >>> Thank you both for the very quick answer! >>> >>> So far, I compiled PETSc with debugging turned on, but I >>> think it should still be faster than standard scipy in >>> both cases. Actually, Stefano's answer has got me very >>> far already; now I only define the RHS of the ODE and no >>> Jacobian (I wonder, why the documentation suggests >>> otherwise, though). I had the following four tries at >>> implementing the RHS: >>> >>> 1. def rhsfunc1(ts, t, u, F): >>> ??? 
scale = 0.5 * (5 < t < 10) >>> ??? (l + scale * pump).mult(u, F) >>> 2. def rhsfunc2(ts, t, u, F): >>> ??? l.mult(u, F) >>> ??? scale = 0.5 * (5 < t < 10) >>> ??? (scale * pump).multAdd(u, F, F) >>> 3. def rhsfunc3(ts, t, u, F): >>> ??? l.mult(u, F) >>> ??? scale = 0.5 * (5 < t < 10) >>> ??? if scale != 0: >>> ??????? pump.scale(scale) >>> ??????? pump.multAdd(u, F, F) >>> ??????? pump.scale(1/scale) >>> 4. def rhsfunc4(ts, t, u, F): >>> ??? tmp_pump.zeroEntries() # tmp_pump is >>> pump.duplicate() >>> ??? l.mult(u, F) >>> ??? scale = 0.5 * (5 < t < 10) >>> ??? tmp_pump.axpy(scale, pump, >>> structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) >>> ??? tmp_pump.multAdd(u, F, F) >>> >>> They all yield the same results, but with 50it/s, >>> 800it/, 2300it/s and 1900it/s, respectively, which is a >>> huge performance boost (almost 7 times as fast as scipy, >>> with PETSc debugging still turned on). As the scale >>> function will most likely be a gaussian in the future, I >>> think that option 3 will be become numerically unstable >>> and I'll have to go with option 4, which is already >>> faster than I expected. If you think it is possible to >>> speed up the RHS calculation even more, I'd be happy to >>> hear your suggestions; the -log_view is attached to this >>> message. >>> >>> One last point: If I didn't misunderstand the >>> documentation at >>> https://petsc.org/release/manual/ts/#special-cases, >>> should this maybe be changed? >>> >>> Best regards >>> Niclas >>> >>> On 09.08.23 17:51, Stefano Zampini wrote: >>>> TSRK is an explicit solver. Unless you are changing the >>>> ts type from command line,? the explicit? jacobian >>>> should not be needed. On top of Barry's suggestion, I >>>> would suggest you to write the explicit RHS instead of >>>> assembly a throw away matrix every time that function >>>> needs to be sampled. >>>> >>>> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >>>> wrote: >>>> >>>> Hi all, >>>> >>>> I'm currently trying to convert a quantum >>>> simulation from scipy to >>>> PETSc. The problem itself is extremely simple and >>>> of the form \dot{u}(t) >>>> = (A_const + f(t)*B_const)*u(t), where f(t) in this >>>> simple test case is >>>> a square function. The matrices A_const and B_const >>>> are extremely sparse >>>> and therefore I thought, the problem will be well >>>> suited for PETSc. >>>> Currently, I solve the ODE with the following >>>> procedure in scipy (I can >>>> provide the necessary data files, if needed, but >>>> they are just some >>>> trace-preserving, very sparse matrices): >>>> >>>> import numpy as np >>>> import scipy.sparse >>>> import scipy.integrate >>>> >>>> from tqdm import tqdm >>>> >>>> >>>> l = np.load("../liouvillian.npy") >>>> pump = np.load("../pump_operator.npy") >>>> state = np.load("../initial_state.npy") >>>> >>>> l = scipy.sparse.csr_array(l) >>>> pump = scipy.sparse.csr_array(pump) >>>> >>>> def f(t, y, *args): >>>> ???? return (l + 0.5 * (5 < t < 10) * pump) @ y >>>> ???? #return l @ y # Uncomment for f(t) = 0 >>>> >>>> dt = 0.1 >>>> NUM_STEPS = 200 >>>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >>>> solver = >>>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >>>> times = [] >>>> for i in tqdm(range(NUM_STEPS)): >>>> ???? res[i, :] = solver.integrate(solver.t + dt) >>>> ???? times.append(solver.t) >>>> >>>> Here, A_const = l, B_const = pump and f(t) = 5 < t >>>> < 10. tqdm reports >>>> about 330it/s on my machine. 
When converting the >>>> code to PETSc, I came >>>> to the following result (according to the chapter >>>> https://petsc.org/main/manual/ts/#special-cases) >>>> >>>> import sys >>>> import petsc4py >>>> petsc4py.init(args=sys.argv) >>>> import numpy as np >>>> import scipy.sparse >>>> >>>> from tqdm import tqdm >>>> from petsc4py import PETSc >>>> >>>> comm = PETSc.COMM_WORLD >>>> >>>> >>>> def mat_to_real(arr): >>>> ???? return np.block([[arr.real, -arr.imag], >>>> [arr.imag, >>>> arr.real]]).astype(np.float64) >>>> >>>> def mat_to_petsc_aij(arr): >>>> ???? arr_sc_sp = scipy.sparse.csr_array(arr) >>>> ???? mat = PETSc.Mat().createAIJ(arr.shape[0], >>>> comm=comm) >>>> ???? rstart, rend = mat.getOwnershipRange() >>>> ???? print(rstart, rend) >>>> ???? print(arr.shape[0]) >>>> ???? print(mat.sizes) >>>> ???? I = arr_sc_sp.indptr[rstart : rend + 1] - >>>> arr_sc_sp.indptr[rstart] >>>> ???? J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >>>> arr_sc_sp.indptr[rend]] >>>> ???? V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >>>> arr_sc_sp.indptr[rend]] >>>> >>>> ???? print(I.shape, J.shape, V.shape) >>>> ???? mat.setValuesCSR(I, J, V) >>>> ???? mat.assemble() >>>> ???? return mat >>>> >>>> >>>> l = np.load("../liouvillian.npy") >>>> l = mat_to_real(l) >>>> pump = np.load("../pump_operator.npy") >>>> pump = mat_to_real(pump) >>>> state = np.load("../initial_state.npy") >>>> state = np.hstack([state.real, >>>> state.imag]).astype(np.float64) >>>> >>>> l = mat_to_petsc_aij(l) >>>> pump = mat_to_petsc_aij(pump) >>>> >>>> >>>> jac = l.duplicate() >>>> for i in range(8192): >>>> ???? jac.setValue(i, i, 0) >>>> jac.assemble() >>>> jac += l >>>> >>>> vec = l.createVecRight() >>>> vec.setValues(np.arange(state.shape[0], >>>> dtype=np.int32), state) >>>> vec.assemble() >>>> >>>> >>>> dt = 0.1 >>>> >>>> ts = PETSc.TS().create(comm=comm) >>>> ts.setFromOptions() >>>> ts.setProblemType(ts.ProblemType.LINEAR) >>>> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >>>> ts.setType(ts.Type.RK) >>>> ts.setRKType(ts.RKType.RK3BS) >>>> ts.setTime(0) >>>> print("KSP:", ts.getKSP().getType()) >>>> print("KSP PC:",ts.getKSP().getPC().getType()) >>>> print("SNES :", ts.getSNES().getType()) >>>> >>>> def jacobian(ts, t, u, Amat, Pmat): >>>> ???? Amat.zeroEntries() >>>> ???? Amat.aypx(1, l, >>>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>>> ???? Amat.axpy(0.5 * (5 < t < 10), pump, >>>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>>> >>>> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >>>> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, >>>> l, l) # >>>> Uncomment for f(t) = 0 >>>> ts.setRHSJacobian(jacobian, jac) >>>> >>>> NUM_STEPS = 200 >>>> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >>>> times = [] >>>> rstart, rend = vec.getOwnershipRange() >>>> for i in tqdm(range(NUM_STEPS)): >>>> ???? time = ts.getTime() >>>> ???? ts.setMaxTime(time + dt) >>>> ???? ts.solve(vec) >>>> ???? res[i, rstart:rend] = vec.getArray()[:] >>>> ???? times.append(time) >>>> >>>> I decomposed the complex ODE into a larger real >>>> ODE, so that I can >>>> easily switch maybe to GPU computation later on. >>>> Now, the solutions of >>>> both scripts are very much identical, but PETSc >>>> runs about 3 times >>>> slower at 120it/s on my machine. I don't use MPI >>>> for PETSc yet. >>>> >>>> I strongly suppose that the problem lies within the >>>> jacobian definition, >>>> as PETSc is about 3 times *faster* than scipy with >>>> f(t) = 0 and >>>> therefore a constant jacobian. 
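For reference, the work-vector pattern the thread settles on above (rhsfunc5 / option 4) can be written as a self-contained petsc4py sketch. With an explicit RK integrator only the RHS is needed, so there is no Jacobian callback and no throwaway matrix per stage; the pulse term goes through a vector that is allocated once. The tridiagonal `l` and `pump` operators and the initial state below are stand-ins to keep the sketch runnable, not the matrices from the actual simulation.

```
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

def sparse_stand_in(n, scale):
    # very sparse stand-in operator (tridiagonal), in the spirit of the
    # sparse Liouvillian/pump matrices described above
    A = PETSc.Mat().createAIJ([n, n], nnz=3)
    rstart, rend = A.getOwnershipRange()
    for i in range(rstart, rend):
        if i > 0:
            A.setValue(i, i - 1, 0.1 * scale)
        A.setValue(i, i, -1.0 * scale)
        if i < n - 1:
            A.setValue(i, i + 1, 0.1 * scale)
    A.assemble()
    return A

n = 8192                      # size of the real-valued system in the thread
l = sparse_stand_in(n, 1.0)
pump = sparse_stand_in(n, 0.5)

tmp = l.createVecRight()      # work vector, allocated once
F = l.createVecRight()

def rhs(ts, t, u, f):
    # f = (l + 0.5 * (5 < t < 10) * pump) @ u, without assembling a new matrix
    l.mult(u, f)
    if 5 < t < 10:
        pump.mult(u, tmp)
        f.axpy(0.5, tmp)

ts = PETSc.TS().create()
ts.setType(PETSc.TS.Type.RK)
ts.setRHSFunction(rhs, F)
ts.setTime(0.0)
ts.setTimeStep(0.1)
ts.setMaxTime(20.0)
ts.setExactFinalTime(PETSc.TS.ExactFinalTime.MATCHSTEP)
ts.setFromOptions()

u = l.createVecRight()
u.set(1.0)                    # placeholder initial state
ts.solve(u)
```

With the mult of `pump` kept inside the `if`, this corresponds to the ~2400 it/s variant mentioned above; dropping the `if`, as becomes necessary once f(t) is a train of Gaussian pulses that never vanish exactly, gives the ~2100 it/s rhsfunc5 variant.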
>>>> >>>> Thank you in advance. >>>> >>>> All the best, >>>> Niclas >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 10 05:54:30 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 10 Aug 2023 06:54:30 -0400 Subject: [petsc-users] error related to 'valgrind' when using MatView In-Reply-To: References: Message-ID: On Thu, Aug 10, 2023 at 2:30?AM maitri ksh wrote: > I am unable to understand what possibly went wrong with my code, I could > load a matrix (large sparse matrix) into petsc, write it out and read it > back into Matlab but when I tried to use MatView to see the matrix-info, it > produces error of some 'corrupt argument, #valgrind'. Can anyone please > help? > You use viewer = PETSC_VIEWER_STDOUT_WORLD but then you Destroy() that viewer. You should not since you did not create it. THanks, Matt > Maitri > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Aug 10 09:08:19 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 10 Aug 2023 07:08:19 -0700 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: Message-ID: Hi Stephan, Yes, MIS(A^T A) -> MIS(MIS(A)) change? Yep, that is it. This change was required because A^T A is super expensive. This change did not do much to my tests but this is complex. I am on travel now, but I can get to this in a few days. You provided me with a lot of data and I can take a look, but I think we need to look at parameters, Thanks, Mark On Wed, Aug 9, 2023 at 10:08?AM Stephan Kramer wrote: > Dear petsc devs > > We have noticed a performance regression using GAMG as the > preconditioner to solve the velocity block in a Stokes equations saddle > point system with variable viscosity solved on a 3D hexahedral mesh of a > spherical shell using Q2-Q1 elements. This is comparing performance from > the beginning of last year (petsc 3.16.4) and a more recent petsc master > (from around May this year). This is the weak scaling analysis we > published in https://doi.org/10.5194/gmd-15-5127-2022 Previously the > number of iterations for the velocity block (inner solve of the Schur > complement) starts at 40 iterations > ( > https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png) > > and only slowly going for larger problems (+more cores). Now the number > of iterations now starts at 60 > ( > https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png), > > same tolerances, again slowly going up with increasing size, with the > cost per iteration also gone up (slightly) - resulting in an increased > runtime of > 50%. > > The main change we can see is that the coarsening seems to have gotten a > lot less aggressive at the first coarsening stage (finest->to > one-but-finest) - presumably after the MIS(A^T A) -> MIS(MIS(A)) change? > The performance issues might be similar to > https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html > ? 
> > As an example at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the > older petsc version we had: > > rows=126, cols=126, bs=6 > total: nonzeros=15876, allocated nonzeros=15876 > -- > rows=3072, cols=3072, bs=6 > total: nonzeros=3344688, allocated nonzeros=3344688 > -- > rows=91152, cols=91152, bs=6 > total: nonzeros=109729584, allocated nonzeros=109729584 > -- > rows=2655378, cols=2655378, bs=6 > total: nonzeros=1468980252, allocated nonzeros=1468980252 > -- > rows=152175366, cols=152175366, bs=3 > total: nonzeros=29047661586, allocated nonzeros=29047661586 > > Whereas with the newer version we get: > > rows=420, cols=420, bs=6 > total: nonzeros=176400, allocated nonzeros=176400 > -- > rows=6462, cols=6462, bs=6 > total: nonzeros=10891908, allocated nonzeros=10891908 > -- > rows=91716, cols=91716, bs=6 > total: nonzeros=81687384, allocated nonzeros=81687384 > -- > rows=5419362, cols=5419362, bs=6 > total: nonzeros=3668190588, allocated nonzeros=3668190588 > -- > rows=152175366, cols=152175366, bs=3 > total: nonzeros=29047661586, allocated nonzeros=29047661586 > > So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to > 2.6e6 DOFs. Note that we are providing the rigid body near nullspace, > hence the bs=3 to bs=6. > We have tried different values for the gamg_threshold but it doesn't > really seem to significantly alter the coarsening amount in that first > step. > > Do you have any suggestions for further things we should try/look at? > Any feedback would be much appreciated > > Best wishes > Stephan Kramer > > Full logs including log_view timings available from > https://github.com/stephankramer/petsc-scaling/ > > In particular: > > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Thu Aug 10 17:50:39 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Thu, 10 Aug 2023 22:50:39 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Message-ID: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Aug 10 18:14:13 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 10 Aug 2023 16:14:13 -0700 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: Message-ID: BTW, nice bug report ... > > So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to > 2.6e6 DOFs. Yes, this is the critical place to see what is different and going wrong. My 3D tests were not that different and I see you lowered the threshold. Note, you can set the threshold to zero, but your test is running so much differently than mine there is something else going on. Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for in 3D. So it is not clear what the problem is. Some questions: * do you have a picture of this mesh to show me? * what do you mean by Q1-Q2 elements? It would be nice to see if the new and old codes are similar without aggressive coarsening. This was the intended change of the major change in this time frame as you noticed. If these jobs are easy to run, could you check that the old and new versions are similar with "-pc_gamg_square_graph 0 ", ( and you only need one time step). All you need to do is check that the first coarse grid has about the same number of equations (large). BTW, I am starting to think I should add the old method back as an option. I did not think this change would cause large differences. Thanks, Mark > Note that we are providing the rigid body near nullspace, > hence the bs=3 to bs=6. > We have tried different values for the gamg_threshold but it doesn't > really seem to significantly alter the coarsening amount in that first > step. > > Do you have any suggestions for further things we should try/look at? > Any feedback would be much appreciated > > Best wishes > Stephan Kramer > > Full logs including log_view timings available from > https://github.com/stephankramer/petsc-scaling/ > > In particular: > > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat > > https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat > > -------------- next part -------------- An HTML attachment was scrubbed... 
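To make the comparison Mark suggests concrete, the check can be scripted: run the same setup under the old and new PETSc with aggressive coarsening disabled via `-pc_gamg_square_graph 0` and read off the per-level `rows=` / `nonzeros=` lines printed by the KSP view, the same kind of output quoted in the report above. A minimal petsc4py sketch of such a run; the 1-D Laplacian is only a stand-in to make it self-contained, whereas the real test would assemble the Stokes velocity block and attach its rigid-body near-nullspace before `setUp`:

```
import sys
import petsc4py
# the option under test, as suggested above; remove it for the default coarsening
petsc4py.init(sys.argv + ["-pc_gamg_square_graph", "0"])
from petsc4py import PETSc

n = 100000                     # stand-in problem size
A = PETSc.Mat().createAIJ([n, n], nnz=3)
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    A.setValue(i, i, 2.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.getPC().setType(PETSc.PC.Type.GAMG)
ksp.setFromOptions()
ksp.setUp()     # triggers PCSetUp, i.e. the coarsening being compared
ksp.view()      # prints the GAMG hierarchy with rows/nonzeros per level
```

If the first coarse level then has roughly the same number of equations under 3.16 and main, the regression is isolated to the aggressive-coarsening change; if not, something else in the setup differs as well.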
URL: From pierre.jolivet at lip6.fr Thu Aug 10 23:48:30 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Fri, 11 Aug 2023 06:48:30 +0200 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: Message-ID: <2487F4FA-42D2-4515-9404-2CA07B2716E0@lip6.fr> > On 11 Aug 2023, at 1:14 AM, Mark Adams wrote: > > BTW, nice bug report ... >> >> So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to >> 2.6e6 DOFs. > > Yes, this is the critical place to see what is different and going wrong. > > My 3D tests were not that different and I see you lowered the threshold. > Note, you can set the threshold to zero, but your test is running so much differently than mine there is something else going on. > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for in 3D. > > So it is not clear what the problem is. Some questions: > > * do you have a picture of this mesh to show me? > * what do you mean by Q1-Q2 elements? > > It would be nice to see if the new and old codes are similar without aggressive coarsening. > This was the intended change of the major change in this time frame as you noticed. > If these jobs are easy to run, could you check that the old and new versions are similar with "-pc_gamg_square_graph 0 ", ( and you only need one time step). > All you need to do is check that the first coarse grid has about the same number of equations (large). > > BTW, I am starting to think I should add the old method back as an option. I did not think this change would cause large differences. Not op, but that would be extremely valuable, IMHO. This is impacting codes left, right, and center (see, e.g., another research group left wondering https://github.com/feelpp/feelpp/issues/2138). Mini-rant: as developers, we are being asked to maintain backward compatibility of the API/headers, but there is no such an enforcement for the numerics. A breakage in the API is ?easy? to fix, you get a compilation error, you either try to fix your code or stick to a lower version of PETSc. Changes in the numerics trigger silent errors which are much more delicate to fix because users do not know whether something needs to be addressed in their code or if there is a change in PETSc. I don?t see the point of enforcing one backward compatibility but not the other. Thanks, Pierre > Thanks, > Mark > > > >> Note that we are providing the rigid body near nullspace, >> hence the bs=3 to bs=6. >> We have tried different values for the gamg_threshold but it doesn't >> really seem to significantly alter the coarsening amount in that first step. >> >> Do you have any suggestions for further things we should try/look at? >> Any feedback would be much appreciated >> >> Best wishes >> Stephan Kramer >> >> Full logs including log_view timings available from >> https://github.com/stephankramer/petsc-scaling/ >> >> In particular: >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Fri Aug 11 09:52:09 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Aug 2023 09:52:09 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > * what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! > Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohany at alumni.cmu.edu Fri Aug 11 12:23:15 2023 From: rohany at alumni.cmu.edu (Rohan Yadav) Date: Fri, 11 Aug 2023 10:23:15 -0700 Subject: [petsc-users] 32-bit vs 64-bit GPU support Message-ID: Hi, I was wondering what the official status of 64-bit integer support in the PETSc GPU backend is (specifically CUDA). This question comes from the result of benchmarking some PETSc code and looking at some sources. In particular, I found that PETSc's call to cuSPARSE SpMV seems to always be using the 32-bit integer call, even if I compile PETSc with `--with-64-bit-indices`. After digging around more, I see that PETSc always only creates 32-bit cuSPARSE matrices as well: https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501. I was looking around for a switch somewhere to 64 bit integers inside this code, but everything seems to be pretty hardcoded with `THRUSTINTARRAY32`. 
As expected, this all works when the range of coordinates in each sparse matrix partition is less than INT_MAX, but PETSc GPU code breaks in different ways (calling cuBLAS and cuSPARSE) when trying a (synthetic) problem that needs 64 bit integers: ``` #include "petscmat.h" #include "petscvec.h" #include "petsc.h" int main(int argc, char** argv) { PetscInt ierr; PetscInitialize(&argc, &argv, (char *)0, "GPU bug"); PetscInt numRows = 1; PetscInt numCols = PetscInt(INT_MAX) * 2; Mat A; PetscInt rowStart, rowEnd; ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols); MatSetType(A, MATMPIAIJ); MatSetFromOptions(A); MatSetValue(A, 0, 0, 1.0, INSERT_VALUES); MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); Vec b; ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr); VecSetSizes(b, PETSC_DECIDE, numCols); VecSetFromOptions(b); VecSet(b, 0.0); VecSetValue(b, 0, 42.0, INSERT_VALUES); VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES); VecAssemblyBegin(b); VecAssemblyEnd(b); Vec x; ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr); VecSetSizes(x, PETSC_DECIDE, numRows); VecSetFromOptions(x); VecSet(x, 0.0); MatMult(A, b, x); PetscScalar result; VecSum(x, &result); PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result); PetscFinalize(); } ``` When this program is run on CPUs, it outputs 100.0, as expected. When run on a single GPU with `-vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with ``` [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted to 32-bit integers [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:34:10 2023 [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices [0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435 [0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335 [0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360 [0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389 [0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545 [0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273 -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF with errorcode 63. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. 
-------------------------------------------------------------------------- ``` and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with ``` ** On entry to cusparseCreateCsr(): dimension mismatch for CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647) [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: GPU error [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:43:07 2023 [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices [0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ aijcusparse.cu:2503 [0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ aijcusparse.cu:3544 [0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ aijcusparse.cu:3485 [0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/ mpiaijcusparse.cu:452 [0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599 ``` Thanks, Rohan Yadav -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Aug 11 12:43:03 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 11 Aug 2023 17:43:03 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, thank you for replying. I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? 
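Until the CUDA path gains 64-bit index support, one practical workaround is to guard the switch to the cusparse/cuda types on the sizes involved. A rough petsc4py sketch of such a guard; the helper names are illustrative, not a PETSc API, and the check is deliberately conservative: local dimensions, the global column count that `cusparseCreateCsr` rejects in the trace above, and the local nonzero count must all stay below 2^31 - 1.

```
import numpy as np
from petsc4py import PETSc

INT32_MAX = np.iinfo(np.int32).max   # 2147483647, the ceiling seen in the errors above

def fits_32bit_gpu_indices(A):
    # every count handed to cuBLAS/cuSPARSE must be representable as int32
    m, n = A.getLocalSize()
    M, N = A.getSize()
    nz = int(A.getInfo()["nz_used"])
    return max(m, n, N, nz) <= INT32_MAX

def pick_mat_type(A):
    # fall back to the CPU AIJ format when the 32-bit limit would be exceeded
    return PETSc.Mat.Type.AIJCUSPARSE if fits_32bit_gpu_indices(A) else PETSc.Mat.Type.AIJ
```

For scale, 2^31 double-precision entries is 16 GiB, so vectors near the 32-bit index limit do fit on current 40-80 GB devices; the restriction comes from the index width in the cuSPARSE/cuBLAS calls, not from device memory. The Kokkos backend mentioned later in the thread is another possible route, as it may handle 64-bit index types.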
#10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 
0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 11 13:06:58 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 11 Aug 2023 14:06:58 -0400 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: References: Message-ID: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> We do not currently have any code for using 64 bit integer sizes on the GPUs. Given the current memory available on GPUs is 64 bit integer support needed? I think even a single vector of length 2^31 will use up most of the GPU's memory? Are the practical, not synthetic, situations that require 64 bit integer support on GPUs immediately? For example, is the vector length of the entire parallel vector across all GPUs limited to 32 bits? We will certainly add such support, but it is a question of priorities; there are many things we need to do to improve PETSc GPU support, and they take time. Unless we have practical use cases, 64 bit integer support for integer sizes on the GPU is not at the top of the list. Of course, we would be very happy with a merge request that would provide this support at any time. Barry > On Aug 11, 2023, at 1:23 PM, Rohan Yadav wrote: > > Hi, > > I was wondering what the official status of 64-bit integer support in the PETSc GPU backend is (specifically CUDA). This question comes from the result of benchmarking some PETSc code and looking at some sources. In particular, I found that PETSc's call to cuSPARSE SpMV seems to always be using the 32-bit integer call, even if I compile PETSc with `--with-64-bit-indices`. After digging around more, I see that PETSc always only creates 32-bit cuSPARSE matrices as well: https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501. I was looking around for a switch somewhere to 64 bit integers inside this code, but everything seems to be pretty hardcoded with `THRUSTINTARRAY32`. > > As expected, this all works when the range of coordinates in each sparse matrix partition is less than INT_MAX, but PETSc GPU code breaks in different ways (calling cuBLAS and cuSPARSE) when trying a (synthetic) problem that needs 64 bit integers: > > ``` > #include "petscmat.h" > #include "petscvec.h" > #include "petsc.h" > > int main(int argc, char** argv) { > PetscInt ierr; > PetscInitialize(&argc, &argv, (char *)0, "GPU bug"); > > PetscInt numRows = 1; > PetscInt numCols = PetscInt(INT_MAX) * 2; > > Mat A; > PetscInt rowStart, rowEnd; > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols); > MatSetType(A, MATMPIAIJ); > MatSetFromOptions(A); > > MatSetValue(A, 0, 0, 1.0, INSERT_VALUES); > MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > Vec b; > ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr); > VecSetSizes(b, PETSC_DECIDE, numCols); > VecSetFromOptions(b); > VecSet(b, 0.0); > VecSetValue(b, 0, 42.0, INSERT_VALUES); > VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > Vec x; > ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr); > VecSetSizes(x, PETSC_DECIDE, numRows); > VecSetFromOptions(x); > VecSet(x, 0.0); > > MatMult(A, b, x); > PetscScalar result; > VecSum(x, &result); > PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result); > PetscFinalize(); > } > ``` > > When this program is run on CPUs, it outputs 100.0, as expected. 
> > When run on a single GPU with `-vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with > ``` > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted to 32-bit integers > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown > [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:34:10 2023 > [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices > [0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435 > [0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335 > [0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360 > [0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389 > [0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545 > [0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > with errorcode 63. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > ``` > > and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with > ``` > ** On entry to cusparseCreateCsr(): dimension mismatch for CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647) > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown > [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:43:07 2023 > [0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices > [0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503 > [0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544 > [0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485 > [0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452 > [0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599 > ``` > > Thanks, > > Rohan Yadav > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 11 14:04:58 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Aug 2023 14:04:58 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 11 14:24:43 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Aug 2023 14:24:43 -0500 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> Message-ID: Rohan, You could try the petsc/kokkos backend. I have not tested it, but I guess it should handle 64 bit CUDA index types. I guess the petsc/cuda 32-bit limit came from old CUDA versions where only 32-bit indices were supported such that the original developers hardwired the type to THRUSTINTARRAY32. We try to support generations of cuda toolkits and thus have the current code. Anyway, this should be fixed. --Junchao Zhang On Fri, Aug 11, 2023 at 1:07?PM Barry Smith wrote: > > We do not currently have any code for using 64 bit integer sizes on the > GPUs. > > Given the current memory available on GPUs is 64 bit integer support > needed? I think even a single vector of length 2^31 will use up most of the > GPU's memory? Are the practical, not synthetic, situations that require 64 > bit integer support on GPUs immediately? For example, is the vector length > of the entire parallel vector across all GPUs limited to 32 bits? > > We will certainly add such support, but it is a question of priorities; > there are many things we need to do to improve PETSc GPU support, and they > take time. Unless we have practical use cases, 64 bit integer support for > integer sizes on the GPU is not at the top of the list. Of course, we would > be very happy with a merge request that would provide this support at any > time. > > Barry > > > > On Aug 11, 2023, at 1:23 PM, Rohan Yadav wrote: > > Hi, > > I was wondering what the official status of 64-bit integer support in the > PETSc GPU backend is (specifically CUDA). This question comes from the > result of benchmarking some PETSc code and looking at some sources. In > particular, I found that PETSc's call to cuSPARSE SpMV seems to always be > using the 32-bit integer call, even if I compile PETSc with > `--with-64-bit-indices`. After digging around more, I see that PETSc always > only creates 32-bit cuSPARSE matrices as well: > https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501. > I was looking around for a switch somewhere to 64 bit integers inside this > code, but everything seems to be pretty hardcoded with `THRUSTINTARRAY32`. 
> > As expected, this all works when the range of coordinates in each sparse > matrix partition is less than INT_MAX, but PETSc GPU code breaks in > different ways (calling cuBLAS and cuSPARSE) when trying a (synthetic) > problem that needs 64 bit integers: > > ``` > #include "petscmat.h" > #include "petscvec.h" > #include "petsc.h" > > int main(int argc, char** argv) { > PetscInt ierr; > PetscInitialize(&argc, &argv, (char *)0, "GPU bug"); > > PetscInt numRows = 1; > PetscInt numCols = PetscInt(INT_MAX) * 2; > > Mat A; > PetscInt rowStart, rowEnd; > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols); > MatSetType(A, MATMPIAIJ); > MatSetFromOptions(A); > > MatSetValue(A, 0, 0, 1.0, INSERT_VALUES); > MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > Vec b; > ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr); > VecSetSizes(b, PETSC_DECIDE, numCols); > VecSetFromOptions(b); > VecSet(b, 0.0); > VecSetValue(b, 0, 42.0, INSERT_VALUES); > VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > Vec x; > ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr); > VecSetSizes(x, PETSC_DECIDE, numRows); > VecSetFromOptions(x); > VecSet(x, 0.0); > > MatMult(A, b, x); > PetscScalar result; > VecSum(x, &result); > PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result); > PetscFinalize(); > } > ``` > > When this program is run on CPUs, it outputs 100.0, as expected. > > When run on a single GPU with `-vec_type cuda -mat_type aijcusparse > -use_gpu_aware_mpi 0` it fails with > ``` > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted > to 32-bit integers > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown > [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 > 09:34:10 2023 > [0]PETSC ERROR: Configure options --with-cuda=1 > --prefix=/local/home/rohany/petsc/petsc-install/ > --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 > CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 > --with-64-bit-indices > [0]PETSC ERROR: #1 checkCupmBlasIntCast() at > /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435 > [0]PETSC ERROR: #2 VecAllocateCheck_() at > /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335 > [0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at > /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360 > [0]PETSC ERROR: #4 DeviceAllocateCheck_() at > /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389 > [0]PETSC ERROR: #5 GetArray() at > /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545 > [0]PETSC ERROR: #6 VectorArray() at > /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273 > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > with errorcode 63. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. 
> -------------------------------------------------------------------------- > ``` > > and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it > fails with > ``` > ** On entry to cusparseCreateCsr(): dimension mismatch for > CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647) > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : > invalid value > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.19.4, unknown > [0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 > 09:43:07 2023 > [0]PETSC ERROR: Configure options --with-cuda=1 > --prefix=/local/home/rohany/petsc/petsc-install/ > --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 > CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 > --with-64-bit-indices > [0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at > /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2503 > [0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at > /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:3544 > [0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at > /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:3485 > [0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at > /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:452 > > [0]PETSC ERROR: #5 MatMult() at > /local/home/rohany/petsc/src/mat/interface/matrix.c:2599 > ``` > > Thanks, > > Rohan Yadav > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohany at alumni.cmu.edu Fri Aug 11 14:31:15 2023 From: rohany at alumni.cmu.edu (Rohan Yadav) Date: Fri, 11 Aug 2023 12:31:15 -0700 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> Message-ID: > We do not currently have any code for using 64 bit integer sizes on the GPUs. Thank you, just wanted confirmation. > Given the current memory available on GPUs is 64 bit integer support needed? I think even a single vector of length 2^31 will use up most of the GPU's memory? Are the practical, not synthetic, situations that require 64 bit integer support on GPUs immediately? For example, is the vector length of the entire parallel vector across all GPUs limited to 32 bits? With modern GPU sizes, for example A100's with 80GB of memory, a vector of length 2^31 is not that much memory -- one could conceivably run a CG solve with local vectors > 2^31. Thanks Junchao, I might look into that. However, I currently am not trying to solve such a large problem -- these questions just came from wondering why the cuSPARSE kernel PETSc was calling was running faster than mine. Rohan -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 11 14:35:48 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Aug 2023 14:35:48 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. 
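(As a rough sketch of the pattern those examples follow, not an actual PETSc test program; the sizes and names below are invented. The GPU back ends are normally selected from the command line through the *SetFromOptions calls:)

```c
#include <petsc.h>

int main(int argc, char **argv)
{
  Vec x;
  Mat A;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* Create objects and let the options database choose the implementation. */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, 100));
  PetscCall(VecSetFromOptions(x));  /* -vec_type cuda selects the CUDA vector */
  PetscCall(VecSet(x, 1.0));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100));
  PetscCall(MatSetFromOptions(A));  /* -mat_type aijcusparse selects cuSPARSE */
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}
```

Running such a code with, e.g., -vec_type cuda -mat_type aijcusparse then exercises the CUDA vector and cuSPARSE matrix types without any source changes.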
For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... > CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! 
IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... > CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Aug 11 14:38:45 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 11 Aug 2023 13:38:45 -0600 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> Message-ID: <878rahs722.fsf@jedbrown.org> Rohan Yadav writes: > With modern GPU sizes, for example A100's with 80GB of memory, a vector of > length 2^31 is not that much memory -- one could conceivably run a CG solve > with local vectors > 2^31. Yeah, each vector would be 8 GB (single precision) or 16 GB (double). You can't store a matrix of this size, and probably not a "mesh", but it's possible to create such a problem if everything is matrix-free (possibly with matrix-free geometric multigrid). This is more likely to show up in a benchmark than any real science or engineering probelm. We should support it, but it still seems hypothetical and not urgent. > Thanks Junchao, I might look into that. However, I currently am not trying > to solve such a large problem -- these questions just came from wondering > why the cuSPARSE kernel PETSc was calling was running faster than mine. Hah, bandwidth doesn't like. ;-) From jacob.fai at gmail.com Fri Aug 11 14:59:51 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Fri, 11 Aug 2023 15:59:51 -0400 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: <878rahs722.fsf@jedbrown.org> References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> <878rahs722.fsf@jedbrown.org> Message-ID: <87489686-6B1F-4C58-BB02-ED4D1C7BED76@gmail.com> > We should support it, but it still seems hypothetical and not urgent. FWIW, cuBLAS only just added 64-bit int support with CUDA 12 (naturally, with a completely separate API). More generally, it would be interesting to know the breakdown of installed CUDA versions for users. Unlike compilers etc, I suspect that cluster admins (and those running on local machines) are much more likely to be updating their CUDA toolkits to the latest versions as they often contain critical performance improvements. It would help us decide on minimum version to support. We don?t have any real idea of the current minimum version, last time it was estimated to be CUDA 7 IIRC? Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 11, 2023, at 15:38, Jed Brown wrote: > > Rohan Yadav writes: > >> With modern GPU sizes, for example A100's with 80GB of memory, a vector of >> length 2^31 is not that much memory -- one could conceivably run a CG solve >> with local vectors > 2^31. > > Yeah, each vector would be 8 GB (single precision) or 16 GB (double). You can't store a matrix of this size, and probably not a "mesh", but it's possible to create such a problem if everything is matrix-free (possibly with matrix-free geometric multigrid). This is more likely to show up in a benchmark than any real science or engineering probelm. We should support it, but it still seems hypothetical and not urgent. > >> Thanks Junchao, I might look into that. However, I currently am not trying >> to solve such a large problem -- these questions just came from wondering >> why the cuSPARSE kernel PETSc was calling was running faster than mine. > > Hah, bandwidth doesn't like. 
;-) From jed at jedbrown.org Fri Aug 11 15:13:04 2023 From: jed at jedbrown.org (Jed Brown) Date: Fri, 11 Aug 2023 14:13:04 -0600 Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: <87489686-6B1F-4C58-BB02-ED4D1C7BED76@gmail.com> References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> <878rahs722.fsf@jedbrown.org> <87489686-6B1F-4C58-BB02-ED4D1C7BED76@gmail.com> Message-ID: <875y5ls5gv.fsf@jedbrown.org> Jacob Faibussowitsch writes: > More generally, it would be interesting to know the breakdown of installed CUDA versions for users. Unlike compilers etc, I suspect that cluster admins (and those running on local machines) are much more likely to be updating their CUDA toolkits to the latest versions as they often contain critical performance improvements. One difference is that some sites (not looking at you at all, ALCF) still run pretty ancient drivers and/or have broken GPU-aware MPI with all but a specific ancient version of CUDA (OLCF, LLNL). With a normal compiler, you can choose to use the latest version, but with CUDA, people are firmly stuck on old versions. From balay at mcs.anl.gov Fri Aug 11 15:26:32 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 11 Aug 2023 15:26:32 -0500 (CDT) Subject: [petsc-users] 32-bit vs 64-bit GPU support In-Reply-To: <875y5ls5gv.fsf@jedbrown.org> References: <8278F56C-BAC6-427B-8F6B-C85817A9D5A8@petsc.dev> <878rahs722.fsf@jedbrown.org> <87489686-6B1F-4C58-BB02-ED4D1C7BED76@gmail.com> <875y5ls5gv.fsf@jedbrown.org> Message-ID: On Fri, 11 Aug 2023, Jed Brown wrote: > Jacob Faibussowitsch writes: > > > More generally, it would be interesting to know the breakdown of installed CUDA versions for users. Unlike compilers etc, I suspect that cluster admins (and those running on local machines) are much more likely to be updating their CUDA toolkits to the latest versions as they often contain critical performance improvements. > > One difference is that some sites (not looking at you at all, ALCF) still run pretty ancient drivers and/or have broken GPU-aware MPI with all but a specific ancient version of CUDA (OLCF, LLNL). With a normal compiler, you can choose to use the latest version, but with CUDA, people are firmly stuck on old versions. > Well Nvidia keeps phasing out support for older GPUs in newer CUDA releases - so unless GPUs are upgraded - they can't really upgrade (to latest) CUDA versions .. [this is in addition to the usual reasons admins don't do software upgrades... Ignore clusters - our CUDA CI machine has random stability issues - so we had to downgrade/freeze cuda/driver versions to keep the machine functional] Satish From bsmith at petsc.dev Fri Aug 11 15:58:13 2023 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 11 Aug 2023 16:58:13 -0400 Subject: [petsc-users] error related to 'valgrind' when using MatView In-Reply-To: References: Message-ID: <870B282A-B78E-4C83-A3F4-A95E46A9BA95@petsc.dev> New error checking to prevent this confusion in the future: https://gitlab.com/petsc/petsc/-/merge_requests/6804 > On Aug 10, 2023, at 6:54 AM, Matthew Knepley wrote: > > On Thu, Aug 10, 2023 at 2:30?AM maitri ksh > wrote: >> I am unable to understand what possibly went wrong with my code, I could load a matrix (large sparse matrix) into petsc, write it out and read it back into Matlab but when I tried to use MatView to see the matrix-info, it produces error of some 'corrupt argument, #valgrind'. Can anyone please help? > > You use > > viewer = PETSC_VIEWER_STDOUT_WORLD > > but then you Destroy() that viewer. 
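For illustration, a minimal sketch of the two patterns (not the original code; the helper name and output filename are invented):

```c
#include <petsc.h>

static PetscErrorCode view_matrix_info(Mat A)
{
  PetscViewer user;

  PetscFunctionBeginUser;
  /* Built-in viewer: use it, but do not destroy it; PETSc owns it. */
  PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));

  /* A viewer you create yourself is yours to destroy when you are done. */
  PetscCall(PetscViewerASCIIOpen(PETSC_COMM_WORLD, "mat_info.txt", &user));
  PetscCall(PetscViewerPushFormat(user, PETSC_VIEWER_ASCII_INFO));
  PetscCall(MatView(A, user));
  PetscCall(PetscViewerPopFormat(user));
  PetscCall(PetscViewerDestroy(&user));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```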
You should not since you did not create it. > > THanks, > > Matt > >> Maitri > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Fri Aug 11 16:36:35 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Fri, 11 Aug 2023 21:36:35 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at 
/home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) ; PETSc users list ; Satish Balay Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 2531264 bytes Desc: configure.log URL: From junchao.zhang at gmail.com Fri Aug 11 16:52:53 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 11 Aug 2023 16:52:53 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. > I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? 
> #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at 
/home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
> [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... > CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! 
PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... > CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. 
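> A minimal sketch of that round-robin mapping (plain MPI plus the CUDA
> runtime API; this is only an illustration of the rank % g idea, not the
> actual PETSc device-selection code):
>
> #include <mpi.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>   int rank, ndev = 0;
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   cudaGetDeviceCount(&ndev);                /* number of devices visible to this process */
>   if (ndev > 0) cudaSetDevice(rank % ndev); /* round-robin: rank -> device (rank % ndev) */
>   printf("rank %d -> CUDA device %d (of %d visible)\n",
>          rank, ndev > 0 ? rank % ndev : -1, ndev);
>   MPI_Finalize();
>   return 0;
> }
>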
If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? > #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 
in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. 
> > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! > Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cho at slac.stanford.edu Fri Aug 11 23:53:07 2023 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Sat, 12 Aug 2023 04:53:07 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> Message-ID: Barry, I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: KSP Object: 32 MPI processes Can petsc show the number of GPUs used? Thanks, Cho ________________________________ From: Barry Smith Sent: Wednesday, August 9, 2023 4:09 PM To: Ng, Cho-Kuen Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Using PETSc GPU backend We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: Barry and Matt, Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? Best, Cho ________________________________ From: Barry Smith > Sent: Monday, July 17, 2023 6:58 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: Barry, Thank you so much for the clarification. I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? Cho ________________________________ From: Barry Smith > Sent: Saturday, July 15, 2023 8:36 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Cho, We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. 
This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda Sorry for the confusion. Barry On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: Matt, After inserting 2 lines in the code: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); "There are no unused options." However, there is no improvement on the GPU performance. 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). THanks, Matt Thanks, Cho ________________________________ From: Matthew Knepley > Sent: Friday, July 14, 2023 5:57 PM To: Ng, Cho-Kuen > Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: I managed to pass the following options to PETSc using a GPU node on Perlmutter. -mat_type aijcusparse -vec_type cuda -log_view -options_left Below is a summary of the test using 4 MPI tasks and 1 GPU per task. o #PETSc Option Table entries: ???-log_view ???-mat_type aijcusparse -options_left -vec_type cuda #End of PETSc Option Table entries WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-mat_type value: aijcusparse The -mat_type option has not been used. In the application code, we use ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. Thanks, Matt o The percent flops on the GPU for KSPSolve is 17%. In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Friday, June 30, 2023 7:57 AM To: Barry Smith >; Mark Adams > Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Barry, Mark and Matt, Thank you all for the suggestions. I will modify the code so we can pass runtime options. Cho ________________________________ From: Barry Smith > Sent: Friday, June 30, 2023 7:01 AM To: Mark Adams > Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
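As a concrete (untested) sketch of the constructor sequence described above, reusing the mlocal/m/n/d_nz/o_nz variables from the MatCreateAIJ() call quoted earlier, so that -mat_type aijcusparse and -vec_type cuda can take effect at runtime:

Mat            A;   /* as in the snippet above */
Vec            x,b; /* placeholder names for solution and right-hand side */
PetscErrorCode ierr;

ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
ierr = MatSetSizes(A,mlocal,mlocal,m,n);CHKERRQ(ierr);
ierr = MatSetFromOptions(A);CHKERRQ(ierr);   /* reads -mat_type aijcusparse from the options database */
ierr = MatSeqAIJSetPreallocation(A,d_nz,PETSC_NULL);CHKERRQ(ierr);                 /* the variant that does not   */
ierr = MatMPIAIJSetPreallocation(A,d_nz,PETSC_NULL,o_nz,PETSC_NULL);CHKERRQ(ierr); /* match the chosen type is a no-op */

ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
ierr = VecSetSizes(x,mlocal,m);CHKERRQ(ierr);
ierr = VecSetFromOptions(x);CHKERRQ(ierr);   /* reads -vec_type cuda from the options database */
ierr = VecDuplicate(x,&b);CHKERRQ(ierr);

Here MatSeqAIJSetPreallocation()/MatMPIAIJSetPreallocation() stand in for the MatXXXSetPreallocation() step mentioned above; x and b are only illustrative vector names, not from the original application code.
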
On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left Mark On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: Mark, The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. Thanks, Matt Cho ________________________________ From: Ng, Cho-Kuen > Sent: Thursday, June 29, 2023 8:32 PM To: Mark Adams > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Mark, Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? Cho ________________________________ From: Mark Adams > Sent: Thursday, June 29, 2023 5:55 PM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. Mark On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? Thanks, Cho -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ ________________________________ From: Barry Smith > Sent: Monday, July 17, 2023 6:58 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen > wrote: Barry, Thank you so much for the clarification. I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? 
Cho ________________________________ From: Barry Smith > Sent: Saturday, July 15, 2023 8:36 AM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Cho, We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda Sorry for the confusion. Barry On Jul 15, 2023, at 8:03 AM, Matthew Knepley > wrote: On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen > wrote: Matt, After inserting 2 lines in the code: ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); ierr = MatSetFromOptions(A);CHKERRQ(ierr); ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); "There are no unused options." However, there is no improvement on the GPU performance. 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). THanks, Matt Thanks, Cho ________________________________ From: Matthew Knepley > Sent: Friday, July 14, 2023 5:57 PM To: Ng, Cho-Kuen > Cc: Barry Smith >; Mark Adams >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen > wrote: I managed to pass the following options to PETSc using a GPU node on Perlmutter. -mat_type aijcusparse -vec_type cuda -log_view -options_left Below is a summary of the test using 4 MPI tasks and 1 GPU per task. o #PETSc Option Table entries: ???-log_view ???-mat_type aijcusparse -options_left -vec_type cuda #End of PETSc Option Table entries WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! There is one unused database option. It is: Option left: name:-mat_type value: aijcusparse The -mat_type option has not been used. In the application code, we use ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. Thanks, Matt o The percent flops on the GPU for KSPSolve is 17%. In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? Thanks, Cho ________________________________ From: Ng, Cho-Kuen > Sent: Friday, June 30, 2023 7:57 AM To: Barry Smith >; Mark Adams > Cc: Matthew Knepley >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Barry, Mark and Matt, Thank you all for the suggestions. I will modify the code so we can pass runtime options. 
Cho ________________________________ From: Barry Smith > Sent: Friday, June 30, 2023 7:01 AM To: Mark Adams > Cc: Matthew Knepley >; Ng, Cho-Kuen >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. On Jun 30, 2023, at 9:30 AM, Mark Adams > wrote: PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left Mark On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley > wrote: On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users > wrote: Mark, The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. Thanks, Matt Cho ________________________________ From: Ng, Cho-Kuen > Sent: Thursday, June 29, 2023 8:32 PM To: Mark Adams > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Mark, Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? Cho ________________________________ From: Mark Adams > Sent: Thursday, June 29, 2023 5:55 PM To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. Mark On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users > wrote: I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? Thanks, Cho -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jacob.fai at gmail.com Sat Aug 12 07:02:55 2023 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Sat, 12 Aug 2023 08:02:55 -0400 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> Message-ID: > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. 
>>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. 
>>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. >>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. 
In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. >>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ From cho at slac.stanford.edu Sat Aug 12 10:08:13 2023 From: cho at slac.stanford.edu (Ng, Cho-Kuen) Date: Sat, 12 Aug 2023 15:08:13 +0000 Subject: [petsc-users] Using PETSc GPU backend In-Reply-To: References: <10FFD366-3B5A-4B3D-A5AF-8BA0C093C882@petsc.dev> <818C5B84-36CE-4971-8ED4-A4DAD0326D73@petsc.dev> Message-ID: Thanks Jacob. ________________________________ From: Jacob Faibussowitsch Sent: Saturday, August 12, 2023 5:02 AM To: Ng, Cho-Kuen Cc: Barry Smith ; petsc-users Subject: Re: [petsc-users] Using PETSc GPU backend > Can petsc show the number of GPUs used? -device_view Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users wrote: > > Barry, > > I tried again today on Perlmutter and running on multiple GPU nodes worked. Likely, I had messed up something the other day. Also, I was able to have multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the number of MPI tasks: > > KSP Object: 32 MPI processes > > Can petsc show the number of GPUs used? > > Thanks, > Cho > > From: Barry Smith > Sent: Wednesday, August 9, 2023 4:09 PM > To: Ng, Cho-Kuen > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Using PETSc GPU backend > > We would need more information about "hanging". Do PETSc examples and tiny problems "hang" on multiple nodes? If you run with -info what are the last messages printed? Can you run with a debugger to see where it is "hanging"? > > > >> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen wrote: >> >> Barry and Matt, >> >> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this? >> >> Best, >> Cho >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? 
>>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. >>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. 
I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. >>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
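On feeding options to an executable that cannot take extra command-line arguments: besides argv, PetscInitialize() accepts an options file as its third argument, and a file can also be inserted into the options database later. A short sketch of both routes (the file name petsc.opts is only a placeholder):

-------------------
#include <petscsys.h>

int main(int argc, char **argv)
{
  /* Route 1: the third argument of PetscInitialize() names an options file
     that is read at startup (in addition to argv and the PETSC_OPTIONS
     environment variable). */
  PetscCall(PetscInitialize(&argc, &argv, "petsc.opts", NULL));

  /* Route 2: insert a file later, e.g. a name read from the application's
     own input deck. */
  PetscCall(PetscOptionsInsertFile(PETSC_COMM_WORLD, NULL, "petsc.opts", PETSC_FALSE));

  /* petsc.opts could contain, one option per line:
       -mat_type aijcusparse
       -vec_type cuda
       -log_view
       -options_left                                                          */

  PetscCall(PetscFinalize());
  return 0;
}
-------------------

Options supplied this way take effect only where the code calls the corresponding *SetFromOptions() routines, so the constructor changes discussed in this thread are still needed.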
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> From: Barry Smith >> Sent: Monday, July 17, 2023 6:58 AM >> To: Ng, Cho-Kuen >> Cc: petsc-users at mcs.anl.gov >> Subject: Re: [petsc-users] Using PETSc GPU backend >> >> The examples that use DM, in particular DMDA all trivially support using the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda >> >> >> >>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen wrote: >>> >>> Barry, >>> >>> Thank you so much for the clarification. >>> >>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there other tutorials available? >>> >>> Cho >>> From: Barry Smith >>> Sent: Saturday, July 15, 2023 8:36 AM >>> To: Ng, Cho-Kuen >>> Cc: petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] Using PETSc GPU backend >>> >>> Cho, >>> >>> We currently have a crappy API for turning on GPU support, and our documentation is misleading in places. >>> >>> People constantly say "to use GPU's with PETSc you only need to use -mat_type aijcusparse (for example)" This is incorrect. >>> >>> This does not work with code that uses the convenience Mat constructors such as MatCreateAIJ(), MatCreateAIJWithArrays etc. It only works if you use the constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), MatXXXSetPreallocation(). ... Similarly you need to use VecCreate(), VecSetSizes(), VecSetFromOptions() and -vec_type cuda >>> >>> If you use DM to create the matrices and vectors then you can use -dm_mat_type aijcusparse -dm_vec_type cuda >>> >>> Sorry for the confusion. >>> >>> Barry >>> >>> >>> >>> >>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley wrote: >>>> >>>> On Sat, Jul 15, 2023 at 1:44?AM Ng, Cho-Kuen wrote: >>>> Matt, >>>> >>>> After inserting 2 lines in the code: >>>> >>>> ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr); >>>> ierr = MatSetFromOptions(A);CHKERRQ(ierr); >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> "There are no unused options." However, there is no improvement on the GPU performance. >>>> >>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you created in steps 1 and 2. This is detailed in the manual. >>>> >>>> 2. You should replace MatCreateAIJ(), with MatSetSizes() before MatSetFromOptions(). >>>> >>>> THanks, >>>> >>>> Matt >>>> Thanks, >>>> Cho >>>> >>>> From: Matthew Knepley >>>> Sent: Friday, July 14, 2023 5:57 PM >>>> To: Ng, Cho-Kuen >>>> Cc: Barry Smith ; Mark Adams ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> On Fri, Jul 14, 2023 at 7:57?PM Ng, Cho-Kuen wrote: >>>> I managed to pass the following options to PETSc using a GPU node on Perlmutter. >>>> >>>> -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>> >>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task. >>>> >>>> o #PETSc Option Table entries: >>>> ???-log_view >>>> ???-mat_type aijcusparse >>>> -options_left >>>> -vec_type cuda >>>> #End of PETSc Option Table entries >>>> WARNING! There are options you set that were not used! >>>> WARNING! could be spelling mistake, etc! >>>> There is one unused database option. 
It is: >>>> Option left: name:-mat_type value: aijcusparse >>>> >>>> The -mat_type option has not been used. In the application code, we use >>>> >>>> ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n, >>>> d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);;CHKERRQ(ierr); >>>> >>>> >>>> If you create the Mat this way, then you need MatSetFromOptions() in order to set the type from the command line. >>>> >>>> Thanks, >>>> >>>> Matt >>>> o The percent flops on the GPU for KSPSolve is 17%. >>>> >>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of magnitude slower. How can I improve the GPU performance? >>>> >>>> Thanks, >>>> Cho >>>> From: Ng, Cho-Kuen >>>> Sent: Friday, June 30, 2023 7:57 AM >>>> To: Barry Smith ; Mark Adams >>>> Cc: Matthew Knepley ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> Barry, Mark and Matt, >>>> >>>> Thank you all for the suggestions. I will modify the code so we can pass runtime options. >>>> >>>> Cho >>>> From: Barry Smith >>>> Sent: Friday, June 30, 2023 7:01 AM >>>> To: Mark Adams >>>> Cc: Matthew Knepley ; Ng, Cho-Kuen ; petsc-users at mcs.anl.gov >>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>> >>>> Note that options like -mat_type aijcusparse -vec_type cuda only work if the program is set up to allow runtime swapping of matrix and vector types. If you have a call to MatCreateMPIAIJ() or other specific types then then these options do nothing but because Mark had you use -options_left the program will tell you at the end that it did not use the option so you will know. >>>> >>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams wrote: >>>>> >>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and you run: >>>>> >>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> >>>>> Mark >>>>> >>>>> On Fri, Jun 30, 2023 at 6:16?AM Matthew Knepley wrote: >>>>> On Fri, Jun 30, 2023 at 1:13?AM Ng, Cho-Kuen via petsc-users wrote: >>>>> Mark, >>>>> >>>>> The application code reads in parameters from an input file, where we can put the PETSc runtime options. Then we pass the options to PetscInitialize(...). Does that sounds right? >>>>> >>>>> PETSc will read command line argument automatically in PetscInitialize() unless you shut it off. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> Cho >>>>> From: Ng, Cho-Kuen >>>>> Sent: Thursday, June 29, 2023 8:32 PM >>>>> To: Mark Adams >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Mark, >>>>> >>>>> Thanks for the information. How do I put the runtime options for the executable, say, a.out, which does not have the provision to append arguments? Do I need to change the C++ main to read in the options? >>>>> >>>>> Cho >>>>> From: Mark Adams >>>>> Sent: Thursday, June 29, 2023 5:55 PM >>>>> To: Ng, Cho-Kuen >>>>> Cc: petsc-users at mcs.anl.gov >>>>> Subject: Re: [petsc-users] Using PETSc GPU backend >>>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view -options_left >>>>> The last column of the performance data (from -log_view) will be the percent flops on the GPU. Check that that is > 0. >>>>> >>>>> The end of the output will list the options that were used and options that were _not_ used (if any). Check that there are no options left. >>>>> >>>>> Mark >>>>> >>>>> On Thu, Jun 29, 2023 at 7:50?PM Ng, Cho-Kuen via petsc-users wrote: >>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and used it by "spack load petsc/fwge6pf". 
Then I compiled the application code (purely CPU code) linking to the petsc package, hoping that I can get performance improvement using the petsc GPU backend. However, the timing was the same using the same number of MPI tasks with and without GPU accelerators. Have I missed something in the process, for example, setting up PETSc options at runtime to use the GPU backend? >>>>> >>>>> Thanks, >>>>> Cho >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Aug 12 11:02:50 2023 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 12 Aug 2023 12:02:50 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <2487F4FA-42D2-4515-9404-2CA07B2716E0@lip6.fr> References: <2487F4FA-42D2-4515-9404-2CA07B2716E0@lip6.fr> Message-ID: I will add the old method back in. And I did touch the MIS code and graph creation, which turns a bs>1 matrix into a bs==1 matrix/graph. I don't think I changed the semantics but I did not verify that carefully. In thinking more about this error I realize that this test has thin body elements, probably, and the threshold can make a big difference here. The threshold is used in a slightly different way in the new algorithm, and there is no avoiding that. Stephan did play around with the threshold but it would be nice to see if the old and new code are still (very) different with zero or < 0 threshold (< 0 will keep zero entries like BCs). I am still puzzled why the new code was so different, given that there does not seem to be some sensitivity to threshold. I'd like to figure this out because the old method can be very slow and use a lot of memory for full 3D problems. Thanks, Mark On Fri, Aug 11, 2023 at 12:48?AM Pierre Jolivet wrote: > > > On 11 Aug 2023, at 1:14 AM, Mark Adams wrote: > > BTW, nice bug report ... > >> >> So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to >> 2.6e6 DOFs. > > > Yes, this is the critical place to see what is different and going wrong. > > My 3D tests were not that different and I see you lowered the threshold. > Note, you can set the threshold to zero, but your test is running so much > differently than mine there is something else going on. > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for > in 3D. > > So it is not clear what the problem is. Some questions: > > * do you have a picture of this mesh to show me? > * what do you mean by Q1-Q2 elements? > > It would be nice to see if the new and old codes are similar without > aggressive coarsening. > This was the intended change of the major change in this time frame as you > noticed. 
> If these jobs are easy to run, could you check that the old and new > versions are similar with "-pc_gamg_square_graph 0 ", ( and you only need > one time step). > All you need to do is check that the first coarse grid has about the same > number of equations (large). > > BTW, I am starting to think I should add the old method back as an option. > I did not think this change would cause large differences. > > > Not op, but that would be extremely valuable, IMHO. > This is impacting codes left, right, and center (see, e.g., another > research group left wondering https://github.com/feelpp/feelpp/issues/2138 > ). > > Mini-rant: as developers, we are being asked to maintain backward > compatibility of the API/headers, but there is no such an enforcement for > the numerics. > A breakage in the API is ?easy? to fix, you get a compilation error, you > either try to fix your code or stick to a lower version of PETSc. > Changes in the numerics trigger silent errors which are much more delicate > to fix because users do not know whether something needs to be addressed in > their code or if there is a change in PETSc. > I don?t see the point of enforcing one backward compatibility but not the > other. > > Thanks, > Pierre > > Thanks, > Mark > > > > >> Note that we are providing the rigid body near nullspace, >> hence the bs=3 to bs=6. >> We have tried different values for the gamg_threshold but it doesn't >> really seem to significantly alter the coarsening amount in that first >> step. >> >> Do you have any suggestions for further things we should try/look at? >> Any feedback would be much appreciated >> >> Best wishes >> Stephan Kramer >> >> Full logs including log_view timings available from >> https://github.com/stephankramer/petsc-scaling/ >> >> In particular: >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Mon Aug 14 03:38:52 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Mon, 14 Aug 2023 11:38:52 +0300 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix Message-ID: Hi, I need to solve an eigenvalue problem *Ax=lmbda*x*, where A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of the inverse of the matrix B. Theoretically it would take around 1.8TB to explicitly compute the matrix B^-1 . A feasible way to solve this eigenvalue problem would be to use the LU factors of the B matrix instead. So the problem looks something like this: (*((LU)^-H)*Q***(LU)^-1)***x = lmbda*x* For a guess value of the (normalised) eigen-vector 'x', 1) one would require to solve two linear equations to get '*Ax*', (LU)*y=x, solve for 'y', ((LU)^H)*z=Q*y, solve for 'z' then one can follow the conventional power-iteration procedure 2) update eigenvector: x= z/||z|| 3) get eigenvalue using the Rayleigh quotient 4) go to step-1 and loop through with a conditional break. 
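One way to organize steps 1-4 without ever forming A explicitly is to hide the two solves and the product with Q inside a shell matrix, so that an eigensolver only ever asks for y = A*x. A rough sketch follows (assuming the factorization of B is held in a KSP set up as a direct solver, that Q and the work vectors are created by the caller, and that for complex-valued B an extra conjugation is added around the transpose solve to obtain the Hermitian transpose; the names ShellCtx, MatMult_A, LargestEigenpair are only illustrative):

-------------------
#include <slepceps.h>

typedef struct {
  KSP kspB;  /* direct solve with B (e.g. KSPPREONLY + PCLU), set up elsewhere */
  Mat Q;
  Vec y, w;  /* work vectors with the same layout as x */
} ShellCtx;

/* z = B^{-H} Q B^{-1} x, applied as two solves and one multiply (step 1) */
static PetscErrorCode MatMult_A(Mat A, Vec x, Vec z)
{
  ShellCtx *ctx;

  PetscFunctionBeginUser;
  PetscCall(MatShellGetContext(A, &ctx));
  PetscCall(KSPSolve(ctx->kspB, x, ctx->y));           /* y = B^{-1} x */
  PetscCall(MatMult(ctx->Q, ctx->y, ctx->w));          /* w = Q y      */
  PetscCall(KSPSolveTranspose(ctx->kspB, ctx->w, z));  /* z = B^{-T} w; conjugate
                                      w and z around this call if B is complex */
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* Wrap the operator in a MatShell and hand it to SLEPc's EPS */
static PetscErrorCode LargestEigenpair(ShellCtx *ctx, PetscInt nlocal, PetscInt N,
                                       PetscScalar *lambda)
{
  Mat A;
  EPS eps;

  PetscFunctionBeginUser;
  PetscCall(MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, N, N, ctx, &A));
  PetscCall(MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MatMult_A));
  PetscCall(EPSCreate(PETSC_COMM_WORLD, &eps));
  PetscCall(EPSSetOperators(eps, A, NULL));
  PetscCall(EPSSetProblemType(eps, EPS_HEP));                 /* A is Hermitian */
  PetscCall(EPSSetWhichEigenpairs(eps, EPS_LARGEST_MAGNITUDE));
  PetscCall(EPSSetFromOptions(eps));
  PetscCall(EPSSolve(eps));
  PetscCall(EPSGetEigenpair(eps, 0, lambda, NULL, NULL, NULL));
  PetscCall(EPSDestroy(&eps));
  PetscCall(MatDestroy(&A));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------

With this in place, SLEPc's Krylov solvers (or a plain power iteration) drive the loop of steps 2-4 for you.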
Is there any example in petsc that does not require explicit declaration of the matrix '*A*' (*Ax=lmbda*x)* and instead takes a vector '*Ax*' as input for an iterative algorithm (like the one above). I looked into some of the examples of eigenvalue problems ( it's highly possible that I might have overlooked, I am new to petsc) but I couldn't find a way to circumvent the explicit declaration of matrix A. Maitri -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at lip6.fr Mon Aug 14 03:45:27 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Mon, 14 Aug 2023 10:45:27 +0200 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix In-Reply-To: References: Message-ID: > On 14 Aug 2023, at 10:39 AM, maitri ksh wrote: > > ? > Hi, > I need to solve an eigenvalue problem Ax=lmbda*x, where A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of the inverse of the matrix B. Theoretically it would take around 1.8TB to explicitly compute the matrix B^-1 . A feasible way to solve this eigenvalue problem would be to use the LU factors of the B matrix instead. So the problem looks something like this: > (((LU)^-H)*Q*(LU)^-1)*x = lmbda*x > For a guess value of the (normalised) eigen-vector 'x', > 1) one would require to solve two linear equations to get 'Ax', > (LU)*y=x, solve for 'y', > ((LU)^H)*z=Q*y, solve for 'z' > then one can follow the conventional power-iteration procedure > 2) update eigenvector: x= z/||z|| > 3) get eigenvalue using the Rayleigh quotient > 4) go to step-1 and loop through with a conditional break. > > Is there any example in petsc that does not require explicit declaration of the matrix 'A' (Ax=lmbda*x) and instead takes a vector 'Ax' as input for an iterative algorithm (like the one above). I looked into some of the examples of eigenvalue problems ( it's highly possible that I might have overlooked, I am new to petsc) but I couldn't find a way to circumvent the explicit declaration of matrix A. You could use SLEPc with a MatShell, that?s the very purpose of this MatType. Thanks, Pierre > Maitri > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Aug 14 04:50:56 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 14 Aug 2023 11:50:56 +0200 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix In-Reply-To: References: Message-ID: <61755600-912A-4819-B7D0-C77ABE57A15C@dsic.upv.es> See for instance ex3.c and ex9.c https://slepc.upv.es/documentation/current/src/eps/tutorials/index.html Jose > El 14 ago 2023, a las 10:45, Pierre Jolivet escribi?: > > > >> On 14 Aug 2023, at 10:39 AM, maitri ksh wrote: >> >> ? >> Hi, >> I need to solve an eigenvalue problem Ax=lmbda*x, where A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of the inverse of the matrix B. Theoretically it would take around 1.8TB to explicitly compute the matrix B^-1 . A feasible way to solve this eigenvalue problem would be to use the LU factors of the B matrix instead. 
So the problem looks something like this: >> (((LU)^-H)*Q*(LU)^-1)*x = lmbda*x >> For a guess value of the (normalised) eigen-vector 'x', >> 1) one would require to solve two linear equations to get 'Ax', >> (LU)*y=x, solve for 'y', >> ((LU)^H)*z=Q*y, solve for 'z' >> then one can follow the conventional power-iteration procedure >> 2) update eigenvector: x= z/||z|| >> 3) get eigenvalue using the Rayleigh quotient >> 4) go to step-1 and loop through with a conditional break. >> >> Is there any example in petsc that does not require explicit declaration of the matrix 'A' (Ax=lmbda*x) and instead takes a vector 'Ax' as input for an iterative algorithm (like the one above). I looked into some of the examples of eigenvalue problems ( it's highly possible that I might have overlooked, I am new to petsc) but I couldn't find a way to circumvent the explicit declaration of matrix A. > > You could use SLEPc with a MatShell, that?s the very purpose of this MatType. > > Thanks, > Pierre > >> Maitri From maitri.ksh at gmail.com Mon Aug 14 05:20:15 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Mon, 14 Aug 2023 13:20:15 +0300 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix In-Reply-To: <61755600-912A-4819-B7D0-C77ABE57A15C@dsic.upv.es> References: <61755600-912A-4819-B7D0-C77ABE57A15C@dsic.upv.es> Message-ID: got it, thanks Pierre & Jose. On Mon, Aug 14, 2023 at 12:50?PM Jose E. Roman wrote: > See for instance ex3.c and ex9.c > https://slepc.upv.es/documentation/current/src/eps/tutorials/index.html > > Jose > > > > El 14 ago 2023, a las 10:45, Pierre Jolivet > escribi?: > > > > > > > >> On 14 Aug 2023, at 10:39 AM, maitri ksh wrote: > >> > >> ? > >> Hi, > >> I need to solve an eigenvalue problem Ax=lmbda*x, where > A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of > the inverse of the matrix B. Theoretically it would take around 1.8TB to > explicitly compute the matrix B^-1 . A feasible way to solve this > eigenvalue problem would be to use the LU factors of the B matrix instead. > So the problem looks something like this: > >> (((LU)^-H)*Q*(LU)^-1)*x = lmbda*x > >> For a guess value of the (normalised) eigen-vector 'x', > >> 1) one would require to solve two linear equations to get 'Ax', > >> (LU)*y=x, solve for 'y', > >> ((LU)^H)*z=Q*y, solve for 'z' > >> then one can follow the conventional power-iteration procedure > >> 2) update eigenvector: x= z/||z|| > >> 3) get eigenvalue using the Rayleigh quotient > >> 4) go to step-1 and loop through with a conditional break. > >> > >> Is there any example in petsc that does not require explicit > declaration of the matrix 'A' (Ax=lmbda*x) and instead takes a vector 'Ax' > as input for an iterative algorithm (like the one above). I looked into > some of the examples of eigenvalue problems ( it's highly possible that I > might have overlooked, I am new to petsc) but I couldn't find a way to > circumvent the explicit declaration of matrix A. > > > > You could use SLEPc with a MatShell, that?s the very purpose of this > MatType. > > > > Thanks, > > Pierre > > > >> Maitri > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s.kramer at imperial.ac.uk Mon Aug 14 10:00:53 2023 From: s.kramer at imperial.ac.uk (Stephan Kramer) Date: Mon, 14 Aug 2023 17:00:53 +0200 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: Message-ID: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> Many thanks for looking into this, Mark > My 3D tests were not that different and I see you lowered the threshold. > Note, you can set the threshold to zero, but your test is running so much > differently than mine there is something else going on. > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for > in 3D. > > So it is not clear what the problem is. Some questions: > > * do you have a picture of this mesh to show me? It's just a standard hexahedral cubed sphere mesh with the refinement level giving the number of times each of the six sides have been subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = 98304? hexes. And everything doubles in all 3 dimensions (so 2^3) going to the next Level > * what do you mean by Q1-Q2 elements? Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity and (tri)linear for pressure I guess you could argue we could/should just do good old geometric multigrid instead. More generally we do use this solver configuration a lot for tetrahedral Taylor Hood (P2-P1) in particular also for our adaptive mesh runs - would it be worth to see if we have the same performance issues with tetrahedral P2-P1? > > It would be nice to see if the new and old codes are similar without > aggressive coarsening. > This was the intended change of the major change in this time frame as you > noticed. > If these jobs are easy to run, could you check that the old and new > versions are similar with "-pc_gamg_square_graph 0 ", ( and you only need > one time step). > All you need to do is check that the first coarse grid has about the same > number of equations (large). Unfortunately we're seeing some memory errors when we use this option, and I'm not entirely clear whether we're just running out of memory and need to put it on a special queue. The run with square_graph 0 using new PETSc managed to get through one solve at level 5, and is giving the following mg levels: ??????? rows=174, cols=174, bs=6 ????????? total: nonzeros=30276, allocated nonzeros=30276 -- ????????? rows=2106, cols=2106, bs=6 ????????? total: nonzeros=4238532, allocated nonzeros=4238532 -- ????????? rows=21828, cols=21828, bs=6 ????????? total: nonzeros=62588232, allocated nonzeros=62588232 -- ????????? rows=589824, cols=589824, bs=6 ????????? total: nonzeros=1082528928, allocated nonzeros=1082528928 -- ????????? rows=2433222, cols=2433222, bs=3 ????????? total: nonzeros=456526098, allocated nonzeros=456526098 comparing with square_graph 100 with new PETSc ????????? rows=96, cols=96, bs=6 ????????? total: nonzeros=9216, allocated nonzeros=9216 -- ????????? rows=1440, cols=1440, bs=6 ????????? total: nonzeros=647856, allocated nonzeros=647856 -- ????????? rows=97242, cols=97242, bs=6 ????????? total: nonzeros=65656836, allocated nonzeros=65656836 -- ????????? rows=2433222, cols=2433222, bs=3 ????????? total: nonzeros=456526098, allocated nonzeros=456526098 and old PETSc with square_graph 100 ????????? rows=90, cols=90, bs=6 ????????? total: nonzeros=8100, allocated nonzeros=8100 -- ????????? rows=1872, cols=1872, bs=6 ????????? total: nonzeros=1234080, allocated nonzeros=1234080 -- ????????? 
rows=47652, cols=47652, bs=6 ????????? total: nonzeros=23343264, allocated nonzeros=23343264 -- ????????? rows=2433222, cols=2433222, bs=3 ????????? total: nonzeros=456526098, allocated nonzeros=456526098 -- Unfortunately old PETSc with square_graph 0 did not complete a single solve before giving the memory error > > BTW, I am starting to think I should add the old method back as an option. > I did not think this change would cause large differences. Yes, I think that would be much appreciated. Let us know if we can do any testing Best wishes Stephan > > Thanks, > Mark > > > > >> Note that we are providing the rigid body near nullspace, >> hence the bs=3 to bs=6. >> We have tried different values for the gamg_threshold but it doesn't >> really seem to significantly alter the coarsening amount in that first >> step. >> >> Do you have any suggestions for further things we should try/look at? >> Any feedback would be much appreciated >> >> Best wishes >> Stephan Kramer >> >> Full logs including log_view timings available from >> https://github.com/stephankramer/petsc-scaling/ >> >> In particular: >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> >> From martin.diehl at kuleuven.be Mon Aug 14 11:05:30 2023 From: martin.diehl at kuleuven.be (Martin Diehl) Date: Mon, 14 Aug 2023 16:05:30 +0000 Subject: [petsc-users] DM/DS crash after update to 3.19 (Fortran) Message-ID: <05062f5e9b146d1361eece1d878d6a29626740cd.camel@kuleuven.be> Dear PETSc team, my simulation crashes after updating from 3.18.5 to 3.19.4. The error message is attached, so is the main code. The mesh (variable named geomMesh) is read with DMPlexCreateFromFile in a different part of the code).? I did not start serious debugging yet in the hope that you can point me into the right direction having recent changes in FE/DS in mind. If this does not ring a bell, I'll have a look with a PETSc debug build. many thanks in advance, Martin -- KU Leuven Department of Computer Science Department of Materials Engineering Celestijnenlaan 200a 3001 Leuven, Belgium -------------- next part -------------- [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Pointer: Parameter # 1 [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc! 
[0]PETSC ERROR: Option left: name:-mechanical_ksp_max_it value: 25 source: code [0]PETSC ERROR: Option left: name:-mechanical_ksp_type value: fgmres source: code [0]PETSC ERROR: Option left: name:-mechanical_mg_levels_ksp_type value: chebyshev source: code [0]PETSC ERROR: Option left: name:-mechanical_mg_levels_pc_type value: sor source: code [0]PETSC ERROR: Option left: name:-mechanical_pc_ml_nullspace value: user source: code [0]PETSC ERROR: Option left: name:-mechanical_pc_type value: ml source: code [0]PETSC ERROR: Option left: name:-mechanical_snes_ksp_ew (no value) source: code [0]PETSC ERROR: Option left: name:-mechanical_snes_ksp_ew_rtol0 value: 0.01 source: code [0]PETSC ERROR: Option left: name:-mechanical_snes_ksp_ew_rtolmax value: 0.01 source: code [0]PETSC ERROR: Option left: name:-mechanical_snes_linesearch_type value: cp source: code [0]PETSC ERROR: Option left: name:-mechanical_snes_type value: newtonls source: code [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.19.4, Jul 31, 2023 [0]PETSC ERROR: DAMASK_mesh on a named hp by m Mon Aug 14 17:45:41 2023 [0]PETSC ERROR: Configure options --prefix=/opt/petsc/linux-c-opt --with-shared-libraries=1 --with-petsc4py=1 --with-mpi-f90module-visibility=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-fc=/usr/bin/mpifort --with-fftw=1 --with-hdf5=1 --with-hdf5-fortran-bindings=1 --with-metis=1 --with-parmetis=1 --with-mumps=1 --with-scalapack=1 --with-ptscotch=1 --with-ptscotch-lib="[libesmumps.so,libptscotch.so,libptscotcherr.so,libscotch.so,libscotcherr.so,libbz2.so]" --with-ptscotch-include= --with-suitesparse=1 --with-superlu-lib=-lsuperlu --with-superlu-include=/usr/include/superlu --with-superlu_dist-lib=-lsuperlu_dist --with-superlu_dist-include=/usr/include/superlu_dist --with-ml=1 --with-boost=1 --COPTFLAGS=-O3 -march=native --CXXOPTFLAGS=-O3 -march=native --FOPTFLAGS=-O3 -march=native [0]PETSC ERROR: #1 PetscQuadratureGetNumComponents() at /home/m/.cache/yay/petsc/src/petsc-3.19.4/src/dm/dt/interface/dt.c:252 [0]PETSC ERROR: #2 PetscFESetQuadrature() at /home/m/.cache/yay/petsc/src/petsc-3.19.4/src/dm/dt/fe/interface/fe.c:651 [0]PETSC ERROR: #3 PetscDSSetUp() at /home/m/.cache/yay/petsc/src/petsc-3.19.4/src/dm/dt/interface/dtds.c:446 [0]PETSC ERROR: #4 DMCreateDS() at /home/m/.cache/yay/petsc/src/petsc-3.19.4/src/dm/interface/dm.c:6003 [0]PETSC ERROR: #5 /home/m/DAMASK/src/mesh/mesh_mech_FEM.f90:177 -------------- next part -------------- A non-text attachment was scrubbed... Name: mesh_mech_FEM.f90 Type: text/x-fortran Size: 37382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 659 bytes Desc: This is a digitally signed message part URL: From marcos.vanella at nist.gov Mon Aug 14 13:05:29 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 14 Aug 2023 18:05:29 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! 
Marcos ________________________________ From: Junchao Zhang Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) Cc: PETSc users list ; Satish Balay Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at 
/home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. 
[1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
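To illustrate the round-robin mapping described above (this is only a sketch of the idea, not PETSc's actual source; the helper name pick_device is made up):

-------------------
#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch of the idea only: each MPI rank selects CUDA device rank % g,
   where g is the number of devices visible to that rank. */
static int pick_device(MPI_Comm comm)
{
  int rank, ndev;

  MPI_Comm_rank(comm, &rank);
  cudaGetDeviceCount(&ndev);
  cudaSetDevice(rank % ndev);
  return rank % ndev;
}
-------------------

When the scheduler already restricts device visibility per rank (for example through CUDA_VISIBLE_DEVICES), the visible count is 1 and every rank simply gets device 0 of its own visible set, which is the situation described in the paragraph above.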
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: batch_script.sh Type: application/x-sh Size: 750 bytes Desc: batch_script.sh URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex60.log Type: application/octet-stream Size: 2034 bytes Desc: ex60.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpu-stats-enki11.adlp-199160.out Type: application/octet-stream Size: 2726 bytes Desc: gpu-stats-enki11.adlp-199160.out URL: From junchao.zhang at gmail.com Mon Aug 14 14:24:50 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 14 Aug 2023 14:24:50 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. > I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. 
> > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 
0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! 
Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... > CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! 
PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... > CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. 
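For context, the COO (coordinate-format) assembly path referred to here is the MatSetPreallocationCOO/MatSetValuesCOO interface whose cusparse implementation appears in the error stack: the nonzero pattern is handed over once, and the numerical values are then set (or refreshed) in a single call. A minimal sketch of that usage follows; the helper name, sizes and entries are made up for illustration and should be checked against the PETSc man pages rather than read as the refactored code itself.

  #include <petscmat.h>

  /* Minimal sketch of COO-style assembly (illustrative only; sizes,
     entries and the helper name AssembleCOO are made up). */
  static PetscErrorCode AssembleCOO(MPI_Comm comm, Mat *A)
  {
    PetscInt    coo_i[] = {0, 0, 1};        /* global row indices    */
    PetscInt    coo_j[] = {0, 1, 1};        /* global column indices */
    PetscScalar coo_v[] = {2.0, -1.0, 2.0}; /* matching values       */

    PetscFunctionBeginUser;
    PetscCall(MatCreate(comm, A));
    PetscCall(MatSetSizes(*A, 2, 2, PETSC_DECIDE, PETSC_DECIDE));
    PetscCall(MatSetFromOptions(*A));                       /* -mat_type mpiaijcusparse, etc. */
    PetscCall(MatSetPreallocationCOO(*A, 3, coo_i, coo_j)); /* pass the nonzero pattern once  */
    PetscCall(MatSetValuesCOO(*A, coo_v, INSERT_VALUES));   /* then set/refresh the values    */
    PetscFunctionReturn(PETSC_SUCCESS);
  }

Repeated solves can then update the values with MatSetValuesCOO alone, without touching the pattern again.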
So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? > #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 
0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. 
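To make the round-robin device assignment described above concrete, the selection is simply rank % (number of visible devices). The toy program below shows that logic on its own; it is not PETSc's actual device-initialization code, and error checking is omitted.

  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdio.h>

  /* Toy illustration of round-robin MPI-rank-to-GPU mapping. */
  int main(int argc, char **argv)
  {
    int rank, ngpu = 0, dev = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ngpu);        /* may be 1 if the scheduler already restricted visibility */
    if (ngpu > 0) dev = rank % ngpu;  /* round-robin assignment                                  */
    cudaSetDevice(dev);
    printf("rank %d -> CUDA device %d of %d visible\n", rank, dev, ngpu);
    MPI_Finalize();
    return 0;
  }

If the job scheduler has already restricted each rank to a single device, ngpu comes back as 1 and the modulo has no effect, as noted above.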
I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! > Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Aug 14 15:26:26 2023 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 14 Aug 2023 16:26:26 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> Message-ID: On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer wrote: > Many thanks for looking into this, Mark > > My 3D tests were not that different and I see you lowered the threshold. > > Note, you can set the threshold to zero, but your test is running so much > > differently than mine there is something else going on. > > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for > > in 3D. > > > > So it is not clear what the problem is. Some questions: > > > > * do you have a picture of this mesh to show me? > > It's just a standard hexahedral cubed sphere mesh with the refinement > level giving the number of times each of the six sides have been > subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 > layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = > 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going > to the next Level > I see, and I assume these are pretty stretched elements. > > > * what do you mean by Q1-Q2 elements? > > Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity > and (tri)linear for pressure > > I guess you could argue we could/should just do good old geometric > multigrid instead. More generally we do use this solver configuration a > lot for tetrahedral Taylor Hood (P2-P1) in particular also for our > adaptive mesh runs - would it be worth to see if we have the same > performance issues with tetrahedral P2-P1? > No, you have a clear reproducer, if not minimal. The first coarsening is very different. I am working on this and I see that I added a heuristic for thin bodies where you order the vertices in greedy algorithms with minimum degree first. 
This will tend to pick corners first, edges then faces, etc. That may be the problem. I would like to understand it better (see below). > > > > It would be nice to see if the new and old codes are similar without > > aggressive coarsening. > > This was the intended change of the major change in this time frame as > you > > noticed. > > If these jobs are easy to run, could you check that the old and new > > versions are similar with "-pc_gamg_square_graph 0 ", ( and you only > need > > one time step). > > All you need to do is check that the first coarse grid has about the same > > number of equations (large). > Unfortunately we're seeing some memory errors when we use this option, > and I'm not entirely clear whether we're just running out of memory and > need to put it on a special queue. > > The run with square_graph 0 using new PETSc managed to get through one > solve at level 5, and is giving the following mg levels: > > rows=174, cols=174, bs=6 > total: nonzeros=30276, allocated nonzeros=30276 > -- > rows=2106, cols=2106, bs=6 > total: nonzeros=4238532, allocated nonzeros=4238532 > -- > rows=21828, cols=21828, bs=6 > total: nonzeros=62588232, allocated nonzeros=62588232 > -- > rows=589824, cols=589824, bs=6 > total: nonzeros=1082528928, allocated nonzeros=1082528928 > -- > rows=2433222, cols=2433222, bs=3 > total: nonzeros=456526098, allocated nonzeros=456526098 > > comparing with square_graph 100 with new PETSc > > rows=96, cols=96, bs=6 > total: nonzeros=9216, allocated nonzeros=9216 > -- > rows=1440, cols=1440, bs=6 > total: nonzeros=647856, allocated nonzeros=647856 > -- > rows=97242, cols=97242, bs=6 > total: nonzeros=65656836, allocated nonzeros=65656836 > -- > rows=2433222, cols=2433222, bs=3 > total: nonzeros=456526098, allocated nonzeros=456526098 > > and old PETSc with square_graph 100 > > rows=90, cols=90, bs=6 > total: nonzeros=8100, allocated nonzeros=8100 > -- > rows=1872, cols=1872, bs=6 > total: nonzeros=1234080, allocated nonzeros=1234080 > -- > rows=47652, cols=47652, bs=6 > total: nonzeros=23343264, allocated nonzeros=23343264 > -- > rows=2433222, cols=2433222, bs=3 > total: nonzeros=456526098, allocated nonzeros=456526098 > -- > > Unfortunately old PETSc with square_graph 0 did not complete a single > solve before giving the memory error > OK, thanks for trying. I am working on this and I will give you a branch to test, but if you can rebuild PETSc here is a quick test that might fix your problem. In src/ksp/pc/impls/gamg/agg.c you will see: PetscCall(PetscSortIntWithArray(nloc, degree, permute)); If you can comment this out in the new code and compare with the old, that might fix the problem. Thanks, Mark > > > > > BTW, I am starting to think I should add the old method back as an > option. > > I did not think this change would cause large differences. > > Yes, I think that would be much appreciated. Let us know if we can do > any testing > > Best wishes > Stephan > > > > > > Thanks, > > Mark > > > > > > > > > >> Note that we are providing the rigid body near nullspace, > >> hence the bs=3 to bs=6. > >> We have tried different values for the gamg_threshold but it doesn't > >> really seem to significantly alter the coarsening amount in that first > >> step. > >> > >> Do you have any suggestions for further things we should try/look at? 
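As a rough illustration of the ordering heuristic discussed above: the quoted PetscCall(PetscSortIntWithArray(nloc, degree, permute)) sorts the degree array ascending and applies the same swaps to permute, so the greedy coarsening visits low-degree vertices (corners, then edges) before high-degree interior ones. The sketch below shows that step in isolation; the CSR-style adjacency layout and variable roles are assumptions, not the actual agg.c source.

  #include <petscsys.h>

  /* Sketch only: order the nloc local vertices by increasing degree.
     'ia' is assumed to be a CSR row-pointer of the local graph. */
  static PetscErrorCode OrderVerticesByDegree(PetscInt nloc, const PetscInt *ia, PetscInt *degree, PetscInt *permute)
  {
    PetscFunctionBeginUser;
    for (PetscInt i = 0; i < nloc; i++) {
      permute[i] = i;                 /* start from the identity permutation */
      degree[i]  = ia[i + 1] - ia[i]; /* row length = vertex degree          */
    }
    /* sort the keys (degree) ascending, permuting 'permute' the same way */
    PetscCall(PetscSortIntWithArray(nloc, degree, permute));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

Commenting the sort out, as suggested, simply falls back to the natural vertex ordering.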
> >> Any feedback would be much appreciated
> >>
> >> Best wishes
> >> Stephan Kramer
> >>
> >> Full logs including log_view timings available from
> >> https://github.com/stephankramer/petsc-scaling/
> >>
> >> In particular:
> >>
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat

From junchao.zhang at gmail.com  Mon Aug 14 15:37:30 2023
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 14 Aug 2023 15:37:30 -0500
Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
In-Reply-To:
References:
Message-ID:

I don't see a problem in the matrix assembly.  If you point me to your repo
and show me how to build it, I can try to reproduce.

--Junchao Zhang

On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <marcos.vanella at nist.gov> wrote:

> Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type
> asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as
> (I understand) is done in the ex60. The error is always the same, so it
> seems it is not related to ksp,pc. Indeed it seems to happen when trying to
> offload the Matrix to the GPU:
>
> terminate called after throwing an instance of 'thrust::system::system_error'
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0 0x2000397fcd8f in ???
> ...
> #8 0x20003935fc6b in ???
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
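For comparison with the Fortran assembly quoted above, the same preallocated-AIJ pattern in C looks roughly like the following, where d_nz/o_nz bound the nonzeros per row in the diagonal and off-diagonal blocks of each rank's row slice. Sizes, loop bounds and values here are placeholders for illustration, not the FDS code.

  #include <petscmat.h>

  /* Sketch of the MatCreateAIJ / MatSetValues / MatAssembly sequence quoted above
     (placeholder sizes; one diagonal entry per local row for illustration). */
  static PetscErrorCode AssembleAIJ(MPI_Comm comm, PetscInt nlocal, PetscInt nglobal, PetscInt rowstart, Mat *A)
  {
    PetscFunctionBeginUser;
    /* reserve up to 7 nonzeros per row in both the diagonal and off-diagonal blocks */
    PetscCall(MatCreateAIJ(comm, nlocal, nlocal, nglobal, nglobal, 7, NULL, 7, NULL, A));
    PetscCall(MatSetFromOptions(*A)); /* so -mat_type mpiaijcusparse can switch the backend */
    for (PetscInt i = rowstart; i < rowstart + nlocal; i++) {
      PetscScalar v = 1.0;
      PetscCall(MatSetValues(*A, 1, &i, 1, &i, &v, INSERT_VALUES)); /* 0-based global indices */
    }
    PetscCall(MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY));
    PetscFunctionReturn(PETSC_SUCCESS);
  }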
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Aug 14 16:45:00 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 14 Aug 2023 21:45:00 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: All right Junchao, thank you for looking at this! So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc env variables: # PETSc dir and arch, set MYSYS to nisaba dor FDS: export PETSC_DIR=/dir_to_petsc/petsc export PETSC_ARCH=arch-linux-c-dbg export MYSYSTEM=nisaba and configured the library with: $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda Then made and checked the PETSc build. Then for FDS: 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX branch: $ cd ~/fds_dir $ git clone https://github.com/marcosvanella/fds.git $ cd fds $ git checkout FireX 1. With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a debug target for fds (this is with cuda enabled openmpi compiled with gcc, in my case gcc-11.2 + PETSc): $ cd Build/ompi_gnu_linux_db $./make_fds.sh You should see compilation lines like this, with the WITH_PETSC Preprocessor variable being defined: Building ompi_gnu_linux_db mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC 
-I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 ... ... If you are compiling on a Power9 node you might come across this error right off the bat: ../../Source/prec.f90:34:8: 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy | 1 Error: Kind -3 not supported for type REAL at (1) which means for some reason gcc in the Power9 does not like quad precision definition in this manner. A way around it is to add the intrinsic Fortran2008 module iso_fortran_env: use, intrinsic :: iso_fortran_env in the fds/Source/prec.f90 file and change the quad precision denominator to: INTEGER, PARAMETER :: QB = REAL128 in there. We are investigating the reason why this is happening. This is not related to Petsc in the code, everything related to PETSc calls is integers and double precision reals. After the code compiles you get the executable in ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db With which you can run the attached 2 mesh case as: $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE which is what I was testing in CPU. This is the result I get from the previous submission in an interactive job in Enki (similar with batch submissions, gmres ksp, gamg pc): Starting FDS ... MPI Process 1 started on enki11.adlp MPI Process 0 started on enki11.adlp Reading FDS input file ... WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: DEVC Device is not within any mesh. Fire Dynamics Simulator Current Date : August 14, 2023 17:26:22 Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX Revision Date : Mon Aug 14 17:07:20 2023 -0400 Compiler : Gnu gfortran 11.2.1 Compilation Date : Aug 14, 2023 17:11:05 MPI Enabled; Number of MPI Processes: 2 OpenMP Enabled; Number of OpenMP Threads: 1 MPI version: 3.1 MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 Job TITLE : Job ID string : test terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? 
#8 0x20003935fc6b in ??? #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 -------------------------------------------------------------------------- Primary job terminated 
normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted). -------------------------------------------------------------------------- Seems the issue stems from the call to KSPSOLVE, line 3130 in fds/Source/pres.f90. Well, thank you for taking the time to look at this and also let me know if these threads should be moved to the issue tracker, or other venue. Best, Marcos ________________________________ From: Junchao Zhang Sent: Monday, August 14, 2023 4:37 PM To: Vanella, Marcos (Fed) ; PETSc users list Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce. --Junchao Zhang On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as (I understand) is done in the ex60. The error is always the same, so it seems it is not related to ksp,pc. Indeed it seems to happen when trying to offload the Matrix to the GPU: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? ... #8 0x20003935fc6b in ??? 
#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ ... ... This is the piece of fortran code I have doing this within my Poisson solver: ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5. CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR) CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) DO IROW=1,ZSL%NUNKH_LOCAL DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) ! PETSC expects zero based indexes.1,Global I position (zero base),1,Global J position (zero base) CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,& ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) ENDDO ENDDO CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add nonzero values one by one. I wonder if there is something related to this that the copying to GPU does not like. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 3:24 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? 
#2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it:

#!/bin/bash
#SBATCH -J test
#SBATCH -e /home/Issues/PETSc/test.err
#SBATCH -o /home/Issues/PETSc/test.log
#SBATCH --partition=batch
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1

export OMP_NUM_THREADS=1
module load cuda/11.5
module load openmpi/4.1.1

cd /home/Issues/PETSc
mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg

If anyone has any suggestions on how to troubleshoot this please let me know.
Thanks!
Marcos
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From marcos.vanella at nist.gov Mon Aug 14 16:52:53 2023
From: marcos.vanella at nist.gov (Vanella, Marcos (Fed))
Date: Mon, 14 Aug 2023 21:52:53 +0000
Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
In-Reply-To: 
References: 
Message-ID: 

Attached is the test.fds test case. Thanks!
________________________________
From: Vanella, Marcos (Fed)
Sent: Monday, August 14, 2023 5:45 PM
To: Junchao Zhang ; petsc-users at mcs.anl.gov ; Satish Balay 
Cc: McDermott, Randall J. (Fed)
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

All right Junchao, thank you for looking at this! So, I checked out the /dir_to_petsc/petsc/main branch, set up the petsc env variables:

# PETSc dir and arch, set MYSYSTEM to nisaba for FDS:
export PETSC_DIR=/dir_to_petsc/petsc
export PETSC_ARCH=arch-linux-c-dbg
export MYSYSTEM=nisaba

and configured the library with:

$ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda

Then made and checked the PETSc build. Then for FDS:

1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX branch:

$ cd ~/fds_dir
$ git clone https://github.com/marcosvanella/fds.git
$ cd fds
$ git checkout FireX

2.
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a debug target for fds (this is with cuda enabled openmpi compiled with gcc, in my case gcc-11.2 + PETSc): $ cd Build/ompi_gnu_linux_db $./make_fds.sh You should see compilation lines like this, with the WITH_PETSC Preprocessor variable being defined: Building ompi_gnu_linux_db mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 ... ... If you are compiling on a Power9 node you might come across this error right off the bat: ../../Source/prec.f90:34:8: 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy | 1 Error: Kind -3 not supported for type REAL at (1) which means for some reason gcc in the Power9 does not like quad precision definition in this manner. A way around it is to add the intrinsic Fortran2008 module iso_fortran_env: use, intrinsic :: iso_fortran_env in the fds/Source/prec.f90 file and change the quad precision denominator to: INTEGER, PARAMETER :: QB = REAL128 in there. We are investigating the reason why this is happening. This is not related to Petsc in the code, everything related to PETSc calls is integers and double precision reals. 
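(For illustration, a minimal sketch of that prec.f90 change; the module name below is a placeholder and not the actual FDS source, and only the iso_fortran_env line and the QB = REAL128 definition come from the description above:)

! Sketch only - module name and layout are illustrative, not the real fds/Source/prec.f90.
module prec
   use, intrinsic :: iso_fortran_env   ! Fortran 2008 intrinsic module providing REAL128
   implicit none
   ! take the quad-precision kind from the intrinsic module instead of a kind probe:
   integer, parameter :: QB = REAL128
   real(QB), parameter :: TWO_EPSILON_QB = 2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy
end module prec

With that, REAL(QB) declarations such as TWO_EPSILON_QB at line 34 should compile on the Power9 gcc as well.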
After the code compiles you get the executable in ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db With which you can run the attached 2 mesh case as: $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE which is what I was testing in CPU. This is the result I get from the previous submission in an interactive job in Enki (similar with batch submissions, gmres ksp, gamg pc): Starting FDS ... MPI Process 1 started on enki11.adlp MPI Process 0 started on enki11.adlp Reading FDS input file ... WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: DEVC Device is not within any mesh. Fire Dynamics Simulator Current Date : August 14, 2023 17:26:22 Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX Revision Date : Mon Aug 14 17:07:20 2023 -0400 Compiler : Gnu gfortran 11.2.1 Compilation Date : Aug 14, 2023 17:11:05 MPI Enabled; Number of MPI Processes: 2 OpenMP Enabled; Number of OpenMP Threads: 1 MPI version: 3.1 MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 Job TITLE : Job ID string : test terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 -------------------------------------------------------------------------- Primary job terminated 
normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted). -------------------------------------------------------------------------- Seems the issue stems from the call to KSPSOLVE, line 3130 in fds/Source/pres.f90. Well, thank you for taking the time to look at this and also let me know if these threads should be moved to the issue tracker, or other venue. Best, Marcos ________________________________ From: Junchao Zhang Sent: Monday, August 14, 2023 4:37 PM To: Vanella, Marcos (Fed) ; PETSc users list Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce. --Junchao Zhang On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as (I understand) is done in the ex60. The error is always the same, so it seems it is not related to ksp,pc. Indeed it seems to happen when trying to offload the Matrix to the GPU: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? ... #8 0x20003935fc6b in ??? 
#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ ... ... This is the piece of fortran code I have doing this within my Poisson solver: ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5. CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR) CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) DO IROW=1,ZSL%NUNKH_LOCAL DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) ! PETSC expects zero based indexes.1,Global I position (zero base),1,Global J position (zero base) CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,& ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) ENDDO ENDDO CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add nonzero values one by one. I wonder if there is something related to this that the copying to GPU does not like. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 3:24 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? 
#2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.fds Type: application/octet-stream Size: 1245 bytes Desc: test.fds URL: From knepley at gmail.com Mon Aug 14 16:57:45 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 15 Aug 2023 07:57:45 +1000 Subject: [petsc-users] DM/DS crash after update to 3.19 (Fortran) In-Reply-To: <05062f5e9b146d1361eece1d878d6a29626740cd.camel@kuleuven.be> References: <05062f5e9b146d1361eece1d878d6a29626740cd.camel@kuleuven.be> Message-ID: On Tue, Aug 15, 2023 at 2:06?AM Martin Diehl wrote: > Dear PETSc team, > > my simulation crashes after updating from 3.18.5 to 3.19.4. > > The error message is attached, so is the main code. The mesh (variable > named geomMesh) is read with DMPlexCreateFromFile in a different part > of the code). > I did not start serious debugging yet in the hope that you can point me > into the right direction having recent changes in FE/DS in mind. > > If this does not ring a bell, I'll have a look with a PETSc debug > I put in code to force all fields to use the same quadrature by default since many people had problems with this. For some reason, it does not guess the quadrature for your mesh correctly. You can turn this off using -petscds_force_quad 0 Thanks, Matt > build. > > many thanks in advance, > Martin > -- > KU Leuven > Department of Computer Science > Department of Materials Engineering > Celestijnenlaan 200a > 3001 Leuven, Belgium > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Mon Aug 14 17:37:36 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 14 Aug 2023 22:37:36 +0000 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: PETSs is not necessarily faster than scipy for your problem when executed in serial. But you get benefits when running in parallel. The PETSc code you wrote uses float64 while your scipy code uses complex128, so the comparison may not be fair. In addition, using the RHS Jacobian does not necessarily make your PETSc code slower. In your case, the bottleneck is the matrix operations. For best performance, you should avoid adding two sparse matrices (especially with different sparsity patterns) which is very costly. So one MatMult + one MultAdd is the best option. MatAXPY with the same nonzero pattern would be a bit slower but still faster than MatAXPY with subset nonzero pattern, which you used in the Jacobian function. I echo Barry?s suggestion that debugging should be turned off before you do any performance study. 
Hong (Mr.) On Aug 10, 2023, at 4:40 AM, Niclas G?tting wrote: Thank you both for the very quick answer! So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder, why the documentation suggests otherwise, though). I had the following four tries at implementing the RHS: 1. def rhsfunc1(ts, t, u, F): scale = 0.5 * (5 < t < 10) (l + scale * pump).mult(u, F) 2. def rhsfunc2(ts, t, u, F): l.mult(u, F) scale = 0.5 * (5 < t < 10) (scale * pump).multAdd(u, F, F) 3. def rhsfunc3(ts, t, u, F): l.mult(u, F) scale = 0.5 * (5 < t < 10) if scale != 0: pump.scale(scale) pump.multAdd(u, F, F) pump.scale(1/scale) 4. def rhsfunc4(ts, t, u, F): tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() l.mult(u, F) scale = 0.5 * (5 < t < 10) tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) tmp_pump.multAdd(u, F, F) They all yield the same results, but with 50it/s, 800it/, 2300it/s and 1900it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on). As the scale function will most likely be a gaussian in the future, I think that option 3 will be become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message. One last point: If I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed? Best regards Niclas On 09.08.23 17:51, Stefano Zampini wrote: TSRK is an explicit solver. Unless you are changing the ts type from command line, the explicit jacobian should not be needed. On top of Barry's suggestion, I would suggest you to write the explicit RHS instead of assembly a throw away matrix every time that function needs to be sampled. On Wed, Aug 9, 2023, 17:09 Niclas G?tting > wrote: Hi all, I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse and therefore I thought, the problem will be well suited for PETSc. Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files, if needed, but they are just some trace-preserving, very sparse matrices): import numpy as np import scipy.sparse import scipy.integrate from tqdm import tqdm l = np.load("../liouvillian.npy") pump = np.load("../pump_operator.npy") state = np.load("../initial_state.npy") l = scipy.sparse.csr_array(l) pump = scipy.sparse.csr_array(pump) def f(t, y, *args): return (l + 0.5 * (5 < t < 10) * pump) @ y #return l @ y # Uncomment for f(t) = 0 dt = 0.1 NUM_STEPS = 200 res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) times = [] for i in tqdm(range(NUM_STEPS)): res[i, :] = solver.integrate(solver.t + dt) times.append(solver.t) Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330it/s on my machine. 
When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases) import sys import petsc4py petsc4py.init(args=sys.argv) import numpy as np import scipy.sparse from tqdm import tqdm from petsc4py import PETSc comm = PETSc.COMM_WORLD def mat_to_real(arr): return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64) def mat_to_petsc_aij(arr): arr_sc_sp = scipy.sparse.csr_array(arr) mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) rstart, rend = mat.getOwnershipRange() print(rstart, rend) print(arr.shape[0]) print(mat.sizes) I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart] J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]] print(I.shape, J.shape, V.shape) mat.setValuesCSR(I, J, V) mat.assemble() return mat l = np.load("../liouvillian.npy") l = mat_to_real(l) pump = np.load("../pump_operator.npy") pump = mat_to_real(pump) state = np.load("../initial_state.npy") state = np.hstack([state.real, state.imag]).astype(np.float64) l = mat_to_petsc_aij(l) pump = mat_to_petsc_aij(pump) jac = l.duplicate() for i in range(8192): jac.setValue(i, i, 0) jac.assemble() jac += l vec = l.createVecRight() vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) vec.assemble() dt = 0.1 ts = PETSc.TS().create(comm=comm) ts.setFromOptions() ts.setProblemType(ts.ProblemType.LINEAR) ts.setEquationType(ts.EquationType.ODE_EXPLICIT) ts.setType(ts.Type.RK) ts.setRKType(ts.RKType.RK3BS) ts.setTime(0) print("KSP:", ts.getKSP().getType()) print("KSP PC:",ts.getKSP().getPC().getType()) print("SNES :", ts.getSNES().getType()) def jacobian(ts, t, u, Amat, Pmat): Amat.zeroEntries() Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # Uncomment for f(t) = 0 ts.setRHSJacobian(jacobian, jac) NUM_STEPS = 200 res = np.empty((NUM_STEPS, 8192), dtype=np.float64) times = [] rstart, rend = vec.getOwnershipRange() for i in tqdm(range(NUM_STEPS)): time = ts.getTime() ts.setMaxTime(time + dt) ts.solve(vec) res[i, rstart:rend] = vec.getArray()[:] times.append(time) I decomposed the complex ODE into a larger real ODE, so that I can easily switch maybe to GPU computation later on. Now, the solutions of both scripts are very much identical, but PETSc runs about 3 times slower at 120it/s on my machine. I don't use MPI for PETSc yet. I strongly suppose that the problem lies within the jacobian definition, as PETSc is about 3 times *faster* than scipy with f(t) = 0 and therefore a constant jacobian. Thank you in advance. All the best, Niclas -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Aug 14 18:01:13 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 14 Aug 2023 18:01:13 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Marcos, These are my findings. I successfully ran the test in the end. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view Starting FDS ... ... 
[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to enable PETSc device support, for example, in some cases, -vec_type cuda Now I get why you met errors with "CPU runs". You configured and built hypre with petsc. Since you added --with-cuda, petsc would configure hypre with its GPU support. However, hypre has a limit/shortcoming that if it is configured with GPU support, you must pass GPU vectors to it. Thus the error. In other words, if you remove --with-cuda, you should be able to run above command. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda Starting FDS ... MPI Process 0 started on hong-gce-workstation MPI Process 1 started on hong-gce-workstation Reading FDS input file ... At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 .. Fire Dynamics Simulator ... STOP: FDS completed successfully (CHID: test) I guess there were link problems in your makefile. Actually, in the first try, I failed with mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC -I"/home/jczhang/petsc/include/" -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o init.o dump.o read.o divg.o main.o -Wl,-rpath -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ -L/usr/lib64 -lX11 /usr/bin/ld: cannot find -lflapack: No such file or directory /usr/bin/ld: cannot find -lfblas: No such file or directory collect2: error: ld returned 1 exit status make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 That is because you hardwired many link flags in your fds/Build/makefile. Then I changed LFLAGS_PETSC to LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc and everything worked. Could you also try it? --Junchao Zhang On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Attached is the test.fds test case. Thanks! 
> ------------------------------ > *From:* Vanella, Marcos (Fed) > *Sent:* Monday, August 14, 2023 5:45 PM > *To:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Satish Balay > *Cc:* McDermott, Randall J. (Fed) > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > All right Junchao, thank you for looking at this! > > So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc > env variables: > > # PETSc dir and arch, set MYSYS to nisaba dor FDS: > export PETSC_DIR=/dir_to_petsc/petsc > export PETSC_ARCH=arch-linux-c-dbg > export MYSYSTEM=nisaba > > and configured the library with: > > $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" > FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes > --with-shared-libraries=0 --download-suitesparse --download-hypre > --download-fblaslapack --with-cuda > > Then made and checked the PETSc build. > > Then for FDS: > > 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX > branch: > > $ cd ~/fds_dir > $ git clone https://github.com/marcosvanella/fds.git > $ cd fds > $ git checkout FireX > > > 1. With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a > debug target for fds (this is with cuda enabled openmpi compiled with gcc, > in my case gcc-11.2 + PETSc): > > $ cd Build/ompi_gnu_linux_db > $./make_fds.sh > > You should see compilation lines like this, with the WITH_PETSC > Preprocessor variable being defined: > > Building ompi_gnu_linux_db > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 
17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 > ... > ... > > If you are compiling on a Power9 node you might come across this error > right off the bat: > > ../../Source/prec.f90:34:8: > > 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A > very small number 16 byte accuracy > | 1 > Error: Kind -3 not supported for type REAL at (1) > > which means for some reason gcc in the Power9 does not like quad precision > definition in this manner. A way around it is to add the intrinsic > Fortran2008 module iso_fortran_env: > > use, intrinsic :: iso_fortran_env > > in the fds/Source/prec.f90 file and change the quad precision denominator > to: > > INTEGER, PARAMETER :: QB = REAL128 > > in there. We are investigating the reason why this is happening. This is > not related to Petsc in the code, everything related to PETSc calls is > integers and double precision reals. > > After the code compiles you get the executable in > ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > > With which you can run the attached 2 mesh case as: > > $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view > > and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE > which is what I was testing in CPU. This is the result I get from the > previous submission in an interactive job in Enki (similar with batch > submissions, gmres ksp, gamg pc): > > > Starting FDS ... > > MPI Process 1 started on enki11.adlp > MPI Process 0 started on enki11.adlp > > Reading FDS input file ... > > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: DEVC Device is not within any mesh. > > Fire Dynamics Simulator > > Current Date : August 14, 2023 17:26:22 > Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX > Revision Date : Mon Aug 14 17:07:20 2023 -0400 > Compiler : Gnu gfortran 11.2.1 > Compilation Date : Aug 14, 2023 17:11:05 > > MPI Enabled; Number of MPI Processes: 2 > OpenMP Enabled; Number of OpenMP Threads: 1 > > MPI version: 3.1 > MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp > Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 > > Job TITLE : > Job ID string : test > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? 
> #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #*19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #31 0x119faddf in 
pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > Seems the issue stems from the call to KSPSOLVE, line 3130 in > fds/Source/pres.f90. > > Well, thank you for taking the time to look at this and also let me know > if these threads should be moved to the issue tracker, or other venue. > Best, > Marcos > > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 4:37 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > I don't see a problem in the matrix assembly. > If you point me to your repo and show me how to build it, I can try to > reproduce. > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type > asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as > (I understand) is done in the ex60. The error is always the same, so it > seems it is not related to ksp,pc. Indeed it seems to happen when trying to > offload the Matrix to the GPU: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > ... > #8 0x20003935fc6b in ??? 
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Tue Aug 15 04:21:13 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Tue, 15 Aug 2023 12:21:13 +0300 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix In-Reply-To: References: <61755600-912A-4819-B7D0-C77ABE57A15C@dsic.upv.es> Message-ID: I used 'Matshell' and associated it with a custom-defined matrix-vector multiplication and used it to solve the eigenvalue problem ((*((LU)^-H)*Q** *(LU)^-1)***x = lmbda*x,* where LU is the factor of a *matrix **B*). I compared the eigenvalue results with matlab, however in matlab, I computed the matrix *A=(B^-H)*Q*B^-1* directly and used eig(A). Here are the results: petsc('*eigVal.png'*): (method: krylovschur) lmbd1 = 22.937184 lmbd2 = -6.306099 lmbd3 = 2.904980 lmbd4 = 0.026435 Matlab: lmbd1 = 0.0021 lmbd2 = 0.0840 lmbd3 = 3.9060 lmbd4 = 22.7579 It appears that the iterative procedure that I have adopted (in petsc) is accurate only for the largest eigenvalue. Is this correct? or is it due to some error in my code? Also, I tried using shift-invert-strategy ('*code_snippet_sinvert.png'*) to see if I can get accurate non-largest eigenvalue, but it throws error (' *error.png'*) related to '*MatSolverType mumps does not support matrix type shell*', and it gives the same error message with petsc's native MATSOLVERSUPERLU. How to resolve this? On Mon, Aug 14, 2023 at 1:20?PM maitri ksh wrote: > got it, thanks Pierre & Jose. > > On Mon, Aug 14, 2023 at 12:50?PM Jose E. Roman wrote: > >> See for instance ex3.c and ex9.c >> https://slepc.upv.es/documentation/current/src/eps/tutorials/index.html >> >> Jose >> >> >> > El 14 ago 2023, a las 10:45, Pierre Jolivet >> escribi?: >> > >> > >> > >> >> On 14 Aug 2023, at 10:39 AM, maitri ksh wrote: >> >> >> >> ? >> >> Hi, >> >> I need to solve an eigenvalue problem Ax=lmbda*x, where >> A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of >> the inverse of the matrix B. Theoretically it would take around 1.8TB to >> explicitly compute the matrix B^-1 . A feasible way to solve this >> eigenvalue problem would be to use the LU factors of the B matrix instead. >> So the problem looks something like this: >> >> (((LU)^-H)*Q*(LU)^-1)*x = lmbda*x >> >> For a guess value of the (normalised) eigen-vector 'x', >> >> 1) one would require to solve two linear equations to get 'Ax', >> >> (LU)*y=x, solve for 'y', >> >> ((LU)^H)*z=Q*y, solve for 'z' >> >> then one can follow the conventional power-iteration procedure >> >> 2) update eigenvector: x= z/||z|| >> >> 3) get eigenvalue using the Rayleigh quotient >> >> 4) go to step-1 and loop through with a conditional break. >> >> >> >> Is there any example in petsc that does not require explicit >> declaration of the matrix 'A' (Ax=lmbda*x) and instead takes a vector 'Ax' >> as input for an iterative algorithm (like the one above). I looked into >> some of the examples of eigenvalue problems ( it's highly possible that I >> might have overlooked, I am new to petsc) but I couldn't find a way to >> circumvent the explicit declaration of matrix A. >> > >> > You could use SLEPc with a MatShell, that?s the very purpose of this >> MatType. >> > >> > Thanks, >> > Pierre >> > >> >> Maitri >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: eigVal.png Type: image/png Size: 11110 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: code_snippet_sinvert.png Type: image/png Size: 29508 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: error.png Type: image/png Size: 72446 bytes Desc: not available URL: From jroman at dsic.upv.es Tue Aug 15 05:12:26 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 15 Aug 2023 12:12:26 +0200 Subject: [petsc-users] eigenvalue problem involving inverse of a matrix In-Reply-To: References: <61755600-912A-4819-B7D0-C77ABE57A15C@dsic.upv.es> Message-ID: <9FEEE7D0-3BD4-406D-91F7-500380BF98A4@dsic.upv.es> The computed residuals are not good. Something went wrong during the eigensolve, probably within your shell matrix multiply. You should check all operations, specially within your shell matrix ops, wrapping them with PetscCall(). Otherwise, failures go unnoticed. Regarding shift-and-invert, you cannot use PCLU with a shell matrix. You should use an iterative KSP/PC, but this will likely be very inefficient. An alternative would be to implement the shift-and-invert transformation yourself as a shell matrix. Probably you are doing things too complicated. Instead I would try solving the generalized eigenproblem Q*y=lambda*C*y with C=B^H*B and y=B^-1*x. You can build C explicitly with PETSc Mat-Mat product operations, see https://petsc.org/release/manualpages/Mat/MatMatMult/ - then you will be able to use STSINVERT. Jose > El 15 ago 2023, a las 11:21, maitri ksh escribi?: > > I used 'Matshell' and associated it with a custom-defined matrix-vector multiplication and used it to solve the eigenvalue problem ((((LU)^-H)*Q*(LU)^-1)*x = lmbda*x, where LU is the factor of a matrix B). I compared the eigenvalue results with matlab, however in matlab, I computed the matrix A=(B^-H)*Q*B^-1 directly and used eig(A). Here are the results: > > petsc('eigVal.png'): (method: krylovschur) > lmbd1 = 22.937184 > lmbd2 = -6.306099 > lmbd3 = 2.904980 > lmbd4 = 0.026435 > > Matlab: > lmbd1 = 0.0021 > lmbd2 = 0.0840 > lmbd3 = 3.9060 > lmbd4 = 22.7579 > > It appears that the iterative procedure that I have adopted (in petsc) is accurate only for the largest eigenvalue. Is this correct? or is it due to some error in my code? > > Also, I tried using shift-invert-strategy ('code_snippet_sinvert.png') to see if I can get accurate non-largest eigenvalue, but it throws error ('error.png') related to 'MatSolverType mumps does not support matrix type shell', and it gives the same error message with petsc's native MATSOLVERSUPERLU. How to resolve this? > > > > > > > > On Mon, Aug 14, 2023 at 1:20?PM maitri ksh wrote: > got it, thanks Pierre & Jose. > > On Mon, Aug 14, 2023 at 12:50?PM Jose E. Roman wrote: > See for instance ex3.c and ex9.c > https://slepc.upv.es/documentation/current/src/eps/tutorials/index.html > > Jose > > > > El 14 ago 2023, a las 10:45, Pierre Jolivet escribi?: > > > > > > > >> On 14 Aug 2023, at 10:39 AM, maitri ksh wrote: > >> > >> ? > >> Hi, > >> I need to solve an eigenvalue problem Ax=lmbda*x, where A=(B^-H)*Q*B^-1 is a hermitian matrix, 'B^-H' refers to the hermitian of the inverse of the matrix B. Theoretically it would take around 1.8TB to explicitly compute the matrix B^-1 . A feasible way to solve this eigenvalue problem would be to use the LU factors of the B matrix instead. 
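A minimal slepc4py sketch of the route Jose suggests above (build C = B^H*B explicitly with Mat-Mat products, then solve the generalized problem Q*y = lambda*C*y with shift-and-invert). B and Q are assumed to be already-assembled AIJ matrices, and every name below is illustrative rather than taken from the attached code:

from petsc4py import PETSc
from slepc4py import SLEPc

# C = B^H * B, built explicitly as conj(B)^T * B, so no shell matrix is needed
Bc = B.duplicate(copy=True)
Bc.conjugate()                       # Bc = conj(B)
C = Bc.transposeMatMult(B)           # C = Bc^T * B = B^H * B

# Generalized Hermitian problem Q*y = lambda*C*y, where y = B^{-1}*x
eps = SLEPc.EPS().create(comm=PETSc.COMM_WORLD)
eps.setOperators(Q, C)
eps.setProblemType(SLEPc.EPS.ProblemType.GHEP)
eps.getST().setType(SLEPc.ST.Type.SINVERT)   # LU now works: C and Q are assembled Mats, not shells
eps.setTarget(0.0)                           # ask for eigenvalues closest to 0
eps.setWhichEigenpairs(SLEPc.EPS.Which.TARGET_MAGNITUDE)
eps.setFromOptions()
eps.solve()

for i in range(eps.getConverged()):
    print(i, eps.getEigenvalue(i))
# if the eigenvectors x of the original problem are needed, recover them as x = B*y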
So the problem looks something like this: > >> (((LU)^-H)*Q*(LU)^-1)*x = lmbda*x > >> For a guess value of the (normalised) eigen-vector 'x', > >> 1) one would require to solve two linear equations to get 'Ax', > >> (LU)*y=x, solve for 'y', > >> ((LU)^H)*z=Q*y, solve for 'z' > >> then one can follow the conventional power-iteration procedure > >> 2) update eigenvector: x= z/||z|| > >> 3) get eigenvalue using the Rayleigh quotient > >> 4) go to step-1 and loop through with a conditional break. > >> > >> Is there any example in petsc that does not require explicit declaration of the matrix 'A' (Ax=lmbda*x) and instead takes a vector 'Ax' as input for an iterative algorithm (like the one above). I looked into some of the examples of eigenvalue problems ( it's highly possible that I might have overlooked, I am new to petsc) but I couldn't find a way to circumvent the explicit declaration of matrix A. > > > > You could use SLEPc with a MatShell, that?s the very purpose of this MatType. > > > > Thanks, > > Pierre > > > >> Maitri > > From ngoetting at itp.uni-bremen.de Tue Aug 15 05:48:17 2023 From: ngoetting at itp.uni-bremen.de (=?UTF-8?Q?Niclas_G=C3=B6tting?=) Date: Tue, 15 Aug 2023 12:48:17 +0200 Subject: [petsc-users] Python PETSc performance vs scipy ZVODE In-Reply-To: References: <56d04446-ca71-4589-a028-4a174488e30d@itp.uni-bremen.de> Message-ID: On the basis of your suggestion, I tried using vode for the real-valued problem with scipy and I get roughly the same speed as before with scipy, which could have three reasons 1. (Z)VODE is slower than plain RK (however, I must admit that I'm not quite sure what (Z)VODE does precisely) 2. Sparse matrix operations in scipy are slow. Some of them are even written in pure python. 3. The RHS function in scipy must *return* a vector and therefore allocates new memory for each iteration. Parallelizing the code is of course a goal of mine, but I believe this will only become relevant for larger systems, which I want to investigate in the near future. Regarding the RHS Jacobian, I see why defining RHSFunction vs RHSJacobian should be computationally equivalent, but I found it much easier to optimize the RHSFunction in this case, and I'm not quite sure as to why the documentation is so specific in strictly recommending the pattern of only providing a Jacobian and not a RHS function, while that should be equivalent. Lastly, I'm aware that another performance boost awaits upon turning off the debugging functionality, but for this simple test I just wanted to see, if there is *any* improvement in performance and I was very much surprised over the factor of 7 with debugging turned on already. Thank you all for the interesting input and have a nice day! Niclas On 15.08.23 00:37, Zhang, Hong wrote: > PETSs is not necessarily faster than scipy for your problem when > executed in serial. But you get benefits when running in parallel. The > PETSc code you wrote uses float64 while your scipy code uses > complex128, so the comparison may not be fair. > > In addition, using the RHS Jacobian does not necessarily make your > PETSc code slower. In your case, the bottleneck is the matrix > operations. For best performance, you should avoid adding two sparse > matrices (especially with different sparsity patterns) which is very > costly. So one MatMult + one MultAdd is the best option. MatAXPY with > the same nonzero pattern would be a bit slower but still faster than > MatAXPY with subset nonzero pattern, which you used in the Jacobian > function. 
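A minimal petsc4py sketch of the evaluation pattern Hong describes (one MatMult plus a scaled vector update, so no sparse matrices are added or rebuilt). It assumes l and pump are the assembled Mats from the script quoted further down, and that w = l.createVecRight() was allocated once outside the function; with a prefactor of exactly 1 it reduces to the single multAdd Hong mentions:

def rhsfunc_sketch(ts, t, u, F):
    l.mult(u, F)                   # F = L*u
    scale = 0.5 * (5 < t < 10)     # square pulse, as in the original script
    if scale != 0.0:
        pump.mult(u, w)            # w = P*u (w is a preallocated work vector)
        F.axpy(scale, w)           # F += scale*w
    # when scale == 1 the last two lines collapse to pump.multAdd(u, F, F)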
> > I echo Barry?s suggestion that debugging should be turned off before > you do any performance study. > > Hong (Mr.) > >> On Aug 10, 2023, at 4:40 AM, Niclas G?tting >> wrote: >> >> Thank you both for the very quick answer! >> >> So far, I compiled PETSc with debugging turned on, but I think it >> should still be faster than standard scipy in both cases. Actually, >> Stefano's answer has got me very far already; now I only define the >> RHS of the ODE and no Jacobian (I wonder, why the documentation >> suggests otherwise, though). I had the following four tries at >> implementing the RHS: >> >> 1. def rhsfunc1(ts, t, u, F): >> ??? scale = 0.5 * (5 < t < 10) >> ??? (l + scale * pump).mult(u, F) >> 2. def rhsfunc2(ts, t, u, F): >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? (scale * pump).multAdd(u, F, F) >> 3. def rhsfunc3(ts, t, u, F): >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? if scale != 0: >> ??????? pump.scale(scale) >> ??????? pump.multAdd(u, F, F) >> ??????? pump.scale(1/scale) >> 4. def rhsfunc4(ts, t, u, F): >> ??? tmp_pump.zeroEntries() # tmp_pump is pump.duplicate() >> ??? l.mult(u, F) >> ??? scale = 0.5 * (5 < t < 10) >> ??? tmp_pump.axpy(scale, pump, >> structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN) >> ??? tmp_pump.multAdd(u, F, F) >> >> They all yield the same results, but with 50it/s, 800it/, 2300it/s >> and 1900it/s, respectively, which is a huge performance boost (almost >> 7 times as fast as scipy, with PETSc debugging still turned on). As >> the scale function will most likely be a gaussian in the future, I >> think that option 3 will be become numerically unstable and I'll have >> to go with option 4, which is already faster than I expected. If you >> think it is possible to speed up the RHS calculation even more, I'd >> be happy to hear your suggestions; the -log_view is attached to this >> message. >> >> One last point: If I didn't misunderstand the documentation at >> https://petsc.org/release/manual/ts/#special-cases, should this maybe >> be changed? >> >> Best regards >> Niclas >> >> On 09.08.23 17:51, Stefano Zampini wrote: >>> TSRK is an explicit solver. Unless you are changing the ts type from >>> command line,? the explicit? jacobian should not be needed. On top >>> of Barry's suggestion, I would suggest you to write the explicit RHS >>> instead of assembly a throw away matrix every time that function >>> needs to be sampled. >>> >>> On Wed, Aug 9, 2023, 17:09 Niclas G?tting >>> wrote: >>> >>> Hi all, >>> >>> I'm currently trying to convert a quantum simulation from scipy to >>> PETSc. The problem itself is extremely simple and of the form >>> \dot{u}(t) >>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test >>> case is >>> a square function. The matrices A_const and B_const are >>> extremely sparse >>> and therefore I thought, the problem will be well suited for PETSc. >>> Currently, I solve the ODE with the following procedure in scipy >>> (I can >>> provide the necessary data files, if needed, but they are just some >>> trace-preserving, very sparse matrices): >>> >>> import numpy as np >>> import scipy.sparse >>> import scipy.integrate >>> >>> from tqdm import tqdm >>> >>> >>> l = np.load("../liouvillian.npy") >>> pump = np.load("../pump_operator.npy") >>> state = np.load("../initial_state.npy") >>> >>> l = scipy.sparse.csr_array(l) >>> pump = scipy.sparse.csr_array(pump) >>> >>> def f(t, y, *args): >>> ???? return (l + 0.5 * (5 < t < 10) * pump) @ y >>> ???? 
#return l @ y # Uncomment for f(t) = 0 >>> >>> dt = 0.1 >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128) >>> solver = >>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state) >>> times = [] >>> for i in tqdm(range(NUM_STEPS)): >>> ???? res[i, :] = solver.integrate(solver.t + dt) >>> ???? times.append(solver.t) >>> >>> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm >>> reports >>> about 330it/s on my machine. When converting the code to PETSc, >>> I came >>> to the following result (according to the chapter >>> https://petsc.org/main/manual/ts/#special-cases) >>> >>> import sys >>> import petsc4py >>> petsc4py.init(args=sys.argv) >>> import numpy as np >>> import scipy.sparse >>> >>> from tqdm import tqdm >>> from petsc4py import PETSc >>> >>> comm = PETSc.COMM_WORLD >>> >>> >>> def mat_to_real(arr): >>> ???? return np.block([[arr.real, -arr.imag], [arr.imag, >>> arr.real]]).astype(np.float64) >>> >>> def mat_to_petsc_aij(arr): >>> ???? arr_sc_sp = scipy.sparse.csr_array(arr) >>> ???? mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm) >>> ???? rstart, rend = mat.getOwnershipRange() >>> ???? print(rstart, rend) >>> ???? print(arr.shape[0]) >>> ???? print(mat.sizes) >>> ???? I = arr_sc_sp.indptr[rstart : rend + 1] - >>> arr_sc_sp.indptr[rstart] >>> ???? J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> ???? V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : >>> arr_sc_sp.indptr[rend]] >>> >>> ???? print(I.shape, J.shape, V.shape) >>> ???? mat.setValuesCSR(I, J, V) >>> ???? mat.assemble() >>> ???? return mat >>> >>> >>> l = np.load("../liouvillian.npy") >>> l = mat_to_real(l) >>> pump = np.load("../pump_operator.npy") >>> pump = mat_to_real(pump) >>> state = np.load("../initial_state.npy") >>> state = np.hstack([state.real, state.imag]).astype(np.float64) >>> >>> l = mat_to_petsc_aij(l) >>> pump = mat_to_petsc_aij(pump) >>> >>> >>> jac = l.duplicate() >>> for i in range(8192): >>> ???? jac.setValue(i, i, 0) >>> jac.assemble() >>> jac += l >>> >>> vec = l.createVecRight() >>> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state) >>> vec.assemble() >>> >>> >>> dt = 0.1 >>> >>> ts = PETSc.TS().create(comm=comm) >>> ts.setFromOptions() >>> ts.setProblemType(ts.ProblemType.LINEAR) >>> ts.setEquationType(ts.EquationType.ODE_EXPLICIT) >>> ts.setType(ts.Type.RK) >>> ts.setRKType(ts.RKType.RK3BS) >>> ts.setTime(0) >>> print("KSP:", ts.getKSP().getType()) >>> print("KSP PC:",ts.getKSP().getPC().getType()) >>> print("SNES :", ts.getSNES().getType()) >>> >>> def jacobian(ts, t, u, Amat, Pmat): >>> ???? Amat.zeroEntries() >>> ???? Amat.aypx(1, l, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> ???? Amat.axpy(0.5 * (5 < t < 10), pump, >>> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN) >>> >>> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear) >>> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) # >>> Uncomment for f(t) = 0 >>> ts.setRHSJacobian(jacobian, jac) >>> >>> NUM_STEPS = 200 >>> res = np.empty((NUM_STEPS, 8192), dtype=np.float64) >>> times = [] >>> rstart, rend = vec.getOwnershipRange() >>> for i in tqdm(range(NUM_STEPS)): >>> ???? time = ts.getTime() >>> ???? ts.setMaxTime(time + dt) >>> ???? ts.solve(vec) >>> ???? res[i, rstart:rend] = vec.getArray()[:] >>> ???? times.append(time) >>> >>> I decomposed the complex ODE into a larger real ODE, so that I can >>> easily switch maybe to GPU computation later on. 
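As an aside on the time loop above, TS can also march over the whole interval in a single solve() call and hand back every accepted step through a monitor, instead of being restarted once per output step. This is only a sketch under the same names (ts, vec, dt, NUM_STEPS) as the script above; with an adaptive controller the monitor fires at the accepted steps rather than on the fixed dt grid, so it is not a drop-in replacement:

history = []

def monitor(ts, step, time, u):
    history.append((time, u.getArray().copy()))   # copy: the array view is reused by PETSc

ts.setMonitor(monitor)
ts.setTimeStep(dt)
ts.setMaxTime(NUM_STEPS * dt)
ts.setMaxSteps(100 * NUM_STEPS)                   # generous safety cap, illustrative value
ts.setExactFinalTime(PETSc.TS.ExactFinalTime.MATCHSTEP)
ts.solve(vec)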
Now, the >>> solutions of >>> both scripts are very much identical, but PETSc runs about 3 times >>> slower at 120it/s on my machine. I don't use MPI for PETSc yet. >>> >>> I strongly suppose that the problem lies within the jacobian >>> definition, >>> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and >>> therefore a constant jacobian. >>> >>> Thank you in advance. >>> >>> All the best, >>> Niclas >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Tue Aug 15 08:54:57 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Tue, 15 Aug 2023 13:54:57 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, thank you for your observations and taking the time to look at this. So if I don't configure PETSc with the --with-cuda flag and still select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I thought I needed that flag to get the solvers to run on the V100 card. I'll remove the hardwired paths on the link flags, thanks for that! Marcos ________________________________ From: Junchao Zhang Sent: Monday, August 14, 2023 7:01 PM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Satish Balay ; McDermott, Randall J. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, These are my findings. I successfully ran the test in the end. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view Starting FDS ... ... [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to enable PETSc device support, for example, in some cases, -vec_type cuda Now I get why you met errors with "CPU runs". You configured and built hypre with petsc. Since you added --with-cuda, petsc would configure hypre with its GPU support. However, hypre has a limit/shortcoming that if it is configured with GPU support, you must pass GPU vectors to it. Thus the error. In other words, if you remove --with-cuda, you should be able to run above command. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda Starting FDS ... MPI Process 0 started on hong-gce-workstation MPI Process 1 started on hong-gce-workstation Reading FDS input file ... At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 .. Fire Dynamics Simulator ... STOP: FDS completed successfully (CHID: test) I guess there were link problems in your makefile. 
Actually, in the first try, I failed with mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC -I"/home/jczhang/petsc/include/" -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o init.o dump.o read.o divg.o main.o -Wl,-rpath -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ -L/usr/lib64 -lX11 /usr/bin/ld: cannot find -lflapack: No such file or directory /usr/bin/ld: cannot find -lfblas: No such file or directory collect2: error: ld returned 1 exit status make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 That is because you hardwired many link flags in your fds/Build/makefile. Then I changed LFLAGS_PETSC to LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc and everything worked. Could you also try it? --Junchao Zhang On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) > wrote: Attached is the test.fds test case. Thanks! ________________________________ From: Vanella, Marcos (Fed) > Sent: Monday, August 14, 2023 5:45 PM To: Junchao Zhang >; petsc-users at mcs.anl.gov >; Satish Balay > Cc: McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU All right Junchao, thank you for looking at this! So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc env variables: # PETSc dir and arch, set MYSYS to nisaba dor FDS: export PETSC_DIR=/dir_to_petsc/petsc export PETSC_ARCH=arch-linux-c-dbg export MYSYSTEM=nisaba and configured the library with: $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda Then made and checked the PETSc build. Then for FDS: 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX branch: $ cd ~/fds_dir $ git clone https://github.com/marcosvanella/fds.git $ cd fds $ git checkout FireX 1. 
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a debug target for fds (this is with cuda enabled openmpi compiled with gcc, in my case gcc-11.2 + PETSc): $ cd Build/ompi_gnu_linux_db $./make_fds.sh You should see compilation lines like this, with the WITH_PETSC Preprocessor variable being defined: Building ompi_gnu_linux_db mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 ... ... If you are compiling on a Power9 node you might come across this error right off the bat: ../../Source/prec.f90:34:8: 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy | 1 Error: Kind -3 not supported for type REAL at (1) which means for some reason gcc in the Power9 does not like quad precision definition in this manner. A way around it is to add the intrinsic Fortran2008 module iso_fortran_env: use, intrinsic :: iso_fortran_env in the fds/Source/prec.f90 file and change the quad precision denominator to: INTEGER, PARAMETER :: QB = REAL128 in there. We are investigating the reason why this is happening. This is not related to Petsc in the code, everything related to PETSc calls is integers and double precision reals. 
After the code compiles you get the executable in ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db With which you can run the attached 2 mesh case as: $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE which is what I was testing in CPU. This is the result I get from the previous submission in an interactive job in Enki (similar with batch submissions, gmres ksp, gamg pc): Starting FDS ... MPI Process 1 started on enki11.adlp MPI Process 0 started on enki11.adlp Reading FDS input file ... WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: DEVC Device is not within any mesh. Fire Dynamics Simulator Current Date : August 14, 2023 17:26:22 Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX Revision Date : Mon Aug 14 17:07:20 2023 -0400 Compiler : Gnu gfortran 11.2.1 Compilation Date : Aug 14, 2023 17:11:05 MPI Enabled; Number of MPI Processes: 2 OpenMP Enabled; Number of OpenMP Threads: 1 MPI version: 3.1 MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 Job TITLE : Job ID string : test terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 -------------------------------------------------------------------------- Primary job terminated 
normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted). -------------------------------------------------------------------------- Seems the issue stems from the call to KSPSOLVE, line 3130 in fds/Source/pres.f90. Well, thank you for taking the time to look at this and also let me know if these threads should be moved to the issue tracker, or other venue. Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 4:37 PM To: Vanella, Marcos (Fed) >; PETSc users list > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce. --Junchao Zhang On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as (I understand) is done in the ex60. The error is always the same, so it seems it is not related to ksp,pc. Indeed it seems to happen when trying to offload the Matrix to the GPU: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? ... #8 0x20003935fc6b in ??? 
#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ ... ... This is the piece of fortran code I have doing this within my Poisson solver: ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5. CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR) CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) DO IROW=1,ZSL%NUNKH_LOCAL DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) ! PETSC expects zero based indexes.1,Global I position (zero base),1,Global J position (zero base) CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,& ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) ENDDO ENDDO CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add nonzero values one by one. I wonder if there is something related to this that the copying to GPU does not like. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 3:24 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? 
#2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 15 08:59:20 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 15 Aug 2023 08:59:20 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, thank you for your observations and taking the time to look at > this. So if I don't configure PETSc with the --with-cuda flag and still > select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I > thought I needed that flag to get the solvers to run on the V100 card. > No, to have hypre run on CPU, you need to configure petsc/hypre without --with-cuda; otherwise, you need --with-cuda and have to always use flags like -vec_type cuda etc. I admit this is not user-friendly and should be fixed by petsc and hypre developers. > > > I'll remove the hardwired paths on the link flags, thanks for that! > > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 7:01 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > These are my findings. I successfully ran the test in the end. > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view > Starting FDS ... > ... > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to > enable PETSc device support, for example, in some cases, -vec_type cuda > > Now I get why you met errors with "CPU runs". You configured and built > hypre with petsc. Since you added --with-cuda, petsc would configure hypre > with its GPU support. However, hypre has a limit/shortcoming that if it is > configured with GPU support, you must pass GPU vectors to it. Thus the > error. In other words, if you remove --with-cuda, you should be able to run > above command. > > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type > aijcusparse -vec_type cuda > > Starting FDS ... > > MPI Process 0 started on hong-gce-workstation > MPI Process 1 started on hong-gce-workstation > > Reading FDS input file ... > > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. 
> At line 3014 of file ../../Source/read.f90 > .. > > Fire Dynamics Simulator > > ... > STOP: FDS completed successfully (CHID: test) > > I guess there were link problems in your makefile. Actually, in the first > try, I failed with > > mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 > 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" > -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC > -I"/home/jczhang/petsc/include/" > -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o > fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o > func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o > part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o > init.o dump.o read.o divg.o main.o -Wl,-rpath > -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags > -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi > -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib > -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack > -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ > -L/usr/lib64 -lX11 > /usr/bin/ld: cannot find -lflapack: No such file or directory > /usr/bin/ld: cannot find -lfblas: No such file or directory > collect2: error: ld returned 1 exit status > make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 > > That is because you hardwired many link flags in your fds/Build/makefile. > Then I changed LFLAGS_PETSC to > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > and everything worked. Could you also try it? > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Attached is the test.fds test case. Thanks! > ------------------------------ > *From:* Vanella, Marcos (Fed) > *Sent:* Monday, August 14, 2023 5:45 PM > *To:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Satish Balay > *Cc:* McDermott, Randall J. (Fed) > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > All right Junchao, thank you for looking at this! > > So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc > env variables: > > # PETSc dir and arch, set MYSYS to nisaba dor FDS: > export PETSC_DIR=/dir_to_petsc/petsc > export PETSC_ARCH=arch-linux-c-dbg > export MYSYSTEM=nisaba > > and configured the library with: > > $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" > FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes > --with-shared-libraries=0 --download-suitesparse --download-hypre > --download-fblaslapack --with-cuda > > Then made and checked the PETSc build. > > Then for FDS: > > 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX > branch: > > $ cd ~/fds_dir > $ git clone https://github.com/marcosvanella/fds.git > $ cd fds > $ git checkout FireX > > > 1. 
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a > debug target for fds (this is with cuda enabled openmpi compiled with gcc, > in my case gcc-11.2 + PETSc): > > $ cd Build/ompi_gnu_linux_db > $./make_fds.sh > > You should see compilation lines like this, with the WITH_PETSC > Preprocessor variable being defined: > > Building ompi_gnu_linux_db > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 > ... > ... > > If you are compiling on a Power9 node you might come across this error > right off the bat: > > ../../Source/prec.f90:34:8: > > 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A > very small number 16 byte accuracy > | 1 > Error: Kind -3 not supported for type REAL at (1) > > which means for some reason gcc in the Power9 does not like quad precision > definition in this manner. A way around it is to add the intrinsic > Fortran2008 module iso_fortran_env: > > use, intrinsic :: iso_fortran_env > > in the fds/Source/prec.f90 file and change the quad precision denominator > to: > > INTEGER, PARAMETER :: QB = REAL128 > > in there. We are investigating the reason why this is happening. This is > not related to Petsc in the code, everything related to PETSc calls is > integers and double precision reals. 
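Putting those two edits together, this is roughly what the patched kind definitions in fds/Source/prec.f90 would look like. This is only a sketch: the module name and surrounding declarations are illustrative, not the actual FDS source; only the iso_fortran_env import and the QB = REAL128 definition come from the workaround described above.

! Illustrative sketch of the Power9 workaround -- not the actual fds/Source/prec.f90
module precision_kinds
   use, intrinsic :: iso_fortran_env       ! Fortran 2008 intrinsic module; provides REAL128
   implicit none
   integer, parameter :: QB = REAL128      ! quad-precision kind, replacing the definition gcc rejects on Power9
   real(QB), parameter :: TWO_EPSILON_QB = 2._QB*EPSILON(1._QB) !< A very small number, 16 byte accuracy
end module precision_kinds
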
> > After the code compiles you get the executable in > ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > > With which you can run the attached 2 mesh case as: > > $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view > > and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE > which is what I was testing in CPU. This is the result I get from the > previous submission in an interactive job in Enki (similar with batch > submissions, gmres ksp, gamg pc): > > > Starting FDS ... > > MPI Process 1 started on enki11.adlp > MPI Process 0 started on enki11.adlp > > Reading FDS input file ... > > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: DEVC Device is not within any mesh. > > Fire Dynamics Simulator > > Current Date : August 14, 2023 17:26:22 > Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX > Revision Date : Mon Aug 14 17:07:20 2023 -0400 > Compiler : Gnu gfortran 11.2.1 > Compilation Date : Aug 14, 2023 17:11:05 > > MPI Enabled; Number of MPI Processes: 2 > OpenMP Enabled; Number of OpenMP Threads: 1 > > MPI version: 3.1 > MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp > Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 > > Job TITLE : > Job ID string : test > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? 
> #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #*19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > 
a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > Seems the issue stems from the call to KSPSOLVE, line 3130 in > fds/Source/pres.f90. > > Well, thank you for taking the time to look at this and also let me know > if these threads should be moved to the issue tracker, or other venue. > Best, > Marcos > > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 4:37 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > I don't see a problem in the matrix assembly. > If you point me to your repo and show me how to build it, I can try to > reproduce. > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type > asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as > (I understand) is done in the ex60. The error is always the same, so it > seems it is not related to ksp,pc. Indeed it seems to happen when trying to > offload the Matrix to the GPU: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > ... > #8 0x20003935fc6b in ??? 
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
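Before digging further, it can help to confirm what each rank actually sees on the node. The following is a diagnostic sketch only, meant to be dropped into the same batch script: the srun options mirror the script above, and SLURM_PROCID and CUDA_VISIBLE_DEVICES are standard Slurm/CUDA environment variables, assuming the scheduler exports them for --gres=gpu allocations.

# Hypothetical per-rank GPU visibility check, run inside the same allocation
srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 bash -c \
  'echo "rank $SLURM_PROCID: CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'

If both ranks report the same single device, PETSc's round-robin assignment reduces to rank%1 = 0 for both processes, i.e. the expected sharing of the one V100.
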
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Tue Aug 15 09:57:31 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Tue, 15 Aug 2023 14:57:31 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: I see. I'm trying to get hypre (or any preconditioner) to run on the GPU and this is what is giving me issues. I can run cases with the CPU only version of PETSc without problems. I tried running the job both in an interactive session and through slurm with the --with-cuda configured PETSc and passing the cuda vector flag at runtime like you did: $ mpirun -n 2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda and still get the error. So provided we both configured PETSc in the same way I'm thinking there is something going on with the configuration of my cluster. Even without defining the "-mat_type aijcusparse -vec_type cuda" flags in the submission line I get the same "parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument" error instead of what you see ("You need to enable PETSc device support"). I noted you use a Kokkos version of PETSc, is this related to your development? Thank you, Marcos ________________________________ From: Junchao Zhang Sent: Tuesday, August 15, 2023 9:59 AM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Satish Balay ; McDermott, Randall J. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for your observations and taking the time to look at this. So if I don't configure PETSc with the --with-cuda flag and still select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I thought I needed that flag to get the solvers to run on the V100 card. No, to have hypre run on CPU, you need to configure petsc/hypre without --with-cuda; otherwise, you need --with-cuda and have to always use flags like -vec_type cuda etc. I admit this is not user-friendly and should be fixed by petsc and hypre developers. I'll remove the hardwired paths on the link flags, thanks for that! Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 7:01 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov >; Satish Balay >; McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, These are my findings. I successfully ran the test in the end. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view Starting FDS ... ... [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to enable PETSc device support, for example, in some cases, -vec_type cuda Now I get why you met errors with "CPU runs". You configured and built hypre with petsc. Since you added --with-cuda, petsc would configure hypre with its GPU support. However, hypre has a limit/shortcoming that if it is configured with GPU support, you must pass GPU vectors to it. Thus the error. In other words, if you remove --with-cuda, you should be able to run above command. 
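As a rough summary of the two combinations being described here (a sketch only, using the configure options and executable name that appear elsewhere in this thread, not a verified recipe):

# hypre on CPU: configure PETSc/hypre without CUDA and run with the default (MPI) Vec/Mat types
$ ./configure --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack
$ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view

# hypre built with CUDA: keep --with-cuda at configure time, and then GPU Vec/Mat types must be requested at runtime, as in the run shown next: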
$ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda Starting FDS ... MPI Process 0 started on hong-gce-workstation MPI Process 1 started on hong-gce-workstation Reading FDS input file ... At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 .. Fire Dynamics Simulator ... STOP: FDS completed successfully (CHID: test) I guess there were link problems in your makefile. Actually, in the first try, I failed with mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC -I"/home/jczhang/petsc/include/" -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o init.o dump.o read.o divg.o main.o -Wl,-rpath -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ -L/usr/lib64 -lX11 /usr/bin/ld: cannot find -lflapack: No such file or directory /usr/bin/ld: cannot find -lfblas: No such file or directory collect2: error: ld returned 1 exit status make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 That is because you hardwired many link flags in your fds/Build/makefile. Then I changed LFLAGS_PETSC to LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc and everything worked. Could you also try it? --Junchao Zhang On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) > wrote: Attached is the test.fds test case. Thanks! ________________________________ From: Vanella, Marcos (Fed) > Sent: Monday, August 14, 2023 5:45 PM To: Junchao Zhang >; petsc-users at mcs.anl.gov >; Satish Balay > Cc: McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU All right Junchao, thank you for looking at this! 
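(Coming back to the LFLAGS_PETSC point above, a minimal makefile sketch; the first assignment is the simplified form quoted in this thread, while the commented alternative is an assumption that the standard PETSc makefile conventions apply, where including lib/petsc/conf/variables defines PETSC_LIB:)

# simplified link flags as suggested above: link only against libpetsc from the configured arch
LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc

# alternative sketch: let PETSc supply its full link line instead of hardwiring libraries
# include ${PETSC_DIR}/lib/petsc/conf/variables
# LFLAGS_PETSC = ${PETSC_LIB}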
So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc env variables: # PETSc dir and arch, set MYSYS to nisaba dor FDS: export PETSC_DIR=/dir_to_petsc/petsc export PETSC_ARCH=arch-linux-c-dbg export MYSYSTEM=nisaba and configured the library with: $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda Then made and checked the PETSc build. Then for FDS: 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX branch: $ cd ~/fds_dir $ git clone https://github.com/marcosvanella/fds.git $ cd fds $ git checkout FireX 1. With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a debug target for fds (this is with cuda enabled openmpi compiled with gcc, in my case gcc-11.2 + PETSc): $ cd Build/ompi_gnu_linux_db $./make_fds.sh You should see compilation lines like this, with the WITH_PETSC Preprocessor variable being defined: Building ompi_gnu_linux_db mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 ... ... 
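(For reference, the setup just described, condensed into one script-style sketch; the paths are the placeholders used in this thread, and the final make commands are assumed to be the usual PETSc build and check steps:)

# PETSc side
$ export PETSC_DIR=/dir_to_petsc/petsc
$ export PETSC_ARCH=arch-linux-c-dbg
$ export MYSYSTEM=nisaba
$ cd $PETSC_DIR
$ ./configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda
$ make all && make check

# FDS side (FireX branch, debug target)
$ mkdir ~/fds_dir && cd ~/fds_dir
$ git clone https://github.com/marcosvanella/fds.git
$ cd fds && git checkout FireX
$ cd Build/ompi_gnu_linux_db && ./make_fds.sh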
If you are compiling on a Power9 node you might come across this error right off the bat: ../../Source/prec.f90:34:8: 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy | 1 Error: Kind -3 not supported for type REAL at (1) which means for some reason gcc in the Power9 does not like quad precision definition in this manner. A way around it is to add the intrinsic Fortran2008 module iso_fortran_env: use, intrinsic :: iso_fortran_env in the fds/Source/prec.f90 file and change the quad precision denominator to: INTEGER, PARAMETER :: QB = REAL128 in there. We are investigating the reason why this is happening. This is not related to Petsc in the code, everything related to PETSc calls is integers and double precision reals. After the code compiles you get the executable in ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db With which you can run the attached 2 mesh case as: $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE which is what I was testing in CPU. This is the result I get from the previous submission in an interactive job in Enki (similar with batch submissions, gmres ksp, gamg pc): Starting FDS ... MPI Process 1 started on enki11.adlp MPI Process 0 started on enki11.adlp Reading FDS input file ... WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: DEVC Device is not within any mesh. Fire Dynamics Simulator Current Date : August 14, 2023 17:26:22 Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX Revision Date : Mon Aug 14 17:07:20 2023 -0400 Compiler : Gnu gfortran 11.2.1 Compilation Date : Aug 14, 2023 17:11:05 MPI Enabled; Number of MPI Processes: 2 OpenMP Enabled; Number of OpenMP Threads: 1 MPI version: 3.1 MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 Job TITLE : Job ID string : test terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 -------------------------------------------------------------------------- Primary job terminated 
normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted). -------------------------------------------------------------------------- Seems the issue stems from the call to KSPSOLVE, line 3130 in fds/Source/pres.f90. Well, thank you for taking the time to look at this and also let me know if these threads should be moved to the issue tracker, or other venue. Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 4:37 PM To: Vanella, Marcos (Fed) >; PETSc users list > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce. --Junchao Zhang On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as (I understand) is done in the ex60. The error is always the same, so it seems it is not related to ksp,pc. Indeed it seems to happen when trying to offload the Matrix to the GPU: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? ... #8 0x20003935fc6b in ??? 
#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ ... ... This is the piece of fortran code I have doing this within my Poisson solver: ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5. CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR) CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) DO IROW=1,ZSL%NUNKH_LOCAL DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) ! PETSC expects zero based indexes.1,Global I position (zero base),1,Global J position (zero base) CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,& ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) ENDDO ENDDO CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add nonzero values one by one. I wonder if there is something related to this that the copying to GPU does not like. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 3:24 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? 
#2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
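(A quick illustration of the round-robin GPU mapping mentioned above, sketched for a plain MPI launch with no scheduler binding; the srun line is only one way to inspect device visibility and assumes slurm and nvidia-smi are available on the cluster:)

# with g visible CUDA devices, PETSc assigns device = rank % g to MPI rank "rank"
# under slurm the scheduler often restricts visibility per task (e.g. via --gres=gpu:1),
# so each rank may effectively see g = 1 device; this lists what a task can see:
$ srun --gres=gpu:1 nvidia-smi -L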
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 15 10:54:46 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 15 Aug 2023 10:54:46 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: On Tue, Aug 15, 2023 at 9:57?AM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > I see. I'm trying to get hypre (or any preconditioner) to run on the GPU > and this is what is giving me issues. I can run cases with the CPU only > version of PETSc without problems. > > I tried running the job both in an interactive session and through slurm > with the --with-cuda configured PETSc and passing the cuda vector flag at > runtime like you did: > > $ mpirun -n 2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view -mat_type aijcusparse -vec_type cuda > > and still get the error. So provided we both configured PETSc in the same > way I'm thinking there is something going on with the configuration of my > cluster. > > Even without defining the "-mat_type aijcusparse -vec_type cuda" flags in > the submission line I get the same "parallel_for failed: > cudaErrorInvalidConfiguration: invalid configuration argument" error > instead of what you see ("You need to enable PETSc device support"). > > I noted you use a Kokkos version of PETSc, is this related to your > development? > No, Kokkos is irrelevant here. Were you able to compile your code with the much simpler LFLAGS_PETSC? LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > Thank you, > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 9:59 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for your observations and taking the time to look at > this. So if I don't configure PETSc with the --with-cuda flag and still > select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I > thought I needed that flag to get the solvers to run on the V100 card. > > No, to have hypre run on CPU, you need to configure petsc/hypre without > --with-cuda; otherwise, you need --with-cuda and have to always use flags > like -vec_type cuda etc. I admit this is not user-friendly and should be > fixed by petsc and hypre developers. > > > I'll remove the hardwired paths on the link flags, thanks for that! 
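A quick way to separate a cluster/GPU-visibility problem from a PETSc or linking problem is a tiny standalone check of what each MPI rank can actually see under the same slurm allocation. The sketch below is not part of FDS or PETSc (the file name devcheck.c and the compile line are assumptions); it only uses standard MPI and CUDA runtime calls, and also prints the rank % g device that the round-robin rule described later in this thread would pick:

#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

/* Per-rank CUDA visibility check: prints how many devices each MPI rank sees
   and which device a rank % (device count) round-robin rule would select.
   Build with something like: mpicc devcheck.c -o devcheck -lcudart
   (add -I/-L paths for your CUDA install as needed). */
int main(int argc, char **argv)
{
  int         rank, ndev = 0;
  cudaError_t err;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  err = cudaGetDeviceCount(&ndev);
  if (err != cudaSuccess)
    printf("rank %d: cudaGetDeviceCount failed: %s\n", rank, cudaGetErrorString(err));
  else if (ndev == 0)
    printf("rank %d sees no CUDA devices\n", rank);
  else
    printf("rank %d sees %d CUDA device(s); round-robin would use device %d\n", rank, ndev, rank % ndev);

  MPI_Finalize();
  return 0;
}

If a rank reports zero devices when launched through the same batch script used for FDS, the GPU request/binding in the job script is the first thing to revisit, before looking at PETSc or hypre flags.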
> > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 7:01 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > These are my findings. I successfully ran the test in the end. > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view > Starting FDS ... > ... > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to > enable PETSc device support, for example, in some cases, -vec_type cuda > > Now I get why you met errors with "CPU runs". You configured and built > hypre with petsc. Since you added --with-cuda, petsc would configure hypre > with its GPU support. However, hypre has a limit/shortcoming that if it is > configured with GPU support, you must pass GPU vectors to it. Thus the > error. In other words, if you remove --with-cuda, you should be able to run > above command. > > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type > aijcusparse -vec_type cuda > > Starting FDS ... > > MPI Process 0 started on hong-gce-workstation > MPI Process 1 started on hong-gce-workstation > > Reading FDS input file ... > > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > .. > > Fire Dynamics Simulator > > ... > STOP: FDS completed successfully (CHID: test) > > I guess there were link problems in your makefile. 
Actually, in the first > try, I failed with > > mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 > 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" > -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC > -I"/home/jczhang/petsc/include/" > -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o > fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o > func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o > part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o > init.o dump.o read.o divg.o main.o -Wl,-rpath > -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags > -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi > -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib > -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack > -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ > -L/usr/lib64 -lX11 > /usr/bin/ld: cannot find -lflapack: No such file or directory > /usr/bin/ld: cannot find -lfblas: No such file or directory > collect2: error: ld returned 1 exit status > make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 > > That is because you hardwired many link flags in your fds/Build/makefile. > Then I changed LFLAGS_PETSC to > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > and everything worked. Could you also try it? > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Attached is the test.fds test case. Thanks! > ------------------------------ > *From:* Vanella, Marcos (Fed) > *Sent:* Monday, August 14, 2023 5:45 PM > *To:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Satish Balay > *Cc:* McDermott, Randall J. (Fed) > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > All right Junchao, thank you for looking at this! > > So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc > env variables: > > # PETSc dir and arch, set MYSYS to nisaba dor FDS: > export PETSC_DIR=/dir_to_petsc/petsc > export PETSC_ARCH=arch-linux-c-dbg > export MYSYSTEM=nisaba > > and configured the library with: > > $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" > FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes > --with-shared-libraries=0 --download-suitesparse --download-hypre > --download-fblaslapack --with-cuda > > Then made and checked the PETSc build. > > Then for FDS: > > 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX > branch: > > $ cd ~/fds_dir > $ git clone https://github.com/marcosvanella/fds.git > $ cd fds > $ git checkout FireX > > > 1. 
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a > debug target for fds (this is with cuda enabled openmpi compiled with gcc, > in my case gcc-11.2 + PETSc): > > $ cd Build/ompi_gnu_linux_db > $./make_fds.sh > > You should see compilation lines like this, with the WITH_PETSC > Preprocessor variable being defined: > > Building ompi_gnu_linux_db > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 > ... > ... > > If you are compiling on a Power9 node you might come across this error > right off the bat: > > ../../Source/prec.f90:34:8: > > 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A > very small number 16 byte accuracy > | 1 > Error: Kind -3 not supported for type REAL at (1) > > which means for some reason gcc in the Power9 does not like quad precision > definition in this manner. A way around it is to add the intrinsic > Fortran2008 module iso_fortran_env: > > use, intrinsic :: iso_fortran_env > > in the fds/Source/prec.f90 file and change the quad precision denominator > to: > > INTEGER, PARAMETER :: QB = REAL128 > > in there. We are investigating the reason why this is happening. This is > not related to Petsc in the code, everything related to PETSc calls is > integers and double precision reals. 
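A side note on the -DWITH_PETSC flag highlighted in the compile lines above: FDS itself is Fortran, but the gating idea is the same in any language. A minimal C sketch of a build-time switch like this (the macro name WITH_PETSC is taken from the compile lines; everything else is illustrative) could look like:

#include <stdio.h>
#ifdef WITH_PETSC
#include <petscsys.h>
#endif

/* Sketch: a -DWITH_PETSC style flag gates all PETSc usage, so the same source
   still builds and runs (without PETSc) when the flag is absent. */
int main(int argc, char **argv)
{
#ifdef WITH_PETSC
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "PETSc support compiled in\n"));
  PetscCall(PetscFinalize());
#else
  (void)argc; (void)argv;
  printf("built without PETSc support\n");
#endif
  return 0;
}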
> > After the code compiles you get the executable in > ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > > With which you can run the attached 2 mesh case as: > > $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view > > and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE > which is what I was testing in CPU. This is the result I get from the > previous submission in an interactive job in Enki (similar with batch > submissions, gmres ksp, gamg pc): > > > Starting FDS ... > > MPI Process 1 started on enki11.adlp > MPI Process 0 started on enki11.adlp > > Reading FDS input file ... > > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: DEVC Device is not within any mesh. > > Fire Dynamics Simulator > > Current Date : August 14, 2023 17:26:22 > Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX > Revision Date : Mon Aug 14 17:07:20 2023 -0400 > Compiler : Gnu gfortran 11.2.1 > Compilation Date : Aug 14, 2023 17:11:05 > > MPI Enabled; Number of MPI Processes: 2 > OpenMP Enabled; Number of OpenMP Threads: 1 > > MPI version: 3.1 > MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp > Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 > > Job TITLE : > Job ID string : test > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? 
> #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #*19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > 
a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > Seems the issue stems from the call to KSPSOLVE, line 3130 in > fds/Source/pres.f90. > > Well, thank you for taking the time to look at this and also let me know > if these threads should be moved to the issue tracker, or other venue. > Best, > Marcos > > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 4:37 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > I don't see a problem in the matrix assembly. > If you point me to your repo and show me how to build it, I can try to > reproduce. > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type > asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as > (I understand) is done in the ex60. The error is always the same, so it > seems it is not related to ksp,pc. Indeed it seems to happen when trying to > offload the Matrix to the GPU: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > ... > #8 0x20003935fc6b in ??? 
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
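An aside on the MatSetValues-style assembly quoted above: MatSetPreallocationCOO, which appears in the crash stacks, belongs to PETSc's coordinate (COO) assembly interface. A minimal C sketch of that path follows; it assumes a recent PETSc, and the toy 3x3 diagonal system with its index/value arrays is made up purely for illustration (run it on one rank):

#include <petscmat.h>

/* Sketch of COO assembly: hand PETSc all (i,j) pairs once, then set the values.
   This is the MatSetPreallocationCOO/MatSetValuesCOO path seen in the backtraces,
   shown here on a toy diagonal matrix. */
int main(int argc, char **argv)
{
  Mat         A;
  PetscCount  ncoo    = 3;                  /* number of COO entries (illustrative) */
  PetscInt    coo_i[] = {0, 1, 2};          /* global row indices    */
  PetscInt    coo_j[] = {0, 1, 2};          /* global column indices */
  PetscScalar coo_v[] = {1.0, 2.0, 3.0};    /* values, same ordering as (i,j) */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 3, 3));
  PetscCall(MatSetFromOptions(A));          /* honors -mat_type aijcusparse / mpiaijcusparse */
  PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}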
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Tue Aug 15 11:35:05 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Tue, 15 Aug 2023 16:35:05 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, you are correct. Turns out I had the cuda library path being defined in the complicated LFLAGS_PETSC pointing to cuda12 instead of 11.7, which was mixing the cuda versions, giving this weird error. The idea behind the long form of the LDFLAGS is to statically link most of the libs in the future (using also --with-shared-libraries=0 in the PETSc config), but we need to work on this. Ideally we want to end up with a self contained bundle for fds. Not clear that is even possible when adding cuda into the mix. Your suggestion for the simple version of LFLAGS_PETSC also works fine and we'll be using it in our scaling tests. Thank you for your time, Marcos ________________________________ From: Junchao Zhang Sent: Tuesday, August 15, 2023 11:54 AM To: Vanella, Marcos (Fed) Cc: petsc-users at mcs.anl.gov ; Satish Balay ; McDermott, Randall J. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU On Tue, Aug 15, 2023 at 9:57?AM Vanella, Marcos (Fed) > wrote: I see. I'm trying to get hypre (or any preconditioner) to run on the GPU and this is what is giving me issues. I can run cases with the CPU only version of PETSc without problems. I tried running the job both in an interactive session and through slurm with the --with-cuda configured PETSc and passing the cuda vector flag at runtime like you did: $ mpirun -n 2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda and still get the error. So provided we both configured PETSc in the same way I'm thinking there is something going on with the configuration of my cluster. Even without defining the "-mat_type aijcusparse -vec_type cuda" flags in the submission line I get the same "parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument" error instead of what you see ("You need to enable PETSc device support"). I noted you use a Kokkos version of PETSc, is this related to your development? No, Kokkos is irrelevant here. Were you able to compile your code with the much simpler LFLAGS_PETSC? LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc Thank you, Marcos ________________________________ From: Junchao Zhang > Sent: Tuesday, August 15, 2023 9:59 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov >; Satish Balay >; McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for your observations and taking the time to look at this. So if I don't configure PETSc with the --with-cuda flag and still select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I thought I needed that flag to get the solvers to run on the V100 card. No, to have hypre run on CPU, you need to configure petsc/hypre without --with-cuda; otherwise, you need --with-cuda and have to always use flags like -vec_type cuda etc. I admit this is not user-friendly and should be fixed by petsc and hypre developers. 
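[A minimal sketch of the pattern discussed above — not FDS or PETSc source; the object names and the local size n are assumptions for illustration only. The point is that creating the Mat/Vec with the *SetFromOptions calls is what lets runtime flags such as -mat_type aijcusparse and -vec_type cuda take effect, so that a CUDA-built hypre preconditioner receives device data.]

-------------------
! Sketch only: names and local size n are assumed, not taken from FDS.
#include <petsc/finclude/petscksp.h>
program hypre_gpu_sketch
  use petscksp
  implicit none
  Mat            A
  Vec            x, b
  KSP            ksp
  PC             pc
  PetscErrorCode ierr
  PetscInt       n

  n = 100                                      ! assumed local problem size
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

  call MatCreate(PETSC_COMM_WORLD, A, ierr)
  call MatSetSizes(A, n, n, PETSC_DETERMINE, PETSC_DETERMINE, ierr)
  call MatSetFromOptions(A, ierr)              ! honors -mat_type aijcusparse
  call MatSetUp(A, ierr)

  call VecCreate(PETSC_COMM_WORLD, x, ierr)
  call VecSetSizes(x, n, PETSC_DETERMINE, ierr)
  call VecSetFromOptions(x, ierr)              ! honors -vec_type cuda
  call VecDuplicate(x, b, ierr)

  call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
  call KSPSetOperators(ksp, A, A, ierr)
  call KSPGetPC(ksp, pc, ierr)
  call PCSetType(pc, PCHYPRE, ierr)            ! hypre preconditioner
  call KSPSetFromOptions(ksp, ierr)
  ! ... assemble A and b here, then: call KSPSolve(ksp, b, x, ierr)

  call KSPDestroy(ksp, ierr)
  call VecDestroy(x, ierr)
  call VecDestroy(b, ierr)
  call MatDestroy(A, ierr)
  call PetscFinalize(ierr)
end program hypre_gpu_sketch
-------------------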
I'll remove the hardwired paths on the link flags, thanks for that! Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 7:01 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov >; Satish Balay >; McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, These are my findings. I successfully ran the test in the end. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view Starting FDS ... ... [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to enable PETSc device support, for example, in some cases, -vec_type cuda Now I get why you met errors with "CPU runs". You configured and built hypre with petsc. Since you added --with-cuda, petsc would configure hypre with its GPU support. However, hypre has a limit/shortcoming that if it is configured with GPU support, you must pass GPU vectors to it. Thus the error. In other words, if you remove --with-cuda, you should be able to run above command. $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type aijcusparse -vec_type cuda Starting FDS ... MPI Process 0 started on hong-gce-workstation MPI Process 1 started on hong-gce-workstation Reading FDS input file ... At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 .. Fire Dynamics Simulator ... STOP: FDS completed successfully (CHID: test) I guess there were link problems in your makefile. 
Actually, in the first try, I failed with mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC -I"/home/jczhang/petsc/include/" -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o init.o dump.o read.o divg.o main.o -Wl,-rpath -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ -L/usr/lib64 -lX11 /usr/bin/ld: cannot find -lflapack: No such file or directory /usr/bin/ld: cannot find -lfblas: No such file or directory collect2: error: ld returned 1 exit status make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 That is because you hardwired many link flags in your fds/Build/makefile. Then I changed LFLAGS_PETSC to LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc and everything worked. Could you also try it? --Junchao Zhang On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) > wrote: Attached is the test.fds test case. Thanks! ________________________________ From: Vanella, Marcos (Fed) > Sent: Monday, August 14, 2023 5:45 PM To: Junchao Zhang >; petsc-users at mcs.anl.gov >; Satish Balay > Cc: McDermott, Randall J. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU All right Junchao, thank you for looking at this! So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc env variables: # PETSc dir and arch, set MYSYS to nisaba dor FDS: export PETSC_DIR=/dir_to_petsc/petsc export PETSC_ARCH=arch-linux-c-dbg export MYSYSTEM=nisaba and configured the library with: $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda Then made and checked the PETSc build. Then for FDS: 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX branch: $ cd ~/fds_dir $ git clone https://github.com/marcosvanella/fds.git $ cd fds $ git checkout FireX 1. 
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a debug target for fds (this is with cuda enabled openmpi compiled with gcc, in my case gcc-11.2 + PETSc): $ cd Build/ompi_gnu_linux_db $./make_fds.sh You should see compilation lines like this, with the WITH_PETSC Preprocessor variable being defined: Building ompi_gnu_linux_db mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none -fall-intrinsics -fbounds-check -cpp -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" -DWITH_PETSC -I"/home/mnv/Software/petsc/include/" -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 ... ... If you are compiling on a Power9 node you might come across this error right off the bat: ../../Source/prec.f90:34:8: 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A very small number 16 byte accuracy | 1 Error: Kind -3 not supported for type REAL at (1) which means for some reason gcc in the Power9 does not like quad precision definition in this manner. A way around it is to add the intrinsic Fortran2008 module iso_fortran_env: use, intrinsic :: iso_fortran_env in the fds/Source/prec.f90 file and change the quad precision denominator to: INTEGER, PARAMETER :: QB = REAL128 in there. We are investigating the reason why this is happening. This is not related to Petsc in the code, everything related to PETSc calls is integers and double precision reals. 
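[For reference, a minimal sketch of the prec.f90 workaround just described. The real module is fds/Source/prec.f90; only the two quad-precision declarations are shown here, everything else is omitted.]

-------------------
! Sketch of the Power9/gfortran workaround described above: pick the quad
! precision kind from iso_fortran_env instead of using KIND=16 directly.
MODULE PREC_SKETCH
   USE, INTRINSIC :: ISO_FORTRAN_ENV              ! provides the REAL128 kind
   IMPLICIT NONE
   INTEGER, PARAMETER :: QB = REAL128             ! quad-precision kind, replaces KIND=16
   REAL(QB), PARAMETER :: TWO_EPSILON_QB = 2._QB*EPSILON(1._QB) !< A very small number, 16-byte accuracy
END MODULE PREC_SKETCH
-------------------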
After the code compiles you get the executable in ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db With which you can run the attached 2 mesh case as: $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -log_view and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE which is what I was testing in CPU. This is the result I get from the previous submission in an interactive job in Enki (similar with batch submissions, gmres ksp, gamg pc): Starting FDS ... MPI Process 1 started on enki11.adlp MPI Process 0 started on enki11.adlp Reading FDS input file ... WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any unassigned SPEC variables in the input were assigned the properties of nitrogen. At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3014 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created At line 3461 of file ../../Source/read.f90 Fortran runtime warning: An array temporary was created WARNING: DEVC Device is not within any mesh. Fire Dynamics Simulator Current Date : August 14, 2023 17:26:22 Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX Revision Date : Mon Aug 14 17:07:20 2023 -0400 Compiler : Gnu gfortran 11.2.1 Compilation Date : Aug 14, 2023 17:11:05 MPI Enabled; Number of MPI Processes: 2 OpenMP Enabled; Number of OpenMP Threads: 1 MPI version: 3.1 MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 Job TITLE : Job ID string : test terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? 
#9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec67db in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efc7e3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #12 0x11efc7e3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #13 0x11efc7e3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efc7e3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efc7e3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #18 0x11eda3c7 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4300 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #21 0x11e91bc7 in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #22 0x1316d5ab in MatConvert_AIJ_HYPRE ??????at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #23 0x11e3b463 in MatConvert ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #24 0x14072213 in PCSetUp_HYPRE ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #25 0x1276a9db in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private #26 0x127d923b in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #27 0x127e033f in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #28 0x127e6f07 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #29 0x1280d70b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #30 0x1140858f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3130 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #31 0x119faddf in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #32 0x1196c15f in fds ??????at ../../Source/main.f90:688 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 #33 0x11a126f3 in main ??????at ../../Source/main.f90:6 -------------------------------------------------------------------------- Primary job terminated 
normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited on signal 6 (Aborted). -------------------------------------------------------------------------- Seems the issue stems from the call to KSPSOLVE, line 3130 in fds/Source/pres.f90. Well, thank you for taking the time to look at this and also let me know if these threads should be moved to the issue tracker, or other venue. Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 4:37 PM To: Vanella, Marcos (Fed) >; PETSc users list > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce. --Junchao Zhang On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as (I understand) is done in the ex60. The error is always the same, so it seems it is not related to ksp,pc. Indeed it seems to happen when trying to offload the Matrix to the GPU: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? ... #8 0x20003935fc6b in ??? 
#9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 
#19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ ... ... This is the piece of fortran code I have doing this within my Poisson solver: ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag blocks nonzeros per row to 5. CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR) CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) DO IROW=1,ZSL%NUNKH_LOCAL DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) ! PETSC expects zero based indexes.1,Global I position (zero base),1,Global J position (zero base) CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,& ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) ENDDO ENDDO CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add nonzero values one by one. I wonder if there is something related to this that the copying to GPU does not like. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 14, 2023 3:24 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Yeah, it looks like ex60 was run correctly. Double check your code again and if you still run into errors, we can try to reproduce on our end. Thanks. --Junchao Zhang On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The batch script for slurm submission, ex60.log and gpu stats files are attached. Nothing stands out as wrong to me but please have a look. I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. Thanks! Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 5:52 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to make sure the environment is ok. The comment at the end shows how to run it test: requires: cuda suffix: 1_cuda nsize: 4 args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse --Junchao Zhang On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for the info. I compiled the main branch of PETSc in another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the fortran compilation error. It might have been related to gcc-9.3. I tried the case again, 2 CPUs and one GPU and get this error now: terminate called after throwing an instance of 'thrust::system::system_error' terminate called after throwing an instance of 'thrust::system::system_error' what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument Program received signal SIGABRT: Process abort signal. Backtrace for this error: Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? #0 0x2000397fcd8f in ??? #1 0x2000397fb657 in ??? 
#2 0x2000000604d7 in ??? #2 0x2000000604d7 in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #3 0x200039cb9628 in ??? #4 0x200039c93eb3 in ??? #5 0x200039364a97 in ??? #6 0x20003935f6d3 in ??? #7 0x20003935f78f in ??? #8 0x20003935fc6b in ??? #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 ??????at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ ??????at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm ??????at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em ??????at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em ??????at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488 #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #22 0x133f141f in MatMPIAIJGetLocalMatMerge ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #24 0x1377e1df in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #25 0x11e4dd1f in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #26 0x130d792f in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #27 0x130db89b in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #28 0x130bf5a3 in MatCoarsenApply ??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #29 0x141518ff in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #30 0x13b3a43f in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #30 0x13b3a43f in PCSetUp_GAMG ??????at 
/home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #31 0x1276845b in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #32 0x127d6cbb in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #33 0x127dddbf in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #34 0x127e4987 in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #35 0x1280b18b in kspsolve_ ??????at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #36 0x1140945f in __globmat_solver_MOD_glmat_solver ??????at ../../Source/pres.f90:3128 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #37 0x119f8853 in pressure_iteration_scheme ??????at ../../Source/main.f90:1449 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #38 0x11969bd3 in fds ??????at ../../Source/main.f90:688 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 #39 0x11a10167 in main ??????at ../../Source/main.f90:6 srun: error: enki12: tasks 0-1: Aborted (core dumped) This was the slurm submission script in this case: #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=debug #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 # PETSc dir and arch: export PETSC_DIR=/home/mnv/Software/petsc export PETSC_ARCH=arch-linux-c-dbg # SYSTEM name: export MYSYSTEM=enki # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg The configure.log for the PETSc build is attached. Another clue to what is happening is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : 0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: GPU error [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: GPU error [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected [0]PETSC ERROR: WARNING! 
There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad GIT Date: 2023-08-11 15:13:02 +0000 [0]PETSC ERROR: /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes --with-shared-libraries=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda ... I would have expected not to see GPU errors being printed out, given I did not request cuda matrix/vectors. The case run anyways, I assume it defaulted to the CPU solver. Let me know if you have any ideas as to what is happening. Thanks, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:35 PM To: Vanella, Marcos (Fed) >; PETSc users list >; Satish Balay > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Marcos, We do not have good petsc/gpu documentation, but see https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: cuda" in petsc tests and you will find examples using GPU. For the Fortran compile errors, attach your configure.log and Satish (Cc'ed) or others should know how to fix them. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thanks for the explanation. Is there some development documentation on the GPU work? I'm interested learning about it. I checked out the main branch and configured petsc. when compiling with gcc/gfortran I come across this error: .... CUDAC arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o CUDAC.dep arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) | 1 Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit interface /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: 38 | import tIS | 1 Error: IMPORT statement at (1) only permitted in an INTERFACE body /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: 39 | PetscInt a ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: 40 | PetscInt b ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: 41 | PetscInt c ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: 42 | PetscInt d ! 
PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: 43 | PetscInt e ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: 44 | PetscInt f ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: 45 | PetscInt g ! PetscInt | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: 46 | IS h ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: 47 | IS i ! IS | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: 48 | PetscErrorCode z | 1 Error: Unexpected data declaration statement in INTERFACE block at (1) /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: 49 | end subroutine PCASMCreateSubdomains2D | 1 Error: Expecting END INTERFACE statement at (1) make[3]: *** [gmakefile:225: arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 make[3]: *** Waiting for unfinished jobs.... CC arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o CUDAC arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o CUDAC.dep arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o make[3]: Leaving directory '/home/mnv/Software/petsc' make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 make[2]: Leaving directory '/home/mnv/Software/petsc' **************************ERROR************************************* Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** make[1]: *** [makefile:45: all] Error 1 make: *** [GNUmakefile:9: all] Error 2 ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 3:04 PM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. We recently refactored the COO code and got rid of that function. So could you try petsc/main? We map MPI processes to GPUs in a round-robin fashion. We query the number of visible CUDA devices (g), and assign the device (rank%g) to the MPI process (rank). In that sense, the work distribution is totally determined by your MPI work partition (i.e, yourself). On clusters, this MPI process to GPU binding is usually done by the job scheduler like slurm. You need to check your cluster's users' guide to see how to bind MPI processes to GPUs. If the job scheduler has done that, the number of visible CUDA devices to a process might just appear to be 1, making petsc's own mapping void. Thanks. --Junchao Zhang On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, thank you for replying. 
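[A small MPI sketch of the round-robin rule described above — this is not PETSc source, only the rank-to-device arithmetic: with g visible devices, rank r is bound to device mod(r, g). The number of visible devices ngpus is an assumed input here (in practice it comes from the CUDA runtime or the job scheduler).]

-------------------
! Sketch only: illustrates the rank -> device mapping, not PETSc's code.
program gpu_round_robin_sketch
  use mpi
  implicit none
  integer :: rank, ierr, mydevice
  integer, parameter :: ngpus = 1      ! assumed number of visible CUDA devices

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  mydevice = mod(rank, ngpus)          ! device this rank would be assigned
  print '(a,i0,a,i0)', 'rank ', rank, ' -> cuda device ', mydevice

  call MPI_Finalize(ierr)
end program gpu_round_robin_sketch
-------------------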
I compiled petsc in debug mode and this is what I get for the case: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: #0 0x15264731ead0 in ??? #1 0x15264731dc35 in ??? #2 0x15264711551f in ??? #3 0x152647169a7c in ??? #4 0x152647115475 in ??? #5 0x1526470fb7f2 in ??? #6 0x152647678bbd in ??? #7 0x15264768424b in ??? #8 0x1526476842b6 in ??? #9 0x152647684517 in ??? #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc ??????at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 #11 0x55bb46342ebb in _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 #12 0x55bb46342ebb in _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 #13 0x55bb46342ebb in _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ ??????at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 #14 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:115 #15 0x55bb46317bc5 in _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ ??????at /usr/local/cuda/include/thrust/detail/sort.inl:305 #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4452 #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:173 #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:222 #19 0x55bb468e01cf in MatSetPreallocationCOO ??????at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND ??????at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 #21 0x55bb469015e5 in MatProductSymbolic ??????at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 #22 0x55bb4694ade2 in MatPtAP ??????at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 #24 0x55bb4696eb67 in MatCoarsenApply_MISK ??????at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 #25 0x55bb4695bd91 in MatCoarsenApply 
??????at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 #26 0x55bb478294d8 in PCGAMGCoarsen_AGG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 #27 0x55bb471d1cb4 in PCSetUp_GAMG ??????at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 #28 0x55bb464022cf in PCSetUp ??????at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 #29 0x55bb4718b8a7 in KSPSetUp ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 #30 0x55bb4718f22e in KSPSolve_Private ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 #31 0x55bb47192c0c in KSPSolve ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 #32 0x55bb463efd35 in kspsolve_ ??????at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 #33 0x55bb45e94b32 in ??? #34 0x55bb46048044 in ??? #35 0x55bb46052ea1 in ??? #36 0x55bb45ac5f8e in ??? #37 0x1526470fcd8f in ??? #38 0x1526470fce3f in ??? #39 0x55bb45aef55d in ??? #40 0xffffffffffffffff in ??? -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted). -------------------------------------------------------------------------- BTW, I'm curious. If I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of system matrix and rhs in the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best Regards, I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos ________________________________ From: Junchao Zhang > Sent: Friday, August 11, 2023 10:52 AM To: Vanella, Marcos (Fed) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. Backtrace for this error: terminate called after throwing an instance of 'thrust::system::system_error' what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered Program received signal SIGABRT: Process abort signal. I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. 
This is it: #!/bin/bash #SBATCH -J test #SBATCH -e /home/Issues/PETSc/test.err #SBATCH -o /home/Issues/PETSc/test.log #SBATCH --partition=batch #SBATCH --ntasks=2 #SBATCH --nodes=1 #SBATCH --cpus-per-task=1 #SBATCH --ntasks-per-node=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:1 export OMP_NUM_THREADS=1 module load cuda/11.5 module load openmpi/4.1.1 cd /home/Issues/PETSc mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg If anyone has any suggestions on how o troubleshoot this please let me know. Thanks! Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 15 11:54:07 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 15 Aug 2023 11:54:07 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: On Tue, Aug 15, 2023 at 11:35?AM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, you are correct. Turns out I had the cuda library path being > defined in the complicated LFLAGS_PETSC pointing to cuda12 instead of 11.7, > which was mixing the cuda versions, giving this weird error. The idea > behind the long form of the LDFLAGS is to statically link most of the libs > in the future (using also --with-shared-libraries=0 in the PETSc config), > but we need to work on this. Ideally we want to end up with a self > contained bundle for fds. Not clear that is even possible when adding cuda > into the mix. > The petsc build system can handle all these complexities of flags in Windows and Linux with shared or static libraries. You can leverage petsc makefiles, see https://petsc.org/release/manual/getting_started/#sec-writing-application-codes > > Your suggestion for the simple version of LFLAGS_PETSC also works fine and > we'll be using it in our scaling tests. > Thank you for your time, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 11:54 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > > On Tue, Aug 15, 2023 at 9:57?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > I see. I'm trying to get hypre (or any preconditioner) to run on the GPU > and this is what is giving me issues. I can run cases with the CPU only > version of PETSc without problems. > > I tried running the job both in an interactive session and through slurm > with the --with-cuda configured PETSc and passing the cuda vector flag at > runtime like you did: > > $ mpirun -n 2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view -mat_type aijcusparse -vec_type cuda > > and still get the error. So provided we both configured PETSc in the same > way I'm thinking there is something going on with the configuration of my > cluster. > > Even without defining the "-mat_type aijcusparse -vec_type cuda" flags in > the submission line I get the same "parallel_for failed: > cudaErrorInvalidConfiguration: invalid configuration argument" error > instead of what you see ("You need to enable PETSc device support"). > > I noted you use a Kokkos version of PETSc, is this related to your > development? > > No, Kokkos is irrelevant here. 
Were you able to compile your code with > the much simpler LFLAGS_PETSC? > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > > Thank you, > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 9:59 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for your observations and taking the time to look at > this. So if I don't configure PETSc with the --with-cuda flag and still > select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I > thought I needed that flag to get the solvers to run on the V100 card. > > No, to have hypre run on CPU, you need to configure petsc/hypre without > --with-cuda; otherwise, you need --with-cuda and have to always use flags > like -vec_type cuda etc. I admit this is not user-friendly and should be > fixed by petsc and hypre developers. > > > I'll remove the hardwired paths on the link flags, thanks for that! > > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 7:01 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > These are my findings. I successfully ran the test in the end. > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view > Starting FDS ... > ... > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to > enable PETSc device support, for example, in some cases, -vec_type cuda > > Now I get why you met errors with "CPU runs". You configured and built > hypre with petsc. Since you added --with-cuda, petsc would configure hypre > with its GPU support. However, hypre has a limit/shortcoming that if it is > configured with GPU support, you must pass GPU vectors to it. Thus the > error. In other words, if you remove --with-cuda, you should be able to run > above command. > > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type > aijcusparse -vec_type cuda > > Starting FDS ... > > MPI Process 0 started on hong-gce-workstation > MPI Process 1 started on hong-gce-workstation > > Reading FDS input file ... > > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > .. > > Fire Dynamics Simulator > > ... > STOP: FDS completed successfully (CHID: test) > > I guess there were link problems in your makefile. 
Actually, in the first > try, I failed with > > mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 > 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" > -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC > -I"/home/jczhang/petsc/include/" > -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o > fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o > func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o > part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o > init.o dump.o read.o divg.o main.o -Wl,-rpath > -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags > -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi > -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib > -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack > -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ > -L/usr/lib64 -lX11 > /usr/bin/ld: cannot find -lflapack: No such file or directory > /usr/bin/ld: cannot find -lfblas: No such file or directory > collect2: error: ld returned 1 exit status > make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 > > That is because you hardwired many link flags in your fds/Build/makefile. > Then I changed LFLAGS_PETSC to > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > and everything worked. Could you also try it? > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Attached is the test.fds test case. Thanks! > ------------------------------ > *From:* Vanella, Marcos (Fed) > *Sent:* Monday, August 14, 2023 5:45 PM > *To:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Satish Balay > *Cc:* McDermott, Randall J. (Fed) > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > All right Junchao, thank you for looking at this! > > So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc > env variables: > > # PETSc dir and arch, set MYSYS to nisaba dor FDS: > export PETSC_DIR=/dir_to_petsc/petsc > export PETSC_ARCH=arch-linux-c-dbg > export MYSYSTEM=nisaba > > and configured the library with: > > $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" > FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes > --with-shared-libraries=0 --download-suitesparse --download-hypre > --download-fblaslapack --with-cuda > > Then made and checked the PETSc build. > > Then for FDS: > > 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX > branch: > > $ cd ~/fds_dir > $ git clone https://github.com/marcosvanella/fds.git > $ cd fds > $ git checkout FireX > > > 1. 
With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a > debug target for fds (this is with cuda enabled openmpi compiled with gcc, > in my case gcc-11.2 + PETSc): > > $ cd Build/ompi_gnu_linux_db > $./make_fds.sh > > You should see compilation lines like this, with the WITH_PETSC > Preprocessor variable being defined: > > Building ompi_gnu_linux_db > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/devc.f90 > ... > ... > > If you are compiling on a Power9 node you might come across this error > right off the bat: > > ../../Source/prec.f90:34:8: > > 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A > very small number 16 byte accuracy > | 1 > Error: Kind -3 not supported for type REAL at (1) > > which means for some reason gcc in the Power9 does not like quad precision > definition in this manner. A way around it is to add the intrinsic > Fortran2008 module iso_fortran_env: > > use, intrinsic :: iso_fortran_env > > in the fds/Source/prec.f90 file and change the quad precision denominator > to: > > INTEGER, PARAMETER :: QB = REAL128 > > in there. We are investigating the reason why this is happening. This is > not related to Petsc in the code, everything related to PETSc calls is > integers and double precision reals. 
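> For reference, this is a minimal sketch of that workaround (the module name is illustrative, not the actual fds/Source/prec.f90; only the iso_fortran_env/REAL128 lines reflect the change described above):
>
> ! prec_sketch.f90 -- minimal sketch of the Power9/gfortran workaround.
> ! Hypothetical module name; the two relevant lines are the intrinsic
> ! module import and the REAL128 kind parameter mentioned above.
> module prec_sketch
>    use, intrinsic :: iso_fortran_env   ! Fortran 2008 intrinsic module
>    implicit none
>    ! REAL128 resolves to a supported 16-byte real kind on this compiler,
>    ! avoiding the "Kind -3 not supported for type REAL" error.
>    integer, parameter :: QB = REAL128
>    real(QB), parameter :: TWO_EPSILON_QB = 2.0_QB*epsilon(1.0_QB)  !< A very small number, 16 byte accuracy
> end module prec_sketch
>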
> > After the code compiles you get the executable in > ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > > With which you can run the attached 2 mesh case as: > > $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view > > and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE > which is what I was testing in CPU. This is the result I get from the > previous submission in an interactive job in Enki (similar with batch > submissions, gmres ksp, gamg pc): > > > Starting FDS ... > > MPI Process 1 started on enki11.adlp > MPI Process 0 started on enki11.adlp > > Reading FDS input file ... > > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: DEVC Device is not within any mesh. > > Fire Dynamics Simulator > > Current Date : August 14, 2023 17:26:22 > Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX > Revision Date : Mon Aug 14 17:07:20 2023 -0400 > Compiler : Gnu gfortran 11.2.1 > Compilation Date : Aug 14, 2023 17:11:05 > > MPI Enabled; Number of MPI Processes: 2 > OpenMP Enabled; Number of OpenMP Threads: 1 > > MPI version: 3.1 > MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp > Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 > > Job TITLE : > Job ID string : test > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? 
> #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #*19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > 
a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > Seems the issue stems from the call to KSPSOLVE, line 3130 in > fds/Source/pres.f90. > > Well, thank you for taking the time to look at this and also let me know > if these threads should be moved to the issue tracker, or other venue. > Best, > Marcos > > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 4:37 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > I don't see a problem in the matrix assembly. > If you point me to your repo and show me how to build it, I can try to > reproduce. > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type > asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as > (I understand) is done in the ex60. The error is always the same, so it > seems it is not related to ksp,pc. Indeed it seems to happen when trying to > offload the Matrix to the GPU: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > ... > #8 0x20003935fc6b in ??? 
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
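> As a quick sanity check on the slurm side, a minimal standalone program like the sketch below (illustrative only, not part of fds or petsc) can print how many CUDA devices each MPI rank actually sees inside the allocation. It only assumes the CUDA runtime call cudaGetDeviceCount and the standard MPI Fortran API, and would be built with mpifort and linked against -lcudart:
>
> ! gpu_visibility.f90 -- illustrative sketch, not part of fds or petsc.
> ! Each rank reports the number of CUDA devices it can see, which shows
> ! how --gres/--ntasks and the scheduler's binding expose GPUs to ranks.
> program gpu_visibility
>    use mpi
>    use, intrinsic :: iso_c_binding
>    implicit none
>    interface
>       ! cudaError_t cudaGetDeviceCount(int *count) from libcudart
>       integer(c_int) function cudaGetDeviceCount(count) bind(c, name="cudaGetDeviceCount")
>          import :: c_int
>          integer(c_int), intent(out) :: count
>       end function cudaGetDeviceCount
>    end interface
>    integer :: rank, ierr
>    integer(c_int) :: ndev, cerr
>    call MPI_Init(ierr)
>    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>    ndev = 0_c_int
>    cerr = cudaGetDeviceCount(ndev)
>    print '(a,i0,a,i0,a,i0)', 'rank ', rank, ': cudaGetDeviceCount err=', cerr, ', visible devices=', ndev
>    call MPI_Finalize(ierr)
> end program gpu_visibility
>
> If a rank reports zero visible devices inside the batch job, that would suggest the cudaErrorNoDevice warning below comes from the allocation/binding rather than from petsc itself.
>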
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 15 12:45:32 2023 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 15 Aug 2023 13:45:32 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> Message-ID: Hi Stephan, I have a branch that you can try: adams/gamg-add-old-coarsening Things to test: * First, verify that nothing unintended changed by reproducing your bad results with this branch (the defaults are the same) * Try not using the minimum degree ordering that I suggested with: -pc_gamg_use_minimum_degree_ordering false -- I am eager to see if that is the main problem. * Go back to what I think is the old method: -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true When we get back to where you were, I would like to try to get modern stuff working. I did add a -pc_gamg_aggressive_mis_k <2> You could to another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3 Anyway, lots to look at but, alas, AMG does have a lot of parameters. Thanks, Mark On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > > > On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer > wrote: > >> Many thanks for looking into this, Mark >> > My 3D tests were not that different and I see you lowered the threshold. >> > Note, you can set the threshold to zero, but your test is running so >> much >> > differently than mine there is something else going on. >> > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for >> > in 3D. >> > >> > So it is not clear what the problem is. Some questions: >> > >> > * do you have a picture of this mesh to show me? >> >> It's just a standard hexahedral cubed sphere mesh with the refinement >> level giving the number of times each of the six sides have been >> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 >> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = >> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going >> to the next Level >> > > I see, and I assume these are pretty stretched elements. > > >> >> > * what do you mean by Q1-Q2 elements? >> >> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity >> and (tri)linear for pressure >> >> I guess you could argue we could/should just do good old geometric >> multigrid instead. More generally we do use this solver configuration a >> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >> adaptive mesh runs - would it be worth to see if we have the same >> performance issues with tetrahedral P2-P1? >> > > No, you have a clear reproducer, if not minimal. > The first coarsening is very different. > > I am working on this and I see that I added a heuristic for thin bodies > where you order the vertices in greedy algorithms with minimum degree first. > This will tend to pick corners first, edges then faces, etc. > That may be the problem. I would like to understand it better (see below). > > > >> > >> > It would be nice to see if the new and old codes are similar without >> > aggressive coarsening. >> > This was the intended change of the major change in this time frame as >> you >> > noticed. >> > If these jobs are easy to run, could you check that the old and new >> > versions are similar with "-pc_gamg_square_graph 0 ", ( and you only >> need >> > one time step). 
>> > All you need to do is check that the first coarse grid has about the >> same >> > number of equations (large). >> Unfortunately we're seeing some memory errors when we use this option, >> and I'm not entirely clear whether we're just running out of memory and >> need to put it on a special queue. >> >> The run with square_graph 0 using new PETSc managed to get through one >> solve at level 5, and is giving the following mg levels: >> >> rows=174, cols=174, bs=6 >> total: nonzeros=30276, allocated nonzeros=30276 >> -- >> rows=2106, cols=2106, bs=6 >> total: nonzeros=4238532, allocated nonzeros=4238532 >> -- >> rows=21828, cols=21828, bs=6 >> total: nonzeros=62588232, allocated nonzeros=62588232 >> -- >> rows=589824, cols=589824, bs=6 >> total: nonzeros=1082528928, allocated nonzeros=1082528928 >> -- >> rows=2433222, cols=2433222, bs=3 >> total: nonzeros=456526098, allocated nonzeros=456526098 >> >> comparing with square_graph 100 with new PETSc >> >> rows=96, cols=96, bs=6 >> total: nonzeros=9216, allocated nonzeros=9216 >> -- >> rows=1440, cols=1440, bs=6 >> total: nonzeros=647856, allocated nonzeros=647856 >> -- >> rows=97242, cols=97242, bs=6 >> total: nonzeros=65656836, allocated nonzeros=65656836 >> -- >> rows=2433222, cols=2433222, bs=3 >> total: nonzeros=456526098, allocated nonzeros=456526098 >> >> and old PETSc with square_graph 100 >> >> rows=90, cols=90, bs=6 >> total: nonzeros=8100, allocated nonzeros=8100 >> -- >> rows=1872, cols=1872, bs=6 >> total: nonzeros=1234080, allocated nonzeros=1234080 >> -- >> rows=47652, cols=47652, bs=6 >> total: nonzeros=23343264, allocated nonzeros=23343264 >> -- >> rows=2433222, cols=2433222, bs=3 >> total: nonzeros=456526098, allocated nonzeros=456526098 >> -- >> >> Unfortunately old PETSc with square_graph 0 did not complete a single >> solve before giving the memory error >> > > OK, thanks for trying. > > I am working on this and I will give you a branch to test, but if you can > rebuild PETSc here is a quick test that might fix your problem. > In src/ksp/pc/impls/gamg/agg.c you will see: > > PetscCall(PetscSortIntWithArray(nloc, degree, permute)); > > If you can comment this out in the new code and compare with the old, that > might fix the problem. > > Thanks, > Mark > > >> >> > >> > BTW, I am starting to think I should add the old method back as an >> option. >> > I did not think this change would cause large differences. >> >> Yes, I think that would be much appreciated. Let us know if we can do >> any testing >> >> Best wishes >> Stephan >> >> >> > >> > Thanks, >> > Mark >> > >> > >> > >> > >> >> Note that we are providing the rigid body near nullspace, >> >> hence the bs=3 to bs=6. >> >> We have tried different values for the gamg_threshold but it doesn't >> >> really seem to significantly alter the coarsening amount in that first >> >> step. >> >> >> >> Do you have any suggestions for further things we should try/look at? 
>> >> Any feedback would be much appreciated >> >> >> >> Best wishes >> >> Stephan Kramer >> >> >> >> Full logs including log_view timings available from >> >> https://github.com/stephankramer/petsc-scaling/ >> >> >> >> In particular: >> >> >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >> >> >> >> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >> >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 15 13:25:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 15 Aug 2023 13:25:28 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: On Tue, Aug 15, 2023 at 1:14?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Thanks Junchao! I'll take a look at this. > I'm going to do some tests on Enki and then will move to summit to do > larger scaling tests. If you have any suggestion on solver-preconditioner > combination for this poisson equation please let me know. > Math questions go to Jed and Matt :) > Goes without saying, next time I'm in Chicago I'll get in touch and beer > is on me! > Thanks > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 12:54 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > On Tue, Aug 15, 2023 at 11:35?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, you are correct. Turns out I had the cuda library path being > defined in the complicated LFLAGS_PETSC pointing to cuda12 instead of 11.7, > which was mixing the cuda versions, giving this weird error. The idea > behind the long form of the LDFLAGS is to statically link most of the libs > in the future (using also --with-shared-libraries=0 in the PETSc config), > but we need to work on this. Ideally we want to end up with a self > contained bundle for fds. Not clear that is even possible when adding cuda > into the mix. > > The petsc build system can handle all these complexities of flags in > Windows and Linux with shared or static libraries. You can leverage petsc > makefiles, see > https://petsc.org/release/manual/getting_started/#sec-writing-application-codes > > > > > Your suggestion for the simple version of LFLAGS_PETSC also works fine and > we'll be using it in our scaling tests. > Thank you for your time, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 11:54 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. 
(Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > > On Tue, Aug 15, 2023 at 9:57?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > I see. I'm trying to get hypre (or any preconditioner) to run on the GPU > and this is what is giving me issues. I can run cases with the CPU only > version of PETSc without problems. > > I tried running the job both in an interactive session and through slurm > with the --with-cuda configured PETSc and passing the cuda vector flag at > runtime like you did: > > $ mpirun -n 2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view -mat_type aijcusparse -vec_type cuda > > and still get the error. So provided we both configured PETSc in the same > way I'm thinking there is something going on with the configuration of my > cluster. > > Even without defining the "-mat_type aijcusparse -vec_type cuda" flags in > the submission line I get the same "parallel_for failed: > cudaErrorInvalidConfiguration: invalid configuration argument" error > instead of what you see ("You need to enable PETSc device support"). > > I noted you use a Kokkos version of PETSc, is this related to your > development? > > No, Kokkos is irrelevant here. Were you able to compile your code with > the much simpler LFLAGS_PETSC? > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > > Thank you, > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Tuesday, August 15, 2023 9:59 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > > > > On Tue, Aug 15, 2023 at 8:55?AM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for your observations and taking the time to look at > this. So if I don't configure PETSc with the --with-cuda flag and still > select HYPRE as the preconditioner, I still get Hypre to run on the GPU? I > thought I needed that flag to get the solvers to run on the V100 card. > > No, to have hypre run on CPU, you need to configure petsc/hypre without > --with-cuda; otherwise, you need --with-cuda and have to always use flags > like -vec_type cuda etc. I admit this is not user-friendly and should be > fixed by petsc and hypre developers. > > > I'll remove the hardwired paths on the link flags, thanks for that! > > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 7:01 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov ; Satish Balay < > balay at mcs.anl.gov>; McDermott, Randall J. (Fed) < > randall.mcdermott at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > These are my findings. I successfully ran the test in the end. > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view > Starting FDS ... > ... > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: HYPRE_MEMORY_DEVICE expects a device vector. You need to > enable PETSc device support, for example, in some cases, -vec_type cuda > > Now I get why you met errors with "CPU runs". 
You configured and built > hypre with petsc. Since you added --with-cuda, petsc would configure hypre > with its GPU support. However, hypre has a limit/shortcoming that if it is > configured with GPU support, you must pass GPU vectors to it. Thus the > error. In other words, if you remove --with-cuda, you should be able to run > above command. > > > $ mpirun -n 2 ./fds_ompi_gnu_linux_db test.fds -log_view -mat_type > aijcusparse -vec_type cuda > > Starting FDS ... > > MPI Process 0 started on hong-gce-workstation > MPI Process 1 started on hong-gce-workstation > > Reading FDS input file ... > > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > .. > > Fire Dynamics Simulator > > ... > STOP: FDS completed successfully (CHID: test) > > I guess there were link problems in your makefile. Actually, in the first > try, I failed with > > mpifort -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS6.7.0-11263-g04d5df7-FireX\" -DGITDATE_PP=\""Mon Aug 14 > 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:32:12\"" > -DCOMPVER_PP=\""Gnu gfortran 11.4.0-1ubuntu1~22.04)"\" -DWITH_PETSC > -I"/home/jczhang/petsc/include/" > -I"/home/jczhang/petsc/arch-kokkos-dbg/include" -fopenmp -o > fds_ompi_gnu_linux_db prec.o cons.o prop.o devc.o type.o data.o mesh.o > func.o gsmv.o smvv.o rcal.o turb.o soot.o pois.o geom.o ccib.o radi.o > part.o vege.o ctrl.o hvac.o mass.o imkl.o wall.o fire.o velo.o pres.o > init.o dump.o read.o divg.o main.o -Wl,-rpath > -Wl,/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -Wl,--enable-new-dtags > -L/apps/ubuntu-20.04.2/openmpi/4.1.1/gcc-9.3.0/lib -lmpi > -Wl,-rpath,/home/jczhang/petsc/arch-kokkos-dbg/lib > -L/home/jczhang/petsc/arch-kokkos-dbg/lib -lpetsc -ldl -lspqr -lumfpack > -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig > -lHYPRE -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 > -L/usr/local/cuda/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas > -lcusparse -lcusolver -lcurand -lcuda -lflapack -lfblas -lstdc++ > -L/usr/lib64 -lX11 > /usr/bin/ld: cannot find -lflapack: No such file or directory > /usr/bin/ld: cannot find -lfblas: No such file or directory > collect2: error: ld returned 1 exit status > make: *** [../makefile:357: ompi_gnu_linux_db] Error 1 > > That is because you hardwired many link flags in your fds/Build/makefile. > Then I changed LFLAGS_PETSC to > LFLAGS_PETSC = -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib > -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc > > and everything worked. Could you also try it? > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 4:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Attached is the test.fds test case. Thanks! > ------------------------------ > *From:* Vanella, Marcos (Fed) > *Sent:* Monday, August 14, 2023 5:45 PM > *To:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov>; Satish Balay > *Cc:* McDermott, Randall J. 
(Fed) > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > All right Junchao, thank you for looking at this! > > So, I checked out the /dir_to_petsc/petsc/main branch, setup the petsc > env variables: > > # PETSc dir and arch, set MYSYS to nisaba dor FDS: > export PETSC_DIR=/dir_to_petsc/petsc > export PETSC_ARCH=arch-linux-c-dbg > export MYSYSTEM=nisaba > > and configured the library with: > > $ ./Configure COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" FOPTFLAGS="-g -O2" > FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" --with-debugging=yes > --with-shared-libraries=0 --download-suitesparse --download-hypre > --download-fblaslapack --with-cuda > > Then made and checked the PETSc build. > > Then for FDS: > > 1. Clone my fds repo in a ~/fds_dir you make, and checkout the FireX > branch: > > $ cd ~/fds_dir > $ git clone https://github.com/marcosvanella/fds.git > $ cd fds > $ git checkout FireX > > > 1. With PETSC_DIR, PETSC_ARCH and MYSYSTEM=nisaba defined, compile a > debug target for fds (this is with cuda enabled openmpi compiled with gcc, > in my case gcc-11.2 + PETSc): > > $ cd Build/ompi_gnu_linux_db > $./make_fds.sh > > You should see compilation lines like this, with the WITH_PETSC > Preprocessor variable being defined: > > Building ompi_gnu_linux_db > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prec.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/cons.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" ../../Source/prop.f90 > mpifort -c -m64 -O0 -std=f2018 -ggdb -Wall -Wunused-parameter > -Wcharacter-truncation -Wno-target-lifetime -fcheck=all -fbacktrace > -ffpe-trap=invalid,zero,overflow -frecursive -ffpe-summary=none > -fall-intrinsics -fbounds-check -cpp > -DGITHASH_PP=\"FDS-6.8.0-556-g04d5df7-dirty-FireX\" -DGITDATE_PP=\""Mon Aug > 14 17:07:20 2023 -0400\"" -DBUILDDATE_PP=\""Aug 14, 2023 17:34:36\"" > -DCOMPVER_PP=\""Gnu gfortran 11.2.1"\" *-DWITH_PETSC* > -I"/home/mnv/Software/petsc/include/" > -I"/home/mnv/Software/petsc/arch-linux-c-dbg/include" 
../../Source/devc.f90 > ... > ... > > If you are compiling on a Power9 node you might come across this error > right off the bat: > > ../../Source/prec.f90:34:8: > > 34 | REAL(QB), PARAMETER :: TWO_EPSILON_QB=2._QB*EPSILON(1._QB) !< A > very small number 16 byte accuracy > | 1 > Error: Kind -3 not supported for type REAL at (1) > > which means for some reason gcc in the Power9 does not like quad precision > definition in this manner. A way around it is to add the intrinsic > Fortran2008 module iso_fortran_env: > > use, intrinsic :: iso_fortran_env > > in the fds/Source/prec.f90 file and change the quad precision denominator > to: > > INTEGER, PARAMETER :: QB = REAL128 > > in there. We are investigating the reason why this is happening. This is > not related to Petsc in the code, everything related to PETSc calls is > integers and double precision reals. > > After the code compiles you get the executable in > ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > > With which you can run the attached 2 mesh case as: > > $ mpirun -n 2 ~/fds_dir/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -log_view > > and change PETSc ksp, pc runtime flags, etc. The default is PCG + HYPRE > which is what I was testing in CPU. This is the result I get from the > previous submission in an interactive job in Enki (similar with batch > submissions, gmres ksp, gamg pc): > > > Starting FDS ... > > MPI Process 1 started on enki11.adlp > MPI Process 0 started on enki11.adlp > > Reading FDS input file ... > > WARNING: SPEC REAC_FUEL is not in the table of pre-defined species. Any > unassigned SPEC variables in the input were assigned the properties of > nitrogen. > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3014 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > At line 3461 of file ../../Source/read.f90 > Fortran runtime warning: An array temporary was created > WARNING: DEVC Device is not within any mesh. > > Fire Dynamics Simulator > > Current Date : August 14, 2023 17:26:22 > Revision : FDS6.7.0-11263-g04d5df7-dirty-FireX > Revision Date : Mon Aug 14 17:07:20 2023 -0400 > Compiler : Gnu gfortran 11.2.1 > Compilation Date : Aug 14, 2023 17:11:05 > > MPI Enabled; Number of MPI Processes: 2 > OpenMP Enabled; Number of OpenMP Threads: 1 > > MPI version: 3.1 > MPI library version: Open MPI v4.1.4, package: Open MPI xng4 at enki01.adlp > Distribution, ident: 4.1.4, repo rev: v4.1.4, May 26, 2022 > > Job TITLE : > Job ID string : test > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? 
> #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #9 0x11ec67db in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec67db in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efc7e3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #12 0x11efc7e3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #13 0x11efc7e3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efc7e3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at 
/usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efc7e3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > *#19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #18 0x11eda3c7 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #*19 0x11eda3c7 in MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > *#20 0x11edc6b7 in MatSetPreallocationCOO_SeqAIJCUSPARSE* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4300 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #21 0x11e91bc7 in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:650 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #22 0x1316d5ab in MatConvert_AIJ_HYPRE > at /home/mnv/Software/petsc/src/mat/impls/hypre/mhypre.c:648 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #23 0x11e3b463 in MatConvert > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:4428 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #24 0x14072213 in PCSetUp_HYPRE > at /home/mnv/Software/petsc/src/ksp/pc/impls/hypre/hypre.c:254 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #25 0x1276a9db in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > #26 0x127d923b in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #27 0x127e033f in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #28 0x127e6f07 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #29 0x1280d70b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #30 0x1140858f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3130 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #31 0x119faddf in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #32 0x1196c15f in fds > at ../../Source/main.f90:688 > #33 0x11a126f3 in main > at ../../Source/main.f90:6 > #33 0x11a126f3 in main > 
at ../../Source/main.f90:6 > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 1 with PID 3028180 on node enki11 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > Seems the issue stems from the call to KSPSOLVE, line 3130 in > fds/Source/pres.f90. > > Well, thank you for taking the time to look at this and also let me know > if these threads should be moved to the issue tracker, or other venue. > Best, > Marcos > > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 4:37 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > I don't see a problem in the matrix assembly. > If you point me to your repo and show me how to build it, I can try to > reproduce. > > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 2:53?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I've tried for my case using the -ksp_type gmres and -pc_type > asm with -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse as > (I understand) is done in the ex60. The error is always the same, so it > seems it is not related to ksp,pc. Indeed it seems to happen when trying to > offload the Matrix to the GPU: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > ... > #8 0x20003935fc6b in ??? 
> #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > #9 0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec769b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efd6a3 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efd6a3 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > #13 0x11efd6a3 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efd6a3 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efd6a3 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11edb287 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at 
/usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU* > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ* > ... > ... > > This is the piece of fortran code I have doing this within my Poisson > solver: > > ! Create Parallel PETSc Sparse matrix for this ZSL: Set diag/off diag > blocks nonzeros per row to 5. > CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL% > NUNKH_TOTAL,ZSL%NUNKH_TOTAL,& > 7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS% > A_H,PETSC_IERR) > CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR) > DO IROW=1,ZSL%NUNKH_LOCAL > DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW) > ! PETSC expects zero based indexes.1,Global I position (zero > base),1,Global J position (zero base) > CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1 > ,ZSL%JD_MAT_H(JCOL,IROW)-1,& > ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR) > ENDDO > ENDDO > CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR) > > Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), > and add nonzero values one by one. I wonder if there is something related > to this that the copying to GPU does not like. > Thanks, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 14, 2023 3:24 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Yeah, it looks like ex60 was run correctly. > Double check your code again and if you still run into errors, we can try > to reproduce on our end. > > Thanks. > --Junchao Zhang > > > On Mon, Aug 14, 2023 at 1:05?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, I compiled and run ex60 through slurm in our Enki system. The > batch script for slurm submission, ex60.log and gpu stats files are > attached. > Nothing stands out as wrong to me but please have a look. > I'll revisit running the original 2 MPI process + 1 GPU Poisson problem. > Thanks! > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 5:52 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Satish Balay < > balay at mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Before digging into the details, could you try to run > src/ksp/ksp/tests/ex60.c to make sure the environment is ok. > > The comment at the end shows how to run it > test: > requires: cuda > suffix: 1_cuda > nsize: 4 > args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type > cusparse > > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 4:36?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for the info. I compiled the main branch of PETSc in > another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain > and don't see the fortran compilation error. It might have been related to > gcc-9.3. 
> I tried the case again, 2 CPUs and one GPU and get this error now: > > terminate called after throwing an instance of > 'thrust::system::system_error' > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > what(): parallel_for failed: cudaErrorInvalidConfiguration: invalid > configuration argument > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #0 0x2000397fcd8f in ??? > #1 0x2000397fb657 in ??? > #2 0x2000000604d7 in ??? > #2 0x2000000604d7 in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #3 0x200039cb9628 in ??? > #4 0x200039c93eb3 in ??? > #5 0x200039364a97 in ??? > #6 0x20003935f6d3 in ??? > #7 0x20003935f78f in ??? > #8 0x20003935fc6b in ??? > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > #9 0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225 > #10 0x11ec425b in > _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at > /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88 > #11 0x11efa263 in > _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_ > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55 > #12 0x11efa263 in > _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93 > #13 0x11efa263 in > _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_ > at > /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104 > #14 0x11efa263 in > 
_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #14 0x11efa263 in > _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm > at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254 > #15 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #16 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213 > #17 0x11efa263 in > _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em > at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65 > #18 0x11ed7e47 in > _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em > at /usr/local/cuda-11.7/include/thrust/device_vector.h:88 > #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:2488 > #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4696 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:251 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #22 0x133f141f in MatMPIAIJGetLocalMatMerge > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #24 0x1377e1df in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #25 0x11e4dd1f in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934 > #26 0x130d792f in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #26 0x130d792f in MatCoarsenApply_MISK_private > at 
/home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #27 0x130db89b in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #28 0x130bf5a3 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #29 0x141518ff in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #30 0x13b3a43f in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #31 0x1276845b in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #32 0x127d6cbb in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #33 0x127dddbf in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #34 0x127e4987 in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #35 0x1280b18b in kspsolve_ > at > /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #36 0x1140945f in __globmat_solver_MOD_glmat_solver > at ../../Source/pres.f90:3128 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #37 0x119f8853 in pressure_iteration_scheme > at ../../Source/main.f90:1449 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #38 0x11969bd3 in fds > at ../../Source/main.f90:688 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > #39 0x11a10167 in main > at ../../Source/main.f90:6 > srun: error: enki12: tasks 0-1: Aborted (core dumped) > > > This was the slurm submission script in this case: > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=debug > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > > # PETSc dir and arch: > export PETSC_DIR=/home/mnv/Software/petsc > export PETSC_ARCH=arch-linux-c-dbg > > # SYSTEM name: > export MYSYSTEM=enki > > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg > > The configure.log for the PETSc build is attached. 
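One variable worth eliminating while debugging this, offered as a sketch and not as a diagnosis of the thrust error: with 2 MPI ranks and a single visible device, both ranks share that GPU (PETSc assigns device rank % g). If the node actually has a second GPU, requesting one device per rank removes any sharing/binding effects from the picture:

  #SBATCH --ntasks=2
  #SBATCH --ntasks-per-node=2
  #SBATCH --gres=gpu:2      # assumes the node really has 2 GPUs

  srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 ./fds_ompi_gnu_linux_db \
       test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg
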
Another clue to what > is happening is that even setting the matrices/vectors to be mpi (-vec_type > mpi -mat_type mpiaij) and not requesting a gpu I get a GPU warning : > > 0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: GPU error > [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: GPU error > [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 > (cudaErrorNoDevice) : no CUDA-capable device is detected > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the > program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) ; PETSc users list < > petsc-users at mcs.anl.gov>; Satish Balay > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... 
> CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ?pcasmcreatesubdomains2d? at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... 
> CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). > On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? 
> #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > 
at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the > following error: > > terminate called after throwing an instance of > 'thrust::system::system_error' > *what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: > an illegal memory access was encountered* > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > I'm new to submitting jobs in slurm that also use GPU resources, so I > might be doing something wrong in my submission script. This is it: > > #!/bin/bash > #SBATCH -J test > #SBATCH -e /home/Issues/PETSc/test.err > #SBATCH -o /home/Issues/PETSc/test.log > #SBATCH --partition=batch > #SBATCH --ntasks=2 > #SBATCH --nodes=1 > #SBATCH --cpus-per-task=1 > #SBATCH --ntasks-per-node=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:1 > > export OMP_NUM_THREADS=1 > module load cuda/11.5 > module load openmpi/4.1.1 > > cd /home/Issues/PETSc > *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type > mpicuda -mat_type mpiaijcusparse -pc_type gamg* > > If anyone has any suggestions on how o troubleshoot this please let me > know. > Thanks! 
> Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Tue Aug 15 15:37:05 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Tue, 15 Aug 2023 23:37:05 +0300 Subject: [petsc-users] issue with multiple versions of mpi Message-ID: I was earlier using petsc with real-built, i tried configuring it for a complex environment using the same procedure (at least I think so) but with the addition of '--with-scalar-type=complex'. There appears to be no problem during the configuration process and its checks ('*configure.log*' is attached). But while compiling code, I ran into an error message concerning conflicts due to multiple versions of mpi. How do I resolve this? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- /bin/ld: warning: libmpi.so.12, needed by /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so, may conflict with libmpi.so.40 /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_comm_create_multiplex' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_indexed_block_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Allreduce_enqueue_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Allreduce_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_recv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gather_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_set_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Intercomm_create_from_groups' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_get_num_psets' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stop_progress_thread' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gather_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Issend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Igather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgatherv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Pack_external_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Rput_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Pready' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_progress' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Start_progress_thread' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Isendrecv_replace_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_recv' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallw_c' 
/home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Unpack_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_send_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stream_free' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_GPU_query_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iallgather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_get_info' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Barrier_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_get_envelope_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ireduce_scatter_block_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ibsend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgatherv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Bsend_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Delete_error_string' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_isend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Exscan_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Buffer_detach_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Rsend_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Delete_error_code' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgatherv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Win_get_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_get_nth_pset' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Accumulate_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_hindexed_block_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Query_ze_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_create_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_get_pset_info' 
/home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Delete_error_string' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Igatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Win_allocate_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_indexed_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allreduce_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Irecv_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Query_ze_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoall_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Wait_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Rget_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Imrecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Isend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_get_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Type_set_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallw_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Get_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatter_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Precv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallw_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scan_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Parrived' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_irecv' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Bcast_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Register_datarep_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ssend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatter_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_hindexed_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Unpack_external_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_contiguous_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Pready' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Wait_enqueue' 
/home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgatherv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Delete_error_code' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stream_comm_create_multiplex' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Group_from_session_pset' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Comm_get_failed' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Send_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Exscan_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Comm_set_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Query_cuda_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Irecv_enqueue_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iscan_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoall_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Rsend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Info_create_env' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ialltoallw_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ineighbor_alltoall_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Bcast_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatter_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stream_create' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_irecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Recv_enqueue_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_set_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Grequest_set_lang_f77' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Isendrecv_replace' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Get_count_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatterv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Bsend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatterv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: 
undefined reference to `PMPI_Type_create_hvector_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Waitall_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stream_comm_create' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Bcast_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Comm_idup_with_info' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Parrived' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_struct_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Recv_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Raccumulate_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Intercomm_create_from_groups' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ineighbor_alltoallw_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scan_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_get_num_psets' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoall_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Rget_accumulate_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Win_create_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_call_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_GPU_query_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Win_set_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Comm_get_failed' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Info_get_string' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allreduce_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_block_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Recv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Pready_range' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Buffer_attach_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Info_get_string' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Waitall_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Pack_external_size_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to 
`MPI_Session_call_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgather_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_subarray_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoall_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ssend_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ireduce_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Get_accumulate_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_send' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Keyval_set_proxy' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_block_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Irsend_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Type_get_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Start_progress_thread' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallw_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Put_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allreduce_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Get_elements_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Pack_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scatterv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_get_contents_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Query_hip_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_block_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_create_darray_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Status_set_elements_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stop_progress_thread' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Op_create_c' 
/home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgather_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Info_set_hex' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoall_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ineighbor_allgatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_get_nth_pset' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_vector_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallw_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Send_enqueue_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ineighbor_alltoallv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_create' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_get_info' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iexscan_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Exscan_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Isend_enqueue_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Stream_progress' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPII_Comm_get_attr' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Alltoallv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iscatterv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Delete_error_class' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Sendrecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Isend_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gatherv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_local_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Isendrecv' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPIX_Comm_get_stream' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Recv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Psend_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_get_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Sendrecv_replace_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallw_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iscatter_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: 
undefined reference to `MPI_Isendrecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Send_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Pready_list' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Comm_create_from_group' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iallgatherv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Gatherv_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoallv_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Type_size_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_free' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_allgather_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Pack_size_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Scan_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Send_enqueue' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_finalize' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Group_from_session_pset' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Pready_range' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Allgather_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ibcast_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Neighbor_alltoall_init_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Mrecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_get_pset_info' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Info_create_env' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ireduce_scatter_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Iallreduce_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Session_finalize' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Query_hip_support' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ineighbor_allgather_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Session_create_errhandler' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Comm_get_stream' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_comm_create' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Delete_error_class' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Comm_create_from_group' 
/home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Win_allocate_shared_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Comm_idup_with_info' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Barrier_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ialltoall_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Win_shared_query_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Irecv_c' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Reduce_scatter_init' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `PMPI_Pready_list' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPIX_Stream_isend' /home/maitri.ksh/Maitri/petsc/linux-gnu-c-debug/lib/libmpifort.so: undefined reference to `MPI_Ialltoallv_c' collect2: error: ld returned 1 exit status -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 10043930 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Aug 15 15:51:25 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 15 Aug 2023 15:51:25 -0500 (CDT) Subject: [petsc-users] issue with multiple versions of mpi In-Reply-To: References: Message-ID: <20ada345-a65c-33ba-a806-6b6a3a67c4a6@mcs.anl.gov> Do you get this error when you compile a PETSc example [with the corresponding PETSc makefile]? If not - you'll have to check the difference in compiler options between this example compile - and your application. Satish On Tue, 15 Aug 2023, maitri ksh wrote: > I was earlier using petsc with real-built, i tried configuring it for a > complex environment using the same procedure (at least I think so) but with > the addition of '--with-scalar-type=complex'. There appears to be no > problem during the configuration process and its checks ('*configure.log*' > is attached). But while compiling code, I ran into an error message > concerning conflicts due to multiple versions of mpi. How do I resolve > this? > From erik.kneller at yahoo.com Thu Aug 17 07:37:29 2023 From: erik.kneller at yahoo.com (Erik Kneller) Date: Thu, 17 Aug 2023 12:37:29 +0000 (UTC) Subject: [petsc-users] Filling non-zero values of a Petsc matrix using numpy arrays with non-zero indices and values (petsc4py) References: <729971208.496779.1692275849292.ref@mail.yahoo.com> Message-ID: <729971208.496779.1692275849292@mail.yahoo.com> Hi All, I need to fill non-zero values of a Petsc matrix via petsc4py for the domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array containing row indices of non-zero value, (2) array containing column indices of non-zero values and (3) array containing the non-zero matrix values. How can one perform this type of filling operation in petsc4py? The method A.setValues does not appear to allow this since it only works on an individual matrix element or a block of matrix elements. I am using Numpy arrays since they can be computed in loops optimized using Numba on each processor. I also cannot pass the Petsc matrix to a Numba compiled function since type information cannot be inferred. 
I absolutely need to avoid looping in standard Python to define Petsc matrix
elements due to performance issues. I also need to use a standard petsc4py
method and avoid writing new C or Fortran wrappers to minimize language
complexity.

Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example:
----------------------------------------------------------------------------------------------
comm = PETSc.COMM_WORLD
rank = comm.getRank()

dx = 1.0/(xnodes + 1)  # xnodes is the number of nodes in the x and y-directions of the grid
nnz_max = 5  # max number of non-zero values per row

A = PETSc.Mat()
A.create(comm=PETSc.COMM_WORLD)
A.setSizes((xnodes*ynodes, xnodes*ynodes))
A.setType(PETSc.Mat.Type.AIJ)
A.setPreallocationNNZ(nnz_max)

rstart, rend = A.getOwnershipRange()

# Here Anz, Arow and Acol are vectors with size equal to the number of non-zero values
Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, nnz_max, dx, xnodes, ynodes)

A.setValues(Arow, Acol, Anz)  # <--- This does not work.

A.assemblyBegin()
A.assemblyEnd()
ksp = PETSc.KSP()
ksp.create(comm=A.getComm())
ksp.setType(PETSc.KSP.Type.CG)
ksp.getPC().setType(PETSc.PC.Type.GAMG)
ksp.setOperators(A)
ksp.setFromOptions()
x, b = A.createVecs()
b.set(1.0)

ksp.solve(b, x)

Regards,
Erik Kneller, Ph.D.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From knepley at gmail.com  Thu Aug 17 16:57:23 2023
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 18 Aug 2023 07:57:23 +1000
Subject: [petsc-users] Filling non-zero values of a Petsc matrix using
 numpy arrays with non-zero indices and values (petsc4py)
In-Reply-To: <729971208.496779.1692275849292@mail.yahoo.com>
References: <729971208.496779.1692275849292.ref@mail.yahoo.com>
 <729971208.496779.1692275849292@mail.yahoo.com>
Message-ID: 

On Fri, Aug 18, 2023 at 12:49 AM Erik Kneller via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi All,
>
> I need to fill non-zero values of a Petsc matrix via petsc4py for the
> domain defined by A.getOwnershipRange() using three Numpy arrays: (1)
> array containing row indices of non-zero value, (2) array containing column
> indices of non-zero values and (3) array containing the non-zero matrix
> values. How can one perform this type of filling operation in petsc4py?
> The method A.setValues does not appear to allow this since it only works on
> an individual matrix element or a block of matrix elements.
>
> I am using Numpy arrays since they can be computed in loops optimized
> using Numba on each processor. I also cannot pass the Petsc matrix to a
> Numba compiled function since type information cannot be inferred. I
> absolutely need to avoid looping in standard Python to define Petsc matrix
> elements due to performance issues. I also need to use a standard petscy4py
> method and avoid writing new C or Fortran wrappers to minimize language
> complexity.
> > Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example: > > ---------------------------------------------------------------------------------------------- > comm = PETSc.COMM_WORLD > rank = comm.getRank() > > dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and > y-directions of the grid > nnz_max = 5 # max number of non-zero values per row > > A = PETSc.Mat() > A.create(comm=PETSc.COMM_WORLD) > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > A.setType(PETSc.Mat.Type.AIJ) > A.setPreallocationNNZ(nnz_max) > > rstart, rend = A.getOwnershipRange() > > # Here Anz, Arow and Acol are vectors with size equal to the number of > non-zero values > Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, > nnz_max, dx, xnodes, ynodes) > > A.setValues(Arow, Acol, Anz) # <--- This does not work. > I see at least two options https://petsc.org/main/manualpages/Mat/MatCreateSeqAIJWithArrays/ or https://petsc.org/main/manualpages/Mat/MatSetPreallocationCOOLocal/ https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/ Thanks, Matt > A.assemblyBegin() > A.assemblyEnd() > ksp = PETSc.KSP() > ksp.create(comm=A.getComm()) > ksp.setType(PETSc.KSP.Type.CG) > ksp.getPC().setType(PETSc.PC.Type.GAMG) > ksp.setOperators(A) > ksp.setFromOptions() > x, b = A.createVecs() > b.set(1.0) > > ksp.solve(b, x) > > Regards, > Erik Kneller, Ph.D. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 17 17:34:49 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Aug 2023 18:34:49 -0400 Subject: [petsc-users] Filling non-zero values of a Petsc matrix using numpy arrays with non-zero indices and values (petsc4py) In-Reply-To: References: <729971208.496779.1692275849292.ref@mail.yahoo.com> <729971208.496779.1692275849292@mail.yahoo.com> Message-ID: <57B51BF3-AB04-4EE3-80EF-4FB685CBA06D@petsc.dev> It appears there are currently no Python bindings for > https://petsc.org/main/manualpages/Mat/MatSetPreallocationCOOLocal/ > https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/ they should be fairly easy to add in a merge request. Barry > On Aug 17, 2023, at 5:57 PM, Matthew Knepley wrote: > > On Fri, Aug 18, 2023 at 12:49?AM Erik Kneller via petsc-users > wrote: >> Hi All, >> >> I need to fill non-zero values of a Petsc matrix via petsc4py for the domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array containing row indices of non-zero value, (2) array containing column indices of non-zero values and (3) array containing the non-zero matrix values. How can one perform this type of filling operation in petsc4py? The method A.setValues does not appear to allow this since it only works on an individual matrix element or a block of matrix elements. >> >> I am using Numpy arrays since they can be computed in loops optimized using Numba on each processor. I also cannot pass the Petsc matrix to a Numba compiled function since type information cannot be inferred. I absolutely need to avoid looping in standard Python to define Petsc matrix elements due to performance issues. I also need to use a standard petscy4py method and avoid writing new C or Fortran wrappers to minimize language complexity. 
>> >> Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example: >> ---------------------------------------------------------------------------------------------- >> comm = PETSc.COMM_WORLD >> rank = comm.getRank() >> >> dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and y-directions of the grid >> nnz_max = 5 # max number of non-zero values per row >> >> A = PETSc.Mat() >> A.create(comm=PETSc.COMM_WORLD) >> A.setSizes((xnodes*ynodes, xnodes*ynodes)) >> A.setType(PETSc.Mat.Type.AIJ) >> A.setPreallocationNNZ(nnz_max) >> >> rstart, rend = A.getOwnershipRange() >> >> # Here Anz, Arow and Acol are vectors with size equal to the number of non-zero values >> Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, nnz_max, dx, xnodes, ynodes) >> >> A.setValues(Arow, Acol, Anz) # <--- This does not work. > > I see at least two options > > https://petsc.org/main/manualpages/Mat/MatCreateSeqAIJWithArrays/ > > or > > https://petsc.org/main/manualpages/Mat/MatSetPreallocationCOOLocal/ > https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/ > > Thanks, > > Matt > >> A.assemblyBegin() >> A.assemblyEnd() >> ksp = PETSc.KSP() >> ksp.create(comm=A.getComm()) >> ksp.setType(PETSc.KSP.Type.CG ) >> ksp.getPC().setType(PETSc.PC.Type.GAMG) >> ksp.setOperators(A) >> ksp.setFromOptions() >> x, b = A.createVecs() >> b.set(1.0) >> >> ksp.solve(b, x) >> >> Regards, >> Erik Kneller, Ph.D. >> > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Thu Aug 17 18:44:30 2023 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 17 Aug 2023 16:44:30 -0700 Subject: [petsc-users] PetscCall( ) in fortran Message-ID: <543086f9-79ea-da6a-1457-709fae3a7d66@berkeley.edu> Two questions about the PetscCall( ) etc. functionality in fortran: (1) To use this functionality, is it required to use a .F90 naming convention? or should I be able to use .F? (2) Is it permitted to use line continuation within these calls? For example something like ????? PetscCallMPIA(MPI_Allreduce(localval,globalsum,1,& ??????? MPIU_REAL,MPIU_SUM, PETSC_COMM_WORLD,ierr)) or is it required to just have an extra long line? or is there an alternate syntax for continuation in this case? -sanjay From bsmith at petsc.dev Thu Aug 17 18:50:46 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 17 Aug 2023 19:50:46 -0400 Subject: [petsc-users] PetscCall( ) in fortran In-Reply-To: <543086f9-79ea-da6a-1457-709fae3a7d66@berkeley.edu> References: <543086f9-79ea-da6a-1457-709fae3a7d66@berkeley.edu> Message-ID: <88879478-10A8-4FC0-A047-271B6E213557@petsc.dev> > On Aug 17, 2023, at 7:44 PM, Sanjay Govindjee wrote: > > Two questions about the PetscCall( ) etc. functionality in fortran: > > (1) To use this functionality, is it required to use a .F90 naming convention? or should I be able to use .F? This likely depends on the compiler. > > (2) Is it permitted to use line continuation within these calls? For example something like > > PetscCallMPIA(MPI_Allreduce(localval,globalsum,1,& > MPIU_REAL,MPIU_SUM, PETSC_COMM_WORLD,ierr)) > > or is it required to just have an extra long line? or is there an alternate syntax for continuation in this case? 
Because PetscCallXXX() is a macro, it "breaks" if a continuation is used, so yes, you will sometimes need long lines. Most Fortran compilers have an option to allow infinitely long lines. Note that you can still use CHKERRQ() either all the time or in situations where you "need" a continuation character. Barry > > -sanjay > From s_g at berkeley.edu Thu Aug 17 18:57:10 2023 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 17 Aug 2023 16:57:10 -0700 Subject: [petsc-users] PetscCall( ) in fortran In-Reply-To: <88879478-10A8-4FC0-A047-271B6E213557@petsc.dev> References: <543086f9-79ea-da6a-1457-709fae3a7d66@berkeley.edu> <88879478-10A8-4FC0-A047-271B6E213557@petsc.dev> Message-ID: Thanks. For what it is worth in regards to question (1), GNU Fortran (Homebrew GCC 11.3.0_2) 11.3.0 seems to need .F90 (as opposed to just .F). -sanjay On 8/17/23 4:50 PM, Barry Smith wrote: > >> On Aug 17, 2023, at 7:44 PM, Sanjay Govindjee wrote: >> >> Two questions about the PetscCall( ) etc. functionality in fortran: >> >> (1) To use this functionality, is it required to use a .F90 naming convention? or should I be able to use .F? > This likely depends on the compiler. >> (2) Is it permitted to use line continuation within these calls? For example something like >> >> PetscCallMPIA(MPI_Allreduce(localval,globalsum,1,& >> MPIU_REAL,MPIU_SUM, PETSC_COMM_WORLD,ierr)) >> >> or is it required to just have an extra long line? or is there an alternate syntax for continuation in this case? > Because PetscCallXXX() is a macro, it "breaks" if a continuation is used, so yes, you will sometimes need long lines. Most Fortran compilers have an option to allow infinitely long lines. > > Note that you can still use CHKERRQ() either all the time or in situations where you "need" a continuation character. > > Barry > > >> -sanjay >> From balay at mcs.anl.gov Fri Aug 18 08:59:37 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Aug 2023 08:59:37 -0500 (CDT) Subject: [petsc-users] PetscCall( ) in fortran In-Reply-To: References: <543086f9-79ea-da6a-1457-709fae3a7d66@berkeley.edu> <88879478-10A8-4FC0-A047-271B6E213557@petsc.dev> Message-ID: <976bbee0-a783-6bd0-fce8-5c577e653235@mcs.anl.gov> I think gfortran defaults to fixed form for .F and free-form for .F90 This can be changed with FFLAGS=-ffree-form - but yeah - switching the suffix might be more suitable.. In addition - PETSc configure attempts to add in "-ffree-line-length-none -ffree-line-length-0" options - so that extra long source lines can be used [with the default PETSc makefiles]. Satish On Thu, 17 Aug 2023, Sanjay Govindjee wrote: > Thanks. > > For what it is worth in regards to question (1), GNU Fortran (Homebrew GCC > 11.3.0_2) 11.3.0 seems to need .F90 (as opposed to just .F). > > -sanjay > > > On 8/17/23 4:50 PM, Barry Smith wrote: > > > >> On Aug 17, 2023, at 7:44 PM, Sanjay Govindjee wrote: > >> > >> Two questions about the PetscCall( ) etc. functionality in fortran: > >> > >> (1) To use this functionality, is it required to use a .F90 naming > >> convention? or should I be able to use .F? > > This likely depends on the compiler. > >> (2) Is it permitted to use line continuation within these calls? For > >> example something like > >> > >> PetscCallMPIA(MPI_Allreduce(localval,globalsum,1,& > >> MPIU_REAL,MPIU_SUM, PETSC_COMM_WORLD,ierr)) > >> > >> or is it required to just have an extra long line? or is there an alternate > >> syntax for continuation in this case? 
> > Because PetscCallXXX() is a macro, it "breaks" if a continuation is > > used, so yes, you will sometimes need long lines. Most Fortran compilers > > have an option to allow infinitely long lines. > > > > Note that you can still use CHKERRQ() either all the time or in > > situations where you "need" a continuation character. > > > > Barry > > > > > >> -sanjay > >> > From zisheng.ye at ansys.com Fri Aug 18 09:11:55 2023 From: zisheng.ye at ansys.com (Zisheng Ye) Date: Fri, 18 Aug 2023 14:11:55 +0000 Subject: [petsc-users] Configure AMGX Message-ID: Dear PETSc team I am configuring AMGX package under the main branch with CUDA 12.1. But it can't get through. Can you help to solve the problem? I have attached the configure.log to the email. Thanks, Zisheng -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 5079392 bytes Desc: configure.log URL: From balay at mcs.anl.gov Fri Aug 18 10:01:58 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 18 Aug 2023 10:01:58 -0500 (CDT) Subject: [petsc-users] Configure AMGX In-Reply-To: References: Message-ID: Can you try the update in branch "balay/amgx-cuda-12"? Satish On Fri, 18 Aug 2023, Zisheng Ye wrote: > Dear PETSc team > > I am configuring AMGX package under the main branch with CUDA 12.1. But it can't get through. Can you help to solve the problem? I have attached the configure.log to the email. > > Thanks, > Zisheng > From zisheng.ye at ansys.com Fri Aug 18 11:00:29 2023 From: zisheng.ye at ansys.com (Zisheng Ye) Date: Fri, 18 Aug 2023 16:00:29 +0000 Subject: [petsc-users] Configure AMGX In-Reply-To: References: Message-ID: Thanks, that branch works. ________________________________ From: Satish Balay Sent: Friday, August 18, 2023 11:01 AM To: Zisheng Ye Cc: PETSc users list Subject: Re: [petsc-users] Configure AMGX [External Sender] Can you try the update in branch "balay/amgx-cuda-12"? Satish On Fri, 18 Aug 2023, Zisheng Ye wrote: > Dear PETSc team > > I am configuring AMGX package under the main branch with CUDA 12.1. But it can't get through. Can you help to solve the problem? I have attached the configure.log to the email. > > Thanks, > Zisheng > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Fri Aug 18 11:24:32 2023 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 18 Aug 2023 16:24:32 +0000 Subject: [petsc-users] Filling non-zero values of a Petsc matrix using numpy arrays with non-zero indices and values (petsc4py) In-Reply-To: <729971208.496779.1692275849292@mail.yahoo.com> References: <729971208.496779.1692275849292.ref@mail.yahoo.com> <729971208.496779.1692275849292@mail.yahoo.com> Message-ID: You can use this to build a PETSc matrix with the index arrays ai,aj and the value array aa: PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa)) Hong (Mr.) > On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users wrote: > > Hi All, > > I need to fill non-zero values of a Petsc matrix via petsc4py for the domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array containing row indices of non-zero value, (2) array containing column indices of non-zero values and (3) array containing the non-zero matrix values. How can one perform this type of filling operation in petsc4py? The method A.setValues does not appear to allow this since it only works on an individual matrix element or a block of matrix elements. 
>
> I am using Numpy arrays since they can be computed in loops optimized using Numba on each processor. I also cannot pass the Petsc matrix to a Numba compiled function since type information cannot be inferred. I absolutely need to avoid looping in standard Python to define Petsc matrix elements due to performance issues. I also need to use a standard petscy4py method and avoid writing new C or Fortran wrappers to minimize language complexity.
>
> Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example:
> ----------------------------------------------------------------------------------------------
> comm = PETSc.COMM_WORLD
> rank = comm.getRank()
>
> dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and y-directions of the grid
> nnz_max = 5 # max number of non-zero values per row
>
> A = PETSc.Mat()
> A.create(comm=PETSc.COMM_WORLD)
> A.setSizes((xnodes*ynodes, xnodes*ynodes))
> A.setType(PETSc.Mat.Type.AIJ)
> A.setPreallocationNNZ(nnz_max)
>
> rstart, rend = A.getOwnershipRange()
>
> # Here Anz, Arow and Acol are vectors with size equal to the number of non-zero values
> Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, nnz_max, dx, xnodes, ynodes)
>
> A.setValues(Arow, Acol, Anz) # <--- This does not work.
>
> A.assemblyBegin()
> A.assemblyEnd()
> ksp = PETSc.KSP()
> ksp.create(comm=A.getComm())
> ksp.setType(PETSc.KSP.Type.CG)
> ksp.getPC().setType(PETSc.PC.Type.GAMG)
> ksp.setOperators(A)
> ksp.setFromOptions()
> x, b = A.createVecs()
> b.set(1.0)
>
> ksp.solve(b, x)
>
> Regards,
> Erik Kneller, Ph.D.
>
From erik.kneller at yahoo.com  Sat Aug 19 09:51:56 2023
From: erik.kneller at yahoo.com (Erik Kneller)
Date: Sat, 19 Aug 2023 14:51:56 +0000 (UTC)
Subject: [petsc-users] Filling non-zero values of a Petsc matrix using
 numpy arrays with non-zero indices and values (petsc4py)
In-Reply-To: 
References: <729971208.496779.1692275849292.ref@mail.yahoo.com>
 <729971208.496779.1692275849292@mail.yahoo.com>
Message-ID: <1477843384.1043145.1692456716512@mail.yahoo.com>

Hi All,

Thank you for the recommendations. The first option provided by Matthew works in petsc4py but required some additional computations to re-organize information. I developed an approach that was easier to build within my current system using the option provided by Hong along with some other posts I found on the user forum. See below for an example of how I integrated Numba and petsc4py so the filling of matrix elements on each processor is done using the optimized machine code produced by Numba (you can eliminate the cleaning step if the number of non-zero elements for each row is well defined). Scaling and overall performance is now satisfactory.

I do have another question. In order to make this work I create the Petsc matrix 'A' twice, first to get local ownership and second to define the local elements for a particular processor:

Algorithm
------------
(1) Create a Petsc matrix 'A' and set size and type
(2) Get the local row start and end for matrix 'A'
(3) Define the local non-zero coefficients for the rows owned by the processor using a Numba JIT-compiled loop and store the result in a csr matrix defined using Scipy.
(4) Redefine the Petsc matrix 'A' using the local csr elements defined in step 3.
(5) Begin and end assembly
(6) Define RHS and initial solution vectors
(7) Solve system
(see code example below for details)

Steps (1) and (4) appear redundant and potentially sub-optimal from a performance perspective (perhaps due to my limited experience with petsc4py).
Is there a better approach in terms of elegance and performance? Also, if this is the optimal syntax, could someone elaborate on what exactly is occurring behind the scenes in terms of memory allocation?

Thank you all again for your guidance and insights.

Modified Version of Dalcin's 2D Poisson Example with Numba
----------------------------------------------------------------------------------
import sys
from typing import Tuple
import numpy.typing as npt
import numpy as np
from numba import njit
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
import scipy.sparse as sps


def main() -> None:
    xnodes = 8000
    nnz_max = 10

    ynodes = xnodes
    N = xnodes*ynodes
    dx = 1.0/(xnodes + 1)

    A = PETSc.Mat()
    A.create(comm=PETSc.COMM_WORLD)
    A.setSizes((xnodes*ynodes, xnodes*ynodes))
    A.setType(PETSc.Mat.Type.AIJ)

    rstart, rend = A.getOwnershipRange()
    Ascipy = build_csr_matrix(
        rstart, rend, nnz_max, dx, xnodes, ynodes)
    csr = (
        Ascipy.indptr[rstart:rend+1] - Ascipy.indptr[rstart],
        Ascipy.indices[Ascipy.indptr[rstart]:Ascipy.indptr[rend]],
        Ascipy.data[Ascipy.indptr[rstart]:Ascipy.indptr[rend]]
        )
    A = PETSc.Mat().createAIJ(size=(N, N), csr=csr)
    A.assemblyBegin()
    A.assemblyEnd()

    ksp = PETSc.KSP()
    ksp.create(comm=A.getComm())
    ksp.setType(PETSc.KSP.Type.CG)
    ksp.getPC().setType(PETSc.PC.Type.GAMG)

    ksp.setOperators(A)
    ksp.setFromOptions()

    x, b = A.createVecs()
    b.set(1.0)

    ksp.solve(b, x)


def build_csr_matrix(
        rstart: int,
        rend: int,
        nnz_max: int,
        dx: float,
        xnodes: int,
        ynodes: int
):
    Anz, Arow, Acol = build_nonzero_arrays(
        rstart, rend, nnz_max, dx, xnodes, ynodes
        )
    N = xnodes*ynodes
    Ls = sps.csr_matrix((Anz, (Arow, Acol)), shape=(N, N), dtype=np.float64)
    return Ls


def build_nonzero_arrays(
        rstart: int,
        rend: int,
        nnz_max: int,
        dx: float,
        xnodes: int,
        ynodes: int
) -> Tuple[
    npt.NDArray[np.float64],
    npt.NDArray[np.float64],
    npt.NDArray[np.float64]
    ]:
    nrows_local = (rend - rstart) + 1
    Anz_ini = np.zeros((nnz_max*nrows_local), dtype=np.float64)
    Arow_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
    Acol_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
    icount_nz = define_nonzero_values(
        rstart, rend, dx, xnodes, ynodes, Anz_ini, Arow_ini, Acol_ini
        )
    (
        Anz, Arow, Acol
    ) = clean_nonzero_arrays(icount_nz, Anz_ini, Arow_ini, Acol_ini)
    return Anz, Arow, Acol


@njit
def define_nonzero_values(
        rstart: int,
        rend: int,
        dx: float,
        xnodes: int,
        ynodes: int,
        Anz: npt.NDArray[np.float64],
        Arow: npt.NDArray[np.int64],
        Acol: npt.NDArray[np.int64]
) -> int:
    """ Fill matrix A
    """
    icount_nz = 0
    for row in range(rstart, rend):
        i, j = index_to_grid(row, xnodes)
        # A[row, row] = 4.0/dx**2
        Anz[icount_nz] = 4.0/dx**2
        Arow[icount_nz] = row
        Acol[icount_nz] = row
        icount_nz += 1
        if i > 0:
            column = row - xnodes
            # A[row, column] = -1.0/dx**2
            Anz[icount_nz] = -1.0/dx**2
            Arow[icount_nz] = row
            Acol[icount_nz] = column
            icount_nz += 1
        if i < xnodes - 1:
            column = row + xnodes
            # A[row, column] = -1.0/dx**2
Anz[icount_nz] = -1.0/dx**2 ??????????? Arow[icount_nz] = row ??????????? Acol[icount_nz] = column ??????????? icount_nz += 1 ??????? if j > 0: ??????????? column = row - 1 ??????????? #A[row, column] = -1.0/dx**2 ??????????? Anz[icount_nz] = -1.0/dx**2 ??????????? Arow[icount_nz] = row ??????????? Acol[icount_nz] = column ??????????? icount_nz += 1 ??????? if j < xnodes - 1: ??????????? column = row + 1 ??????????? #A[row, column] = -1.0/dx**2 ??????????? Anz[icount_nz] = -1.0/dx**2 ??????????? Arow[icount_nz] = row ??????????? Acol[icount_nz] = column ??????????? icount_nz += 1 ??? return icount_nz @njit def clean_nonzero_arrays( ??????? icount_nz:int, ??????? Anz_ini:npt.NDArray[np.float64], ??????? Arow_ini:npt.NDArray[np.float64], ??????? Acol_ini:npt.NDArray[np.float64] ) -> Tuple[ ??? npt.NDArray[np.float64], ??? npt.NDArray[np.float64], ??? npt.NDArray[np.float64] ??? ]: ??? Anz = np.zeros((icount_nz), dtype=np.float64) ??? Arow = np.zeros((icount_nz), dtype=np.int32) ??? Acol = np.zeros((icount_nz), dtype=np.int32) ??? for i in range(icount_nz): ??????? Anz[i] = Anz_ini[i] ??????? Arow[i] = Arow_ini[i] ??????? Acol[i] = Acol_ini[i] ??? return Anz, Arow, Acol @njit def index_to_grid(r:int, n:int) -> Tuple[int,int]: ??? """Convert a row number into a grid point.""" ??? return (r//n, r%n) if __name__ == "__main__": ??? main() On Friday, August 18, 2023 at 11:24:36 AM CDT, Zhang, Hong wrote: You can use this to build a PETSc matrix with the index arrays ai,aj and the value array aa: ? ? PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa)) Hong (Mr.) > On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users wrote: > > Hi All, > > I need to fill non-zero values of a Petsc matrix via petsc4py for the domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array containing row indices of non-zero value, (2) array containing column indices of non-zero values and (3) array containing the non-zero matrix values. How can one perform this type of filling operation in petsc4py? The method A.setValues does not appear to allow this since it only works on an individual matrix element or a block of matrix elements. > > I am using Numpy arrays since they can be computed in loops optimized using Numba on each processor. I also cannot pass the Petsc matrix to a Numba compiled function since type information cannot be inferred. I absolutely need to avoid looping in standard Python to define Petsc matrix elements due to performance issues. I also need to use a standard petscy4py method and avoid writing new C or Fortran wrappers to minimize language complexity. > > Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example: > ---------------------------------------------------------------------------------------------- > comm = PETSc.COMM_WORLD > rank = comm.getRank() > > dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and y-directions of the grid > nnz_max = 5 # max number of non-zero values per row > > A = PETSc.Mat() > A.create(comm=PETSc.COMM_WORLD) > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > A.setType(PETSc.Mat.Type.AIJ) > A.setPreallocationNNZ(nnz_max) > > rstart, rend = A.getOwnershipRange() > > # Here Anz, Arow and Acol are vectors with size equal to the number of non-zero values > Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, nnz_max, dx, xnodes, ynodes) > > A.setValues(Arow, Acol, Anz) # <--- This does not work. 
> > A.assemblyBegin() > A.assemblyEnd() > ksp = PETSc.KSP() > ksp.create(comm=A.getComm()) > ksp.setType(PETSc.KSP.Type.CG) > ksp.getPC().setType(PETSc.PC.Type.GAMG) > ksp.setOperators(A) > ksp.setFromOptions() > x, b = A.createVecs() > b.set(1.0) > > ksp.solve(b, x) > > Regards, > Erik Kneller, Ph.D. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 19 13:28:10 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 19 Aug 2023 14:28:10 -0400 Subject: [petsc-users] Filling non-zero values of a Petsc matrix using numpy arrays with non-zero indices and values (petsc4py) In-Reply-To: <1477843384.1043145.1692456716512@mail.yahoo.com> References: <729971208.496779.1692275849292.ref@mail.yahoo.com> <729971208.496779.1692275849292@mail.yahoo.com> <1477843384.1043145.1692456716512@mail.yahoo.com> Message-ID: <21A29B72-F664-42DB-AD40-A0F6989556E7@petsc.dev> The cost of the initial matrix creation to get the row starts and ends is trivial compared to the later computations, so is fine to retain. > On Aug 19, 2023, at 10:51 AM, Erik Kneller via petsc-users wrote: > > Hi All, > > Thank you for the recommendations. The first option provided by Matthew works in petsc4py but required some additional computations to re-organize information. I developed an approach that was easier to build within my current system using the option provided by Hong along with some other posts I found on the user forum. See below for an example of how I integrated Numba and petcs4py so the filling of matrix elements on each processor is done using the optimized machine code produced by Numba (you can eliminate the cleaning step if the number of non-zero elements for each row is well defined). Scaling and overall performance is now satisfactory. > > I do have another question. In order to make this work I create the Petsc matrix 'A' twice, first to get local ownership and second to define the local elements for a particular processor: > > Algorithm > ------------ > (1) Create a Petsc matrix 'A' and set size and type > (2) Get the local row start and end for matrix 'A' > (3) Define the local non-zero coefficients for the rows owned by processor using a Numba JIT-compiled loop and store result in a csr matrix defined using Scipy. > (4) Redefine Petsc matix 'A' using the the local csr elements define in step 3. > (5) Begin and end assembly > (6) Define RHS and initial solution vectors > (7) Solve system > (see code example below for details) > > Steps (1) and (4) appear redundant and potentially sub-optimal from a performance perspective (perhaps due to my limited experience with petscy4py). Is there a better approach in terms of elegance and performance? Also, if this is the optimal syntax could someone elaborate on what exactly is occurring behind the seen in terms of memory allocation? > > Thank you all again for your guidance and insights. 
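A minimal petsc4py sketch of the two-pass pattern discussed here, for reference: a size-only matrix is created first just to obtain the default row ownership, and the matrix that is actually solved with is then built in one call from a local CSR triplet. Here build_local_csr is a hypothetical stand-in for the Numba-backed routine in the quoted code below, and the csr tuple is assumed to describe only the locally owned rows (a local indptr starting at 0 plus global column indices), which is exactly what the SciPy slicing in the quoted code produces.

import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

xnodes = ynodes = 8000
N = xnodes*ynodes

# Pass 1: a size-only matrix, used only to obtain the default row ownership.
A = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
A.setSizes((N, N))
A.setType(PETSc.Mat.Type.AIJ)
rstart, rend = A.getOwnershipRange()

# Local CSR for rows [rstart, rend): indptr has length (rend - rstart) + 1
# and starts at 0; column indices are global.
indptr, indices, data = build_local_csr(rstart, rend)  # hypothetical helper

# Pass 2: the matrix that is actually used, preallocated and filled in one
# call from the local CSR triplet; the values are copied into the AIJ
# storage, so the NumPy arrays can be released afterwards.
A = PETSc.Mat().createAIJ(size=(N, N), csr=(indptr, indices, data),
                          comm=PETSc.COMM_WORLD)
A.assemblyBegin()
A.assemblyEnd()

As noted above, the throwaway first matrix is cheap next to the numerical work, so keeping the two-pass structure is reasonable.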
> > Modified Version of Dalcin's 2D Poisson Example with Numba > ---------------------------------------------------------------------------------- > import sys > from typing import Tuple > import numpy.typing as npt > import numpy as np > from numba import njit > import petsc4py > petsc4py.init(sys.argv) > from petsc4py import PETSc > import scipy.sparse as sps > > > def main() -> None: > xnodes = 8000 > nnz_max = 10 > > ynodes = xnodes > N = xnodes*ynodes > dx = 1.0/(xnodes + 1) > > A = PETSc.Mat() > A.create(comm=PETSc.COMM_WORLD) > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > A.setType(PETSc.Mat.Type.AIJ) > > rstart, rend = A.getOwnershipRange() > Ascipy = build_csr_matrix( > rstart, rend, nnz_max, dx, xnodes, ynodes) > csr=( > Ascipy.indptr[rstart:rend+1] - Ascipy.indptr[rstart], > Ascipy.indices[Ascipy.indptr[rstart]:Ascipy.indptr[rend]], > Ascipy.data[Ascipy.indptr[rstart]:Ascipy.indptr[rend]] > ) > A = PETSc.Mat().createAIJ(size=(N,N), csr=csr) > > A.assemblyBegin() > A.assemblyEnd() > > ksp = PETSc.KSP() > ksp.create(comm=A.getComm()) > ksp.setType(PETSc.KSP.Type.CG) > ksp.getPC().setType(PETSc.PC.Type.GAMG) > > ksp.setOperators(A) > ksp.setFromOptions() > > x, b = A.createVecs() > b.set(1.0) > > ksp.solve(b, x) > > > def build_csr_matrix( > rstart:int, > rend:int, > nnz_max:int, > dx:float, > xnodes:int, > ynodes:int > ): > Anz, Arow, Acol = build_nonzero_arrays( > rstart, rend, nnz_max, dx, xnodes, ynodes > ) > N = xnodes*ynodes > Ls = sps.csr_matrix((Anz, (Arow,Acol)), shape=(N,N), dtype=np.float64) > return Ls > > > def build_nonzero_arrays( > rstart:int, > rend:int, > nnz_max:int, > dx:float, > xnodes:int, > ynodes:int > ) -> Tuple[ > npt.NDArray[np.float64], > npt.NDArray[np.float64], > npt.NDArray[np.float64] > ]: > nrows_local = (rend - rstart) + 1 > Anz_ini = np.zeros((nnz_max*nrows_local), dtype=np.float64) > Arow_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32) > Acol_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32) > icount_nz = define_nonzero_values( > rstart, rend, dx, xnodes, ynodes, Anz_ini, Arow_ini, Acol_ini > ) > ( > Anz, Arow, Acol > ) = clean_nonzero_arrays(icount_nz, Anz_ini, Arow_ini, Acol_ini) > return Anz, Arow, Acol > > > @njit > def define_nonzero_values( > rstart:int, > rend:int, > dx:float, > xnodes:int, > ynodes:int, > Anz:npt.NDArray[np.float64], > Arow:npt.NDArray[np.int64], > Acol:npt.NDArray[np.int64] > ) -> int: > """ Fill matrix A > """ > icount_nz = 0 > for row in range(rstart, rend): > i, j = index_to_grid(row, xnodes) > #A[row, row] = 4.0/dx**2 > Anz[icount_nz] = 4.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = row > icount_nz += 1 > if i > 0: > column = row - xnodes > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if i < xnodes - 1: > column = row + xnodes > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if j > 0: > column = row - 1 > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if j < xnodes - 1: > column = row + 1 > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > return icount_nz > > > @njit > def clean_nonzero_arrays( > icount_nz:int, > Anz_ini:npt.NDArray[np.float64], > Arow_ini:npt.NDArray[np.float64], > Acol_ini:npt.NDArray[np.float64] > ) -> Tuple[ > npt.NDArray[np.float64], > 
npt.NDArray[np.float64], > npt.NDArray[np.float64] > ]: > Anz = np.zeros((icount_nz), dtype=np.float64) > Arow = np.zeros((icount_nz), dtype=np.int32) > Acol = np.zeros((icount_nz), dtype=np.int32) > for i in range(icount_nz): > Anz[i] = Anz_ini[i] > Arow[i] = Arow_ini[i] > Acol[i] = Acol_ini[i] > return Anz, Arow, Acol > > > @njit > def index_to_grid(r:int, n:int) -> Tuple[int,int]: > """Convert a row number into a grid point.""" > return (r//n, r%n) > > > if __name__ == "__main__": > main() > > > On Friday, August 18, 2023 at 11:24:36 AM CDT, Zhang, Hong wrote: > > > You can use this to build a PETSc matrix with the index arrays ai,aj and the value array aa: > PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa)) > > Hong (Mr.) > > > On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users > wrote: > > > > Hi All, > > > > I need to fill non-zero values of a Petsc matrix via petsc4py for the domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array containing row indices of non-zero value, (2) array containing column indices of non-zero values and (3) array containing the non-zero matrix values. How can one perform this type of filling operation in petsc4py? The method A.setValues does not appear to allow this since it only works on an individual matrix element or a block of matrix elements. > > > > I am using Numpy arrays since they can be computed in loops optimized using Numba on each processor. I also cannot pass the Petsc matrix to a Numba compiled function since type information cannot be inferred. I absolutely need to avoid looping in standard Python to define Petsc matrix elements due to performance issues. I also need to use a standard petscy4py method and avoid writing new C or Fortran wrappers to minimize language complexity. > > > > Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example: > > ---------------------------------------------------------------------------------------------- > > comm = PETSc.COMM_WORLD > > rank = comm.getRank() > > > > dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and y-directions of the grid > > nnz_max = 5 # max number of non-zero values per row > > > > A = PETSc.Mat() > > A.create(comm=PETSc.COMM_WORLD) > > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > > A.setType(PETSc.Mat.Type.AIJ) > > A.setPreallocationNNZ(nnz_max) > > > > rstart, rend = A.getOwnershipRange() > > > > # Here Anz, Arow and Acol are vectors with size equal to the number of non-zero values > > Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, nnz_max, dx, xnodes, ynodes) > > > > A.setValues(Arow, Acol, Anz) # <--- This does not work. > > > > A.assemblyBegin() > > A.assemblyEnd() > > ksp = PETSc.KSP() > > ksp.create(comm=A.getComm()) > > ksp.setType(PETSc.KSP.Type.CG) > > ksp.getPC().setType(PETSc.PC.Type.GAMG) > > ksp.setOperators(A) > > ksp.setFromOptions() > > x, b = A.createVecs() > > b.set(1.0) > > > > ksp.solve(b, x) > > > > Regards, > > Erik Kneller, Ph.D. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sat Aug 19 15:47:58 2023 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sat, 19 Aug 2023 13:47:58 -0700 Subject: [petsc-users] MetSetValues not found Message-ID: I recently got a copy of E.Bueler's petsc book and was converting the programs to fortran for my students and have run into a compile error that I can not figure out.? I comment out the MatSetValues line the code compiles fine. When I run make on vecmatkspf.F90, I get the error: ?? 
42 | PetscCallA(MatSetValues(A,1,i,4,j,aA,INSERT_VALUES,ierr)) ????? | 1 Error: There is no specific subroutine for the generic 'matsetvalues' at (1) My petsc set works, runs with my FEA code and compiles and runs the fortran programs in the petsc tutorials directories just fine.? And in particular, compiles and runs the C versions from Bueler. I've attached the make file and the source.? I must be doing something obviously wrong but I can not spot it. -sanjay -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ! Fortran version of vecmatksp.c program vecmatkspf #include use petscksp use petscmat Vec :: x, b Mat :: A KSP :: ksp PetscInt :: i, j(4) PetscReal :: ab(4),aA(4,4) PetscErrorCode :: ierr data j /0,1,2,3/ data ab /7.0, 1.0, 1.0, 3.0/ data aA /1.0, 2.0, -1.0, 0.0, & 2.0, 1.0, 1.0, 1.0, & 3.0, -2.0, 1.0, 1.0, & 0.0, -3.0, 0.0, -1.0/ PetscCallA(PetscInitialize(PETSC_NULL_CHARACTER,"Solve a 4x4 linear system using KSP.\n",ierr)) PetscCallA(VecCreate(PETSC_COMM_WORLD,b,ierr)) PetscCallA(VecSetSizes(b,PETSC_DECIDE,4,ierr)) PetscCallA(VecSetFromOptions(b,ierr)) PetscCallA(VecSetValues(b,4,j,ab,INSERT_VALUES,ierr)) PetscCallA(VecAssemblyBegin(b,ierr)) PetscCallA(VecAssemblyEnd(b,ierr)) PetscCallA(MatCreate(PETSC_COMM_WORLD,A,ierr)) PetscCallA(MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,4,4,ierr)) PetscCallA(MatSetFromOptions(A,ierr)) PetscCallA(MatSetUp(A,ierr)) do i = 0,3 PetscCallA(MatSetValues(A,1,i,4,j,aA,INSERT_VALUES,ierr)) end do PetscCallA(MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)) PetscCallA(MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)) PetscCallA(KSPCreate(PETSC_COMM_WORLD,ksp,ierr)) PetscCallA(KSPSetOperators(ksp,A,A,ierr)) PetscCallA(KSPSetFromOptions(ksp,ierr)) PetscCallA(VecDuplicate(b,x,ierr)) PetscCallA(KSPSolve(ksp,b,x,ierr)) PetscCallA(VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)) PetscCallA(KSPDestroy(ksp,ierr)) PetscCallA(MatDestroy(A,ierr)) PetscCallA(VecDestroy(x,ierr)) PetscCallA(VecDestroy(b,ierr)) PetscCallA(PetscFinalize(ierr)) end program vecmatkspf -------------- next part -------------- include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules CFLAGS += -pedantic -std=c99 sparsemat: sparsemat.o -${CLINKER} -o sparsemat sparsemat.o ${PETSC_LIB} ${RM} sparsemat.o vecmatksp: vecmatksp.o -${CLINKER} -o vecmatksp vecmatksp.o ${PETSC_LIB} ${RM} vecmatksp.o vecmatkspf: vecmatkspf.o -${FLINKER} -o vecmatkspf vecmatkspf.o ${PETSC_FORTRAN_LIB} ${PETSC_LIB} ${RM} vecmatkspf.o tri: tri.o -${CLINKER} -o tri tri.o ${PETSC_LIB} ${RM} tri.o loadsolve: loadsolve.o -${CLINKER} -o loadsolve loadsolve.o ${PETSC_LIB} ${RM} loadsolve.o # testing runsparsemat_1: - at ../testit.sh sparsemat "-mat_view" 1 1 runvecmatksp_1: - at ../testit.sh vecmatksp "" 1 1 runtri_1: - at ../testit.sh tri "-a_mat_view ::ascii_dense" 1 1 runtri_2: - at ../testit.sh tri "-tri_m 1000 -ksp_rtol 1.0e-4 -ksp_type cg -pc_type bjacobi -sub_pc_type jacobi -ksp_converged_reason" 2 2 runloadsolve_1: - at ./tri -ksp_view_mat binary:A.dat -ksp_view_rhs binary:b.dat > /dev/null - at ../testit.sh loadsolve "-verbose -fA A.dat -fb b.dat -ksp_view_mat -ksp_view_rhs -ksp_view_solution" 1 1 test_sparsemat: runsparsemat_1 test_vecmatksp: runvecmatksp_1 test_tri: runtri_1 runtri_2 test_loadsolve: runloadsolve_1 test: test_sparsemat test_vecmatksp test_tri test_loadsolve # etc .PHONY: clean distclean runvecmatksp_1 runtri_1 runtri_2 runloadsolve_1 test test_vecmatksp test_tri test_loadsolve distclean: clean clean:: @rm -f *~ 
sparsemat vecmatksp tri loadsolve *tmp @rm -f *.dat *.dat.info From knepley at gmail.com Sat Aug 19 16:55:44 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 20 Aug 2023 07:55:44 +1000 Subject: [petsc-users] Filling non-zero values of a Petsc matrix using numpy arrays with non-zero indices and values (petsc4py) In-Reply-To: <1477843384.1043145.1692456716512@mail.yahoo.com> References: <729971208.496779.1692275849292.ref@mail.yahoo.com> <729971208.496779.1692275849292@mail.yahoo.com> <1477843384.1043145.1692456716512@mail.yahoo.com> Message-ID: On Sun, Aug 20, 2023 at 12:53?AM Erik Kneller via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi All, > > Thank you for the recommendations. The first option provided by Matthew > works in petsc4py but required some additional computations to re-organize > information. I developed an approach that was easier to build within my > current system using the option provided by Hong along with some other > posts I found on the user forum. See below for an example of how I > integrated Numba and petcs4py so the filling of matrix elements on each > processor is done using the optimized machine code produced by Numba (you > can eliminate the cleaning step if the number of non-zero elements for each > row is well defined). Scaling and overall performance is now satisfactory. > > I do have another question. In order to make this work I create the Petsc > matrix 'A' twice, first to get local ownership and second to define the > local elements for a particular processor: > > Algorithm > ------------ > (1) Create a Petsc matrix 'A' and set size and type > (2) Get the local row start and end for matrix 'A' > (3) Define the local non-zero coefficients for the rows owned by processor > using a Numba JIT-compiled loop and store result in a csr matrix defined > using Scipy. > (4) Redefine Petsc matix 'A' using the the local csr elements define in > step 3. > (5) Begin and end assembly > (6) Define RHS and initial solution vectors > (7) Solve system > (see code example below for details) > > Steps (1) and (4) appear redundant and potentially sub-optimal from a > performance perspective (perhaps due to my limited experience with > petscy4py). Is there a better approach in terms of elegance and > performance? > In 1), if you just want the default ownership, you can call https://petsc.org/main/manualpages/Sys/PetscSplitOwnership/ which is what the Mat is calling underneath. Thanks, Matt > Also, if this is the optimal syntax could someone elaborate on what > exactly is occurring behind the seen in terms of memory allocation? > > Thank you all again for your guidance and insights. 
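To make the PetscSplitOwnership() remark concrete: the default PETSC_DECIDE row split can also be reproduced directly, so the ownership range is available before any matrix exists. The small sketch below is a convenience, not a petsc4py API; it mirrors the formula the default layout uses (N//size rows per rank, one extra row on the first N % size ranks) and should be checked against A.getOwnershipRange() if any local sizes are set explicitly.

from petsc4py import PETSc

def default_ownership_range(N, comm=PETSc.COMM_WORLD):
    # Default PETSC_DECIDE split: every rank owns N//size rows and the
    # first N % size ranks own one extra row.
    size = comm.getSize()
    rank = comm.getRank()
    nlocal = N//size + (1 if rank < N % size else 0)
    rstart = rank*(N//size) + min(rank, N % size)
    return rstart, rstart + nlocal  # same convention as Mat.getOwnershipRange()

# e.g. rstart, rend = default_ownership_range(xnodes*ynodes)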
> > Modified Version of Dalcin's 2D Poisson Example with Numba > > ---------------------------------------------------------------------------------- > import sys > from typing import Tuple > import numpy.typing as npt > import numpy as np > from numba import njit > import petsc4py > petsc4py.init(sys.argv) > from petsc4py import PETSc > import scipy.sparse as sps > > > def main() -> None: > xnodes = 8000 > nnz_max = 10 > > ynodes = xnodes > N = xnodes*ynodes > dx = 1.0/(xnodes + 1) > > A = PETSc.Mat() > A.create(comm=PETSc.COMM_WORLD) > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > A.setType(PETSc.Mat.Type.AIJ) > > rstart, rend = A.getOwnershipRange() > Ascipy = build_csr_matrix( > rstart, rend, nnz_max, dx, xnodes, ynodes) > csr=( > Ascipy.indptr[rstart:rend+1] - Ascipy.indptr[rstart], > Ascipy.indices[Ascipy.indptr[rstart]:Ascipy.indptr[rend]], > Ascipy.data[Ascipy.indptr[rstart]:Ascipy.indptr[rend]] > ) > A = PETSc.Mat().createAIJ(size=(N,N), csr=csr) > > A.assemblyBegin() > A.assemblyEnd() > > ksp = PETSc.KSP() > ksp.create(comm=A.getComm()) > ksp.setType(PETSc.KSP.Type.CG) > ksp.getPC().setType(PETSc.PC.Type.GAMG) > > ksp.setOperators(A) > ksp.setFromOptions() > > x, b = A.createVecs() > b.set(1.0) > > ksp.solve(b, x) > > > def build_csr_matrix( > rstart:int, > rend:int, > nnz_max:int, > dx:float, > xnodes:int, > ynodes:int > ): > Anz, Arow, Acol = build_nonzero_arrays( > rstart, rend, nnz_max, dx, xnodes, ynodes > ) > N = xnodes*ynodes > Ls = sps.csr_matrix((Anz, (Arow,Acol)), shape=(N,N), dtype=np.float64) > return Ls > > > def build_nonzero_arrays( > rstart:int, > rend:int, > nnz_max:int, > dx:float, > xnodes:int, > ynodes:int > ) -> Tuple[ > npt.NDArray[np.float64], > npt.NDArray[np.float64], > npt.NDArray[np.float64] > ]: > nrows_local = (rend - rstart) + 1 > Anz_ini = np.zeros((nnz_max*nrows_local), dtype=np.float64) > Arow_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32) > Acol_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32) > icount_nz = define_nonzero_values( > rstart, rend, dx, xnodes, ynodes, Anz_ini, Arow_ini, Acol_ini > ) > ( > Anz, Arow, Acol > ) = clean_nonzero_arrays(icount_nz, Anz_ini, Arow_ini, Acol_ini) > return Anz, Arow, Acol > > > @njit > def define_nonzero_values( > rstart:int, > rend:int, > dx:float, > xnodes:int, > ynodes:int, > Anz:npt.NDArray[np.float64], > Arow:npt.NDArray[np.int64], > Acol:npt.NDArray[np.int64] > ) -> int: > """ Fill matrix A > """ > icount_nz = 0 > for row in range(rstart, rend): > i, j = index_to_grid(row, xnodes) > #A[row, row] = 4.0/dx**2 > Anz[icount_nz] = 4.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = row > icount_nz += 1 > if i > 0: > column = row - xnodes > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if i < xnodes - 1: > column = row + xnodes > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if j > 0: > column = row - 1 > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > if j < xnodes - 1: > column = row + 1 > #A[row, column] = -1.0/dx**2 > Anz[icount_nz] = -1.0/dx**2 > Arow[icount_nz] = row > Acol[icount_nz] = column > icount_nz += 1 > return icount_nz > > > @njit > def clean_nonzero_arrays( > icount_nz:int, > Anz_ini:npt.NDArray[np.float64], > Arow_ini:npt.NDArray[np.float64], > Acol_ini:npt.NDArray[np.float64] > ) -> Tuple[ > npt.NDArray[np.float64], > 
npt.NDArray[np.float64], > npt.NDArray[np.float64] > ]: > Anz = np.zeros((icount_nz), dtype=np.float64) > Arow = np.zeros((icount_nz), dtype=np.int32) > Acol = np.zeros((icount_nz), dtype=np.int32) > for i in range(icount_nz): > Anz[i] = Anz_ini[i] > Arow[i] = Arow_ini[i] > Acol[i] = Acol_ini[i] > return Anz, Arow, Acol > > > @njit > def index_to_grid(r:int, n:int) -> Tuple[int,int]: > """Convert a row number into a grid point.""" > return (r//n, r%n) > > > if __name__ == "__main__": > main() > > > On Friday, August 18, 2023 at 11:24:36 AM CDT, Zhang, Hong < > hongzhang at anl.gov> wrote: > > > You can use this to build a PETSc matrix with the index arrays ai,aj and > the value array aa: > PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa)) > > Hong (Mr.) > > > On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi All, > > > > I need to fill non-zero values of a Petsc matrix via petsc4py for the > domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array > containing row indices of non-zero value, (2) array containing column > indices of non-zero values and (3) array containing the non-zero matrix > values. How can one perform this type of filling operation in petsc4py? The > method A.setValues does not appear to allow this since it only works on an > individual matrix element or a block of matrix elements. > > > > I am using Numpy arrays since they can be computed in loops optimized > using Numba on each processor. I also cannot pass the Petsc matrix to a > Numba compiled function since type information cannot be inferred. I > absolutely need to avoid looping in standard Python to define Petsc matrix > elements due to performance issues. I also need to use a standard petscy4py > method and avoid writing new C or Fortran wrappers to minimize language > complexity. > > > > Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example: > > > ---------------------------------------------------------------------------------------------- > > comm = PETSc.COMM_WORLD > > rank = comm.getRank() > > > > dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and > y-directions of the grid > > nnz_max = 5 # max number of non-zero values per row > > > > A = PETSc.Mat() > > A.create(comm=PETSc.COMM_WORLD) > > A.setSizes((xnodes*ynodes, xnodes*ynodes)) > > A.setType(PETSc.Mat.Type.AIJ) > > A.setPreallocationNNZ(nnz_max) > > > > rstart, rend = A.getOwnershipRange() > > > > # Here Anz, Arow and Acol are vectors with size equal to the number of > non-zero values > > Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, > nnz_max, dx, xnodes, ynodes) > > > > A.setValues(Arow, Acol, Anz) # <--- This does not work. > > > > A.assemblyBegin() > > A.assemblyEnd() > > ksp = PETSc.KSP() > > ksp.create(comm=A.getComm()) > > ksp.setType(PETSc.KSP.Type.CG) > > ksp.getPC().setType(PETSc.PC.Type.GAMG) > > ksp.setOperators(A) > > ksp.setFromOptions() > > x, b = A.createVecs() > > b.set(1.0) > > > > ksp.solve(b, x) > > > > Regards, > > Erik Kneller, Ph.D. > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Sat Aug 19 16:57:28 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 20 Aug 2023 07:57:28 +1000 Subject: [petsc-users] MetSetValues not found In-Reply-To: References: Message-ID: On Sun, Aug 20, 2023 at 6:48?AM Sanjay Govindjee wrote: > I recently got a copy of E.Bueler's petsc book and was converting the > programs to fortran for my students and have run into a compile error that > I can not figure out. I comment out the MatSetValues line the code > compiles fine. > > When I run make on vecmatkspf.F90, I get the error: > > 42 | PetscCallA(MatSetValues(A,1,i,4,j,aA,INSERT_VALUES,ierr)) > | 1 > Error: There is no specific subroutine for the generic 'matsetvalues' at > (1) > > Your 'i' is a PetscInt, whereas it should be an array like 'j'. Thanks, Matt > My petsc set works, runs with my FEA code and compiles and runs the fortran > programs in the petsc tutorials directories just fine. And in particular, > compiles > and runs the C versions from Bueler. > > I've attached the make file and the source. I must be doing something > obviously wrong but I can not spot it. > > -sanjay > > -- > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 19 22:01:45 2023 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 19 Aug 2023 23:01:45 -0400 Subject: [petsc-users] MetSetValues not found In-Reply-To: References: Message-ID: You are attempting to pass the entire aA array into each MatSetValues() call when you should pass one row with each call. Use PetscCallA(MatSetValues(A,1,i,4,j,aA(i+1,:),INSERT_VALUES,ierr)) > On Aug 19, 2023, at 4:47 PM, Sanjay Govindjee wrote: > > I recently got a copy of E.Bueler's petsc book and was converting the > programs to fortran for my students and have run into a compile error that > I can not figure out. I comment out the MatSetValues line the code compiles fine. > > When I run make on vecmatkspf.F90, I get the error: > 42 | PetscCallA(MatSetValues(A,1,i,4,j,aA,INSERT_VALUES,ierr)) > | 1 > Error: There is no specific subroutine for the generic 'matsetvalues' at (1) > My petsc set works, runs with my FEA code and compiles and runs the fortran > programs in the petsc tutorials directories just fine. And in particular, compiles > and runs the C versions from Bueler. > > I've attached the make file and the source. I must be doing something obviously wrong but I can not spot it. > > -sanjay > -- > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Sun Aug 20 00:51:28 2023 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Sat, 19 Aug 2023 22:51:28 -0700 Subject: [petsc-users] MetSetValues not found In-Reply-To: References: Message-ID: <3273cb65-c3c3-b52e-305a-56427d94b313@berkeley.edu> Thanks.? I hadn't gotten to that bit yet.? I actually rewrote it a bit to store the transpose of aA, and then pass in the columns of aA-transpose. 
Thanks again for the help, -sanjay --- From jonas.lundgren at liu.se Mon Aug 21 06:06:18 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Mon, 21 Aug 2023 11:06:18 +0000 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE Message-ID: Dear PETSc users, I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). First, some details: - I use a version of PETSc 3.19.1 - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE: https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF - I am first solving a state problem, then an adjoint problem using the same linear operator. - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) The main problem is that PCREDISTRIBUTE internally doesn't use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says "it doesn't make sense"). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won't solve the previously mentioned problem, which seems to be the hard nut to crack. So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. 
These are very bad guesses that increases the time to convergence in each state/adjoint solve. So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? Any help is welcome! Best regards, Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 21 10:47:54 2023 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 21 Aug 2023 08:47:54 -0700 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: Message-ID: On Mon, Aug 21, 2023 at 4:06?AM Jonas Lundgren via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > > > I have a problem regarding the setting of initial guess to KSP when using > PCREDISTRIBUTE as the preconditioner. (The reason to why I use > PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and > PCREDISTRIBUTE successfully reduces the problem size and therefore speeds > up the solving). > > > > First, some details: > > - I use a version of PETSc 3.19.1 > > - The KSP I use is KSPPREONLY, as suggested in the manual pages > of PCREDISTRIBUTE: > https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ > > - I use KSPBCGSL as sub-KSP. I can perfectly well solve my > problem using this as my main KSP, but the performance is much worse than > when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the > amount of fixed DOF > > - I am first solving a state problem, then an adjoint problem > using the same linear operator. > > - The adjoint vector is used as sensitivity information to > update a design. After the design update, the state+adjoint problems are > solved again with a slightly updated linear operator. This is done for > hundreds of (design) iteration steps > > - I want the initial guess for the state problem to be the state > solution from the previous (design) iteration, and same for the adjoint > problem > > - I am aware of the default way of setting a custom initial > guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing > the actual guess in the x vector in the call to KSPSolve(ksp, b, x) > > > > The main problem is that PCREDISTRIBUTE internally doesn?t use the input > solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out > the solution vector (x) when starting to build x = diag(A)^{-1} b in the > beginning of PCApply_Redistribute(), and uses red->x as the solution > vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the > sub-KSP with an initial guess. > > > > Additionally, KSPPREONLY prohibits the use of having a nonzero initial > guess (the error message says ?it doesn?t make sense?). 
I guess I can > remove the line raising this error and recompile the PETSc libraries, but > it still won?t solve the previously mentioned problem, which seems to be > the hard nut to crack. > > > > So far, I have found out that if I create 2 KSP object, one each for the > state and adjoint problems, it is enough with calling > KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if > the variable red->x in PCApply_Redistribute() is kept untouched in memory > between calls to the main KSP and therefore is used as (non-zero) initial > guess to the sub-KSP. This has been verified by introducing > PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); > in PCApply_Redistribute(), recompiling the PETSc library, and then > inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", > (PetscObject *)&redx); in my own program source file. > > > > However, I would like to only create 1 KSP to be used with both the state > and adjoint problems (same linear operator), for memory reasons. > I would like to understand this better. Are you trying to save the temporary vector memory for another solver? For BCGSL, this is only a few vectors. It should be much less than the matrices. Thanks, Matt > When I do this, the initial guesses are mixed up between the two problems: > the initial guess for the adjoint problem is the solution to the state > problem in the current design iteration, and the initial guess for the > state problem is the solution to the adjoint problem in the previous design > iteration. These are very bad guesses that increases the time to > convergence in each state/adjoint solve. > > > > So, the core of the problem (as far as I can understand) is that I want to > control the initial guess red->x in PCApply_Redistribute(). > > > > The only solution I can think of is to include a call to > PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the > initial guess from my main program. And then I need to keep track of the > initial guesses for my both problems in my main program myself (minor > problem). This is maybe not the neatest way, and I do not know if this > approach affects the performance negatively? Maybe one call each to > PetscObjectQuery() and PetscObjectCompose() per call to > PCApply_Redistribute() is negligible? > > > > Is there another (and maybe simpler) solution to this problem? Maybe I can > SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is > zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? > > > > Any help is welcome! > > > > > > Best regards, > > Jonas Lundgren > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From meator.dev at gmail.com Mon Aug 21 12:14:25 2023 From: meator.dev at gmail.com (meator) Date: Mon, 21 Aug 2023 19:14:25 +0200 Subject: [petsc-users] Some questions about the directory structure Message-ID: <6c8b3d45-a6ef-4a4d-b377-c32a99ccae6d@gmail.com> Hi. I'm trying to package PETSc using the tarball with documentation (https://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-with-docs-3.19.4.tar.gz) and I've got some questions about the structure of PETSc. What are the contents of the /usr/lib/petsc directory in destdir for? This directory has two subrirectories: bin and conf. 
Why is the bin/ directory in lib/? lib/ should be for libraries. Are the executables contained in /usr/lib/petsc/bin essential to the user or the developer (should this be in a -devel subpackage)? Some of the scripts don't have the executable bit (/usr/lib/petsc/bin/configureTAS.py, /usr/lib/petsc/bin/extract.py, /usr/lib/petsc/bin/petsc_tas_style.mplstyle, /usr/lib/petsc/bin/tasClasses.py, /usr/lib/petsc/bin/xml2flamegraph.py). What is their purpose? The /usr/lib/petsc/conf directory seems to be related to the build process. Is that correct? If yes, I will delete the directory from the package because packages shouldn't include these things. This directory even includes uninstall.py which is undesirable for a packaged program because this is the package manager's job. /usr/share/petsc looks like it contains additional info useful to the developers, therefore it should be in a -devel subpackage. I see that the docs directory contains .buildinfo. Does this directory contain additional build artifacts (that should be removed)? The main index.html of the documentation (from the tarball linked at the beginning of this e-mail) is invalid. It has all the menus but the main part of the page is blank. The raw HTML is cut off, there's no content and there are unclosed tags. Many of my questions may be trivial but I want to make sure to not break the package. Thanks in advance -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_0x1A14CB3464CBE5BF.asc Type: application/pgp-keys Size: 3780 bytes Desc: OpenPGP public key URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 659 bytes Desc: OpenPGP digital signature URL: From jonas.lundgren at liu.se Mon Aug 21 12:25:52 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Mon, 21 Aug 2023 17:25:52 +0000 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: Message-ID: Dear Matt, thank you for your answer. I am using PCFIELDSPLIT as PC in BCGSL, and my setup is using quite much memory (it seems). I have found a setup which is fast, but uses so much memory that I would like to avoid doubling it by using two KSP objects. If I remember correctly, using one instead of two KSPs in this case saves about 20-30% of memory (in my specific problem, which includes other undisclosed parts as well). That is the underlying "memory reason" that eventually has led me to the conclusion that I need to access the vector red->x in PCApply_Redistribute(). Best regards, Jonas Lundgren ________________________________ Fr?n: Matthew Knepley Skickat: den 21 augusti 2023 17:47 Till: Jonas Lundgren Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE On Mon, Aug 21, 2023 at 4:06?AM Jonas Lundgren via petsc-users > wrote: Dear PETSc users, I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). First, some details: - I use a version of PETSc 3.19.1 - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE: https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ - I use KSPBCGSL as sub-KSP. 
I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF - I am first solving a state problem, then an adjoint problem using the same linear operator. - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) The main problem is that PCREDISTRIBUTE internally doesn?t use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says ?it doesn?t make sense?). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won?t solve the previously mentioned problem, which seems to be the hard nut to crack. So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. I would like to understand this better. Are you trying to save the temporary vector memory for another solver? For BCGSL, this is only a few vectors. It should be much less than the matrices. Thanks, Matt When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? 
Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? Any help is welcome! Best regards, Jonas Lundgren -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 21 12:54:07 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 21 Aug 2023 12:54:07 -0500 (CDT) Subject: [petsc-users] Some questions about the directory structure In-Reply-To: <6c8b3d45-a6ef-4a4d-b377-c32a99ccae6d@gmail.com> References: <6c8b3d45-a6ef-4a4d-b377-c32a99ccae6d@gmail.com> Message-ID: On Mon, 21 Aug 2023, meator wrote: > Hi. I'm trying to package PETSc using the tarball with documentation > (https://ftp.mcs.anl.gov/pub/petsc/release-snapshots/petsc-with-docs-3.19.4.tar.gz) > and I've got some questions about the structure of PETSc. > > What are the contents of the /usr/lib/petsc directory in destdir for? This > directory has two subrirectories: bin and conf. Why is the bin/ directory in > lib/? lib/ should be for libraries. balay at p1 /home/balay $ find /usr/lib -name bin /usr/lib/debug/bin /usr/lib/debug/usr/bin /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-6.fc38.x86_64/jre/bin /usr/lib/jvm/java-17-openjdk-17.0.8.0.7-1.fc38.x86_64/bin /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-1.fc38.x86_64/bin balay at p1 /home/balay $ find /usr/lib64 -name bin /usr/lib64/qt5/bin /usr/lib64/R/bin > Are the executables contained in > /usr/lib/petsc/bin essential to the user or the developer (should this be in a > -devel subpackage)? Hm - different scripts/utils have different propose. I guess most are useful from devel sub package. > Some of the scripts don't have the executable bit > (/usr/lib/petsc/bin/configureTAS.py, /usr/lib/petsc/bin/extract.py, > /usr/lib/petsc/bin/petsc_tas_style.mplstyle, /usr/lib/petsc/bin/tasClasses.py, > /usr/lib/petsc/bin/xml2flamegraph.py). What is their purpose? I guess some of them can use some cleanup. extract.py likely belongs to bin/maint [and excluded from tarball..] > > The /usr/lib/petsc/conf directory seems to be related to the build process. Is > that correct? These have makefiles that can be included from user/application makefiles - to get compile/link working seamlessly. > If yes, I will delete the directory from the package because > packages shouldn't include these things. This directory even includes > uninstall.py which is undesirable for a packaged program because this is the > package manager's job. Sure - some files might not make sense to be included in a packaging system. > > /usr/share/petsc looks like it contains additional info useful to the > developers, therefore it should be in a -devel subpackage. > > I see that the docs directory contains .buildinfo. Does this directory contain > additional build artifacts (that should be removed)? I guess some of these files should be excluded from tarball. > > The main index.html of the documentation (from the tarball linked at the > beginning of this e-mail) is invalid. It has all the menus but the main part > of the page is blank. 
The raw HTML is cut off, there's no content and there > are unclosed tags. Hm - since the docs are primarily tested/used at petsc.org - some of that functionality probably doesn't work as raw html - and might need fixes. Satish > > Many of my questions may be trivial but I want to make sure to not break the > package. > > Thanks in advance > > From meator.dev at gmail.com Mon Aug 21 13:09:10 2023 From: meator.dev at gmail.com (meator) Date: Mon, 21 Aug 2023 20:09:10 +0200 Subject: [petsc-users] Some questions about the directory structure In-Reply-To: References: <6c8b3d45-a6ef-4a4d-b377-c32a99ccae6d@gmail.com> Message-ID: <2896c640-0d0a-434c-a66c-cec5402b8935@gmail.com> Thanks for your reply! It has clarified many things I was unsure about. On 8/21/23 19:54, Satish Balay wrote: >> What are the contents of the /usr/lib/petsc directory in destdir for? This >> directory has two subrirectories: bin and conf. Why is the bin/ directory in >> lib/? lib/ should be for libraries. > > balay at p1 /home/balay > $ find /usr/lib -name bin > /usr/lib/debug/bin > /usr/lib/debug/usr/bin > /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-6.fc38.x86_64/jre/bin > /usr/lib/jvm/java-17-openjdk-17.0.8.0.7-1.fc38.x86_64/bin > /usr/lib/jvm/java-11-openjdk-11.0.20.0.8-1.fc38.x86_64/bin > balay at p1 /home/balay > $ find /usr/lib64 -name bin > /usr/lib64/qt5/bin > /usr/lib64/R/bin I see. >> The /usr/lib/petsc/conf directory seems to be related to the build process. Is >> that correct? > > These have makefiles that can be included from user/application makefiles - to get compile/link working seamlessly. Thanks for the feedback. I would have removed this directory otherwise. >> >> /usr/share/petsc looks like it contains additional info useful to the >> developers, therefore it should be in a -devel subpackage. > >> >> I see that the docs directory contains .buildinfo. Does this directory contain >> additional build artifacts (that should be removed)? > > I guess some of these files should be excluded from tarball. It might be just this file alone. But I'm not familiar with the documentation generator software you're using so I have asked just to be sure. -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_0x1A14CB3464CBE5BF.asc Type: application/pgp-keys Size: 3780 bytes Desc: OpenPGP public key URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 659 bytes Desc: OpenPGP digital signature URL: From bsmith at petsc.dev Mon Aug 21 13:12:58 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Aug 2023 14:12:58 -0400 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: Message-ID: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. Thanks Barry Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. 
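A sketch of that comparison in options-database form, for concreteness. Here ./app is a placeholder for the application; the options assume the PCREDISTRIBUTE sub-KSP is configured from the options database and uses the redistribute_ prefix, with -redistribute_ksp_initial_guess_nonzero being the options form of KSPSetInitialGuessNonzero() on that sub-KSP.

# run 1: inner solve starts from a zero initial guess
./app -ksp_type preonly -pc_type redistribute -redistribute_ksp_type bcgsl \
      -redistribute_ksp_initial_guess_nonzero false -log_view

# run 2: reuse the previous contents of red->x as the inner initial guess
./app -ksp_type preonly -pc_type redistribute -redistribute_ksp_type bcgsl \
      -redistribute_ksp_initial_guess_nonzero true -log_view

Comparing the KSPSolve time and iteration counts between the two -log_view summaries shows both the savings from the initial guess and the cost of the extra matrix-vector product and preconditioner application.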
> On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users wrote: > > Dear PETSc users, > > I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). > > First, some details: > - I use a version of PETSc 3.19.1 > - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ > - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF > - I am first solving a state problem, then an adjoint problem using the same linear operator. > - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps > - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem > - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) > > The main problem is that PCREDISTRIBUTE internally doesn?t use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. > > Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says ?it doesn?t make sense?). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won?t solve the previously mentioned problem, which seems to be the hard nut to crack. > > So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. > > However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. 
> > So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). > > The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? > > Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? > > Any help is welcome! > > > Best regards, > Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Aug 21 14:00:07 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 21 Aug 2023 19:00:07 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -------------- next part -------------- An HTML attachment was scrubbed... 
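A small standalone check that is sometimes used to see (or enforce) the rank-to-GPU mapping is to have each rank select a device by its node-local rank and print what it got. This is independent of PETSc, FDS, and Slurm and is not part of this thread; it only assumes MPI and the CUDA runtime are available:

/* gpu_map_check.c -- hypothetical helper, not part of FDS or PETSc.
   Each MPI rank picks a GPU by its node-local rank and reports the mapping. */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
  int      grank, lrank, ndev = 0, dev = -1;
  char     bus[32] = "n/a";
  MPI_Comm node;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &grank);
  /* ranks sharing a node form one communicator; its rank is the node-local rank */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);
  MPI_Comm_rank(node, &lrank);

  cudaGetDeviceCount(&ndev);                 /* devices visible to this rank */
  if (ndev > 0) {
    dev = lrank % ndev;                      /* round-robin over the visible GPUs */
    cudaSetDevice(dev);
    cudaDeviceGetPCIBusId(bus, (int)sizeof(bus), dev);
  }
  printf("global rank %d, local rank %d -> device %d (PCI %s) of %d visible\n",
         grank, lrank, dev, bus, ndev);

  MPI_Comm_free(&node);
  MPI_Finalize();
  return 0;
}

Built with something like mpicc plus the CUDA runtime library and launched under the same srun line, the printed mapping can be compared against the nvidia-smi process table; how PETSc itself assigns devices to ranks is a separate question from this check.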
URL: From jonas.lundgren at liu.se Mon Aug 21 14:28:21 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Mon, 21 Aug 2023 19:28:21 +0000 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> Message-ID: Dear Barry, I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. >From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. Best regards, Jonas Lundgren Fr?n: Barry Smith Skickat: den 21 augusti 2023 20:13 Till: Jonas Lundgren Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. Thanks Barry Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: Dear PETSc users, I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). First, some details: - I use a version of PETSc 3.19.1 - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF - I am first solving a state problem, then an adjoint problem using the same linear operator. - The adjoint vector is used as sensitivity information to update a design. 
After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) The main problem is that PCREDISTRIBUTE internally doesn't use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says "it doesn't make sense"). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won't solve the previously mentioned problem, which seems to be the hard nut to crack. So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? Any help is welcome! Best regards, Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... 
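Put as relative savings, the timings reported above work out to roughly

\[
\frac{8.7-5.0}{8.7}\approx 43\%\ \text{(state)},\qquad
\frac{7.9-5.7}{7.9}\approx 28\%\ \text{(adjoint)},\qquad
\frac{16.6-10.7}{16.6}\approx 36\%\ \text{(both solves per design iteration)},
\]

and since the two solves account for 60-90% of the total run time, that corresponds to something like a 20-30% reduction of the whole application, before the additional late-iteration gains Jonas mentions.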
URL: From junchao.zhang at gmail.com Mon Aug 21 14:29:21 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 21 Aug 2023 14:29:21 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Hi Junchao, something I'm noting related to running with cuda enabled > linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu > calculations, the GPU 0 in the node is taking what seems to be all > sub-matrices corresponding to all the MPI processes in the node. This is > the result of the nvidia-smi command on a node with 8 MPI processes (each > advancing the same number of unknowns in the calculation) and 4 GPU V100s: > > Mon Aug 21 14:36:07 2023 > > +---------------------------------------------------------------------------------------+ > | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA > Version: 12.2 | > > |-----------------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M | Bus-Id Disp.A | > Volatile Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | > GPU-Util Compute M. | > | | | > MIG M. | > > |=========================================+======================+======================| > | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | > 0 | > | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | > 0 | > | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | > 0 | > | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | > 0 | > | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > > > > +---------------------------------------------------------------------------------------+ > | Processes: > | > | GPU GI CI PID Type Process name > GPU Memory | > | ID ID > Usage | > > |=======================================================================================| > | 0 N/A N/A 214626 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214630 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 1 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 
318MiB | > | 1 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > > +---------------------------------------------------------------------------------------+ > > > You can see that GPU 0 is connected to all 8 MPI Processes, each taking > about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. > I'm wondering if this is expected or there are some changes I need to do on > my submission script/runtime parameters. > This is the script in this case (2 nodes, 8 MPI processes/node, 4 > GPU/node): > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=gpu > #SBATCH --ntasks=16 > #SBATCH --ntasks-per-node=8 > #SBATCH --cpus-per-task=1 > #SBATCH --nodes=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:4 > > export OMP_NUM_THREADS=1 > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > > srun -N 2 -n 16 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux > test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda > > Thank you for the advice, > Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Mon Aug 21 14:38:00 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Mon, 21 Aug 2023 19:38:00 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? It says in the script it has allocated 2.4GB Best, Marcos ________________________________ From: Junchao Zhang Sent: Monday, August 21, 2023 3:29 PM To: Vanella, Marcos (Fed) Cc: PETSc users list ; Guan, Collin X. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. 
This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. 
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 21 15:04:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Aug 2023 16:04:21 -0400 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> Message-ID: <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> Ok, thanks. Definitely more than I expected. It is easy to add the support you requested. I'll push a branch latter today. Barry > On Aug 21, 2023, at 3:28 PM, Jonas Lundgren wrote: > > Dear Barry, > > I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. > > From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. > > Best regards, > Jonas Lundgren > > Fr?n: Barry Smith > > Skickat: den 21 augusti 2023 20:13 > Till: Jonas Lundgren > > Kopia: petsc-users at mcs.anl.gov > ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE > > > When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view > > That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. 
> > Thanks > > Barry > > Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. > > > > > > > > > On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: > > Dear PETSc users, > > I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). > > First, some details: > - I use a version of PETSc 3.19.1 > - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ > - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF > - I am first solving a state problem, then an adjoint problem using the same linear operator. > - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps > - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem > - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) > > The main problem is that PCREDISTRIBUTE internally doesn?t use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. > > Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says ?it doesn?t make sense?). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won?t solve the previously mentioned problem, which seems to be the hard nut to crack. > > So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. > > However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. 
When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. > > So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). > > The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? > > Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? > > Any help is welcome! > > > Best regards, > Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Aug 21 15:17:25 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 21 Aug 2023 15:17:25 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. --Junchao Zhang On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) < marcos.vanella at nist.gov> wrote: > Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI > processes meshes but only working on 2 of them? > It says in the script it has allocated 2.4GB > Best, > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 21, 2023 3:29 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Guan, Collin X. (Fed) < > collin.guan at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > If you look at the PIDs of the nvidia-smi output, you will only find 8 > unique PIDs, which is expected since you allocated 8 MPI ranks per node. > The duplicate PIDs are usually for threads spawned by the MPI runtime > (for example, progress threads in MPI implementation). So your job script > and output are all good. > > Thanks. > > On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, something I'm noting related to running with cuda enabled > linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu > calculations, the GPU 0 in the node is taking what seems to be all > sub-matrices corresponding to all the MPI processes in the node. 
This is > the result of the nvidia-smi command on a node with 8 MPI processes (each > advancing the same number of unknowns in the calculation) and 4 GPU V100s: > > Mon Aug 21 14:36:07 2023 > > +---------------------------------------------------------------------------------------+ > | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA > Version: 12.2 | > > |-----------------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M | Bus-Id Disp.A | > Volatile Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | > GPU-Util Compute M. | > | | | > MIG M. | > > |=========================================+======================+======================| > | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | > 0 | > | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | > 0 | > | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | > 0 | > | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | > 0 | > | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > > > > +---------------------------------------------------------------------------------------+ > | Processes: > | > | GPU GI CI PID Type Process name > GPU Memory | > | ID ID > Usage | > > |=======================================================================================| > | 0 N/A N/A 214626 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214630 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 1 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 1 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > > +---------------------------------------------------------------------------------------+ > > > You can see that GPU 0 is connected to all 8 MPI Processes, each taking > about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. > I'm wondering if this is expected or there are some changes I need to do on > my submission script/runtime parameters. 
> This is the script in this case (2 nodes, 8 MPI processes/node, 4 > GPU/node): > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=gpu > #SBATCH --ntasks=16 > #SBATCH --ntasks-per-node=8 > #SBATCH --cpus-per-task=1 > #SBATCH --nodes=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:4 > > export OMP_NUM_THREADS=1 > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > > srun -N 2 -n 16 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux > test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda > > Thank you for the advice, > Marcos > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.lundgren at liu.se Mon Aug 21 15:18:13 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Mon, 21 Aug 2023 20:18:13 +0000 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> Message-ID: Thanks, Barry! What solution do you have in mind? I tried a bit myself using SCATTER_FORWARD of the input vector x in PCApply_Redistribute, together with allowing for nonzero initial guess in KSPPREONLY, but that might not be the best solution in a public branch? I guess the big gain is due to the fact that the subsequent solvings of state/adjoint problems are done with similar (but not exactly the same) linear operator, so that they become almost the same problem. On the other hand, the state and adjoint problems are not similar to each other, making the solution to one a very bad initial guess to the other. Again, thank you for your support. Best regards, Jonas Lundgren Fr?n: Barry Smith Skickat: den 21 augusti 2023 22:04 Till: Jonas Lundgren Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE Ok, thanks. Definitely more than I expected. It is easy to add the support you requested. I'll push a branch latter today. Barry On Aug 21, 2023, at 3:28 PM, Jonas Lundgren > wrote: Dear Barry, I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. >From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. 
This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. Best regards, Jonas Lundgren Fr?n: Barry Smith > Skickat: den 21 augusti 2023 20:13 Till: Jonas Lundgren > Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. Thanks Barry Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: Dear PETSc users, I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). First, some details: - I use a version of PETSc 3.19.1 - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF - I am first solving a state problem, then an adjoint problem using the same linear operator. - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) The main problem is that PCREDISTRIBUTE internally doesn't use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says "it doesn't make sense"). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won't solve the previously mentioned problem, which seems to be the hard nut to crack. 
So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? Any help is welcome! Best regards, Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 21 15:51:21 2023 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 21 Aug 2023 16:51:21 -0400 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> Message-ID: <46A7B0DE-1F77-43A6-9025-4AD34B7CD17B@petsc.dev> > On Aug 21, 2023, at 4:18 PM, Jonas Lundgren wrote: > > Thanks, Barry! > > What solution do you have in mind? I tried a bit myself using SCATTER_FORWARD of the input vector x in PCApply_Redistribute, together with allowing for nonzero initial guess in KSPPREONLY, but that might not be the best solution in a public branch? Yes, that is basically what I would do. 1) Change PetscCheck(ksp->guess_zero, PetscObjectComm((PetscObject)ksp), PETSC_ERR_USER, "Running KSP of preonly doesn't make sense with nonzero initial guess\n\ you probably want a KSP type of Richardson"); to be allowed it if the PC is redistribute 2) if the inner ksp has nonzero initial guess flag then scatter forward the x When you run, you would need both -ksp_initial_guess_nonzero -redistribute_initial_guess_nonzero Does this sound right? If you already have the code, or mostly, you could make a MR and we could finish it and merge it into main, or I can just do it. 
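To make step 2) concrete, here is a rough sketch, under stated assumptions and not the actual PETSc source, of what the forward scatter inside PCApply_Redistribute() could look like. The names red->ksp and red->x follow the discussion above; red->scatter is an assumed name for the scatter between the full and reduced vectors, so the real pcredistribute.c may differ:

/* Hypothetical sketch only. Inside PCApply_Redistribute(pc, b, x), before
   red->x is used as the inner solution vector, copy the caller's guess into
   it when the inner KSP has been flagged as having a nonzero initial guess. */
PetscBool nonzero_guess;

PetscCall(KSPGetInitialGuessNonzero(red->ksp, &nonzero_guess));
if (nonzero_guess) {
  PetscCall(VecScatterBegin(red->scatter, x, red->x, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(red->scatter, x, red->x, INSERT_VALUES, SCATTER_FORWARD));
}
/* ... the existing KSPSolve(red->ksp, ..., red->x) then starts from that guess ... */

Combined with relaxing the PetscCheck() in 1) when the PC is PCREDISTRIBUTE, the two command-line flags listed above would switch the behavior on.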
Barry > > I guess the big gain is due to the fact that the subsequent solvings of state/adjoint problems are done with similar (but not exactly the same) linear operator, so that they become almost the same problem. On the other hand, the state and adjoint problems are not similar to each other, making the solution to one a very bad initial guess to the other. > > Again, thank you for your support. > > Best regards, > Jonas Lundgren > > Fr?n: Barry Smith > > Skickat: den 21 augusti 2023 22:04 > Till: Jonas Lundgren > > Kopia: petsc-users at mcs.anl.gov > ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE > > > Ok, thanks. Definitely more than I expected. > > It is easy to add the support you requested. I'll push a branch latter today. > > Barry > > > > On Aug 21, 2023, at 3:28 PM, Jonas Lundgren > wrote: > > Dear Barry, > > I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. > > From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. > > Best regards, > Jonas Lundgren > > Fr?n: Barry Smith > > Skickat: den 21 augusti 2023 20:13 > Till: Jonas Lundgren > > Kopia: petsc-users at mcs.anl.gov > ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE > > > When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view > > That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. > > Thanks > > Barry > > Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. > > > > > > > > > > On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: > > Dear PETSc users, > > I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). 
> > First, some details: > - I use a version of PETSc 3.19.1 > - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ > - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF > - I am first solving a state problem, then an adjoint problem using the same linear operator. > - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps > - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem > - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) > > The main problem is that PCREDISTRIBUTE internally doesn?t use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. > > Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says ?it doesn?t make sense?). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won?t solve the previously mentioned problem, which seems to be the hard nut to crack. > > So far, I have found out that if I create 2 KSP object, one each for the state and adjoint problems, it is enough with calling KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and therefore is used as (non-zero) initial guess to the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. > > However, I would like to only create 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increases the time to convergence in each state/adjoint solve. > > So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). > > The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for my both problems in my main program myself (minor problem). 
This is maybe not the neatest way, and I do not know if this approach affects the performance negatively? Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? > > Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for non-zero initial guess in KSPPREONLY? > > Any help is welcome! > > > Best regards, > Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Tue Aug 22 14:53:44 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Tue, 22 Aug 2023 19:53:44 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? Thanks! I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :141:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :198:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) 
^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :149:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :208:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(len); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(flg); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(haystack); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(needle); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ fatal error: too many errors emitted, stopping now [-ferror-limit=] 20 errors generated. Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp. 
gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 **************************ERROR************************************* Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** ________________________________ From: Junchao Zhang Sent: Monday, August 21, 2023 4:17 PM To: Vanella, Marcos (Fed) Cc: PETSc users list ; Guan, Collin X. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. --Junchao Zhang On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? It says in the script it has allocated 2.4GB Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 3:29 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Aug 22 15:03:21 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 22 Aug 2023 15:03:21 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Junchao, both the slurm scontrol show job_id -dd and looking at > CUDA_VISIBLE_DEVICES does not provide information about which MPI process > is associated to which GPU in the node in our system. I can see this with > nvidia-smi, but if you have any other suggestion using slurm I would like > to hear it. > > I've been trying to compile the code+Petsc in summit, but have been having > all sorts of issues related to spectrum-mpi, and the different compilers > they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle > Fortran 2018, others give issues of repeated MPI definitions, etc.). > The PETSc configure examples are in the repository: https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads Thanks, Matt > I also wanted to ask you, do you know if it is possible to compile PETSc > with the xl/16.1.1-10 suite? > > Thanks! > > I configured the library --with-cuda and when compiling I get a > compilation error with CUDAC: > > CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/ > curand2.cu:1: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: > In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: > warning: Thrust requires at least Clang 7.0. Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. > [-W#pragma-messages] > THRUST_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: > note: expanded from macro 'THRUST_COMPILER_DEPRECATION' > THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: > note: expanded from macro 'THRUST_COMP_DEPR_IMPL' > # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning > #msg) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: > note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' > # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :141:6: note: expanded from here > GCC warning "Thrust requires at least Clang 7.0. Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
> ^ > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/ > curand2.cu:2: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: > In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: > In file included from > /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: > In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: > CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT > to suppress this message. [-W#pragma-messages] > CUB_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: > expanded from macro 'CUB_COMPILER_DEPRECATION' > CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define > CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: > expanded from macro 'CUB_COMP_DEPR_IMPL' > # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: > expanded from macro 'CUB_COMP_DEPR_IMPL0' > # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :198:6: note: expanded from here > GCC warning "CUB requires at least Clang 7.0. Define > CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): > warning #1835-D: attribute "warn_unused_result" does not apply here > > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/ > curand2.cu:1: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: > In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: > warning: Thrust requires at least Clang 7.0. Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. > [-W#pragma-messages] > THRUST_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: > note: expanded from macro 'THRUST_COMPILER_DEPRECATION' > THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. 
Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: > note: expanded from macro 'THRUST_COMP_DEPR_IMPL' > # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning > #msg) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: > note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' > # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :149:6: note: expanded from here > GCC warning "Thrust requires at least Clang 7.0. Define > THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." > ^ > In file included from > /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/ > curand2.cu:2: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: > In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: > In file included from > /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: > In file included from > /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: > In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: > CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT > to suppress this message. [-W#pragma-messages] > CUB_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: > expanded from macro 'CUB_COMPILER_DEPRECATION' > CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define > CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: > expanded from macro 'CUB_COMP_DEPR_IMPL' > # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: > expanded from macro 'CUB_COMP_DEPR_IMPL0' > # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :208:6: note: expanded from here > GCC warning "CUB requires at least Clang 7.0. Define > CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
> ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): > warning #1835-D: attribute "warn_unused_result" does not apply here > > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(len); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(s); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(flg); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(n); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(s); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(n); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(b); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(b); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(tmp); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(haystack); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(needle); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(tmp); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: > error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > fatal error: too many errors emitted, stopping now [-ferror-limit=] > 20 errors generated. > Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp. 
> gmake[3]: *** [gmakefile:209: > arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 > gmake[2]: *** > [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: > libs] Error 2 > **************************ERROR************************************* > Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log > Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 21, 2023 4:17 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Guan, Collin X. (Fed) < > collin.guan at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > That is a good question. Looking at > https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if > you can share the output of your job so we can search CUDA_VISIBLE_DEVICES > and see how GPUs were allocated. > > --Junchao Zhang > > > On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI > processes meshes but only working on 2 of them? > It says in the script it has allocated 2.4GB > Best, > Marcos > ------------------------------ > *From:* Junchao Zhang > *Sent:* Monday, August 21, 2023 3:29 PM > *To:* Vanella, Marcos (Fed) > *Cc:* PETSc users list ; Guan, Collin X. (Fed) < > collin.guan at nist.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > If you look at the PIDs of the nvidia-smi output, you will only find 8 > unique PIDs, which is expected since you allocated 8 MPI ranks per node. > The duplicate PIDs are usually for threads spawned by the MPI runtime > (for example, progress threads in MPI implementation). So your job script > and output are all good. > > Thanks. > > On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) < > marcos.vanella at nist.gov> wrote: > > Hi Junchao, something I'm noting related to running with cuda enabled > linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu > calculations, the GPU 0 in the node is taking what seems to be all > sub-matrices corresponding to all the MPI processes in the node. This is > the result of the nvidia-smi command on a node with 8 MPI processes (each > advancing the same number of unknowns in the calculation) and 4 GPU V100s: > > Mon Aug 21 14:36:07 2023 > > +---------------------------------------------------------------------------------------+ > | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA > Version: 12.2 | > > |-----------------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M | Bus-Id Disp.A | > Volatile Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | > GPU-Util Compute M. | > | | | > MIG M. 
| > > |=========================================+======================+======================| > | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | > 0 | > | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | > 0 | > | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | > 0 | > | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | > 0 | > | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% > Default | > | | | > N/A | > > +-----------------------------------------+----------------------+----------------------+ > > > > +---------------------------------------------------------------------------------------+ > | Processes: > | > | GPU GI CI PID Type Process name > GPU Memory | > | ID ID > Usage | > > |=======================================================================================| > | 0 N/A N/A 214626 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214630 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 1 N/A N/A 214627 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 1 N/A N/A 214631 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214628 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214632 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214629 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214633 C > ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > > +---------------------------------------------------------------------------------------+ > > > You can see that GPU 0 is connected to all 8 MPI Processes, each taking > about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. > I'm wondering if this is expected or there are some changes I need to do on > my submission script/runtime parameters. 
> This is the script in this case (2 nodes, 8 MPI processes/node, 4 > GPU/node): > > #!/bin/bash > # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds > #SBATCH -J test > #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err > #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log > #SBATCH --partition=gpu > #SBATCH --ntasks=16 > #SBATCH --ntasks-per-node=8 > #SBATCH --cpus-per-task=1 > #SBATCH --nodes=2 > #SBATCH --time=01:00:00 > #SBATCH --gres=gpu:4 > > export OMP_NUM_THREADS=1 > # modules > module load cuda/11.7 > module load gcc/11.2.1/toolset > module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 > > cd /home/mnv/Firemodels_fork/fds/Issues/PETSc > > srun -N 2 -n 16 > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux > test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda > > Thank you for the advice, > Marcos > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 22 16:25:19 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 22 Aug 2023 16:25:19 -0500 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Macros, yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Then, check again with nvidia-smi to see if GPU memory is evenly allocated. --Junchao Zhang On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley wrote: > On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi Junchao, both the slurm scontrol show job_id -dd and looking at >> CUDA_VISIBLE_DEVICES does not provide information about which MPI >> process is associated to which GPU in the node in our system. I can see >> this with nvidia-smi, but if you have any other suggestion using slurm I >> would like to hear it. >> >> I've been trying to compile the code+Petsc in summit, but have been >> having all sorts of issues related to spectrum-mpi, and the different >> compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't >> handle Fortran 2018, others give issues of repeated MPI definitions, etc.). >> > > The PETSc configure examples are in the repository: > > > https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads > > Thanks, > > Matt > > >> I also wanted to ask you, do you know if it is possible to compile PETSc >> with the xl/16.1.1-10 suite? >> >> Thanks! 
>> This is the script in this case (2 nodes, 8 MPI processes/node, 4 >> GPU/node): >> >> #!/bin/bash >> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds >> #SBATCH -J test >> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err >> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log >> #SBATCH --partition=gpu >> #SBATCH --ntasks=16 >> #SBATCH --ntasks-per-node=8 >> #SBATCH --cpus-per-task=1 >> #SBATCH --nodes=2 >> #SBATCH --time=01:00:00 >> #SBATCH --gres=gpu:4 >> >> export OMP_NUM_THREADS=1 >> # modules >> module load cuda/11.7 >> module load gcc/11.2.1/toolset >> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 >> >> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc >> >> srun -N 2 -n 16 >> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux >> test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda >> >> Thank you for the advice, >> Marcos >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 22 19:36:05 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 22 Aug 2023 20:36:05 -0400 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> Message-ID: <47B4D659-9A52-46F5-B441-B72033A2F5EC@petsc.dev> I have added support in https://gitlab.com/petsc/petsc/-/merge_requests/6834 branch barry/2023-08-22/pcredistribute-initial-guess please let me know if it does not work cleanly for you. Thanks Barry > On Aug 21, 2023, at 4:18 PM, Jonas Lundgren wrote: > > Thanks, Barry! > > What solution do you have in mind? I tried a bit myself using SCATTER_FORWARD of the input vector x in PCApply_Redistribute, together with allowing for nonzero initial guess in KSPPREONLY, but that might not be the best solution in a public branch? > > I guess the big gain is due to the fact that the subsequent solvings of state/adjoint problems are done with similar (but not exactly the same) linear operator, so that they become almost the same problem. On the other hand, the state and adjoint problems are not similar to each other, making the solution to one a very bad initial guess to the other. > > Again, thank you for your support. > > Best regards, > Jonas Lundgren > > Fr?n: Barry Smith > > Skickat: den 21 augusti 2023 22:04 > Till: Jonas Lundgren > > Kopia: petsc-users at mcs.anl.gov > ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE > > > Ok, thanks. Definitely more than I expected. > > It is easy to add the support you requested. I'll push a branch latter today. > > Barry > > > > On Aug 21, 2023, at 3:28 PM, Jonas Lundgren > wrote: > > Dear Barry, > > I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. 
The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. > > From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. > > Best regards, > Jonas Lundgren > > Fr?n: Barry Smith > > Skickat: den 21 augusti 2023 20:13 > Till: Jonas Lundgren > > Kopia: petsc-users at mcs.anl.gov > ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE > > > When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view > > That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. > > Thanks > > Barry > > Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. > > > > > > > > > > On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: > > Dear PETSc users, > > I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). > > First, some details: > - I use a version of PETSc 3.19.1 > - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ > - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF > - I am first solving a state problem, then an adjoint problem using the same linear operator. > - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps > - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem > - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) > > The main problem is that PCREDISTRIBUTE internally doesn?t use the input solution vector (x) when calling KSPSolve() for the sub-KSP. 
It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. > > Additionally, KSPPREONLY prohibits the use of a nonzero initial guess (the error message says "it doesn't make sense"). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won't solve the previously mentioned problem, which seems to be the hard nut to crack. > > So far, I have found out that if I create 2 KSP objects, one each for the state and adjoint problems, it is enough to call KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and is therefore used as a (non-zero) initial guess for the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file. > > However, I would like to create only 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increase the time to convergence in each state/adjoint solve. > > So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). > > The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for both my problems in my main program myself (a minor problem). This is maybe not the neatest way, and I do not know whether this approach affects the performance negatively. Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible? > > Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for a non-zero initial guess in KSPPREONLY? > > Any help is welcome! > > > Best regards, > Jonas Lundgren -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Wed Aug 23 03:35:19 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 23 Aug 2023 10:35:19 +0200 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices Message-ID: Hi all, I am trying to multiply two PETSc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. I have taken the decision to create A as a (row-)partitioned matrix and B as a non-partitioned matrix that is entirely shared by all procs (to avoid unnecessary communication). 
Here is my code: import numpy as np from firedrake import COMM_WORLD from firedrake.petsc import PETSc from numpy.testing import assert_array_almost_equal nproc = COMM_WORLD.size rank = COMM_WORLD.rank def create_petsc_matrix_non_partitioned(input_array): """Building a mpi non-partitioned petsc matrix from an array Args: input_array (np array): Input array sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: mpi mat: PETSc matrix """ assert len(input_array.shape) == 2 m, n = input_array.shape matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) # Set the values of the matrix matrix.setValues(range(m), range(n), input_array[:, :], addv=False) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def create_petsc_matrix(input_array, partition_like=None): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (petsc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: petsc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape comm = COMM_WORLD if partition_like is not None: local_rows_start, local_rows_end = partition_like.getOwnershipRange() local_rows = local_rows_end - local_rows_start # No parallelization in the columns, set local_cols = None to parallelize size = ((local_rows, global_rows), (global_cols, global_cols)) else: size = ((None, global_rows), (global_cols, global_cols)) matrix = PETSc.Mat().createAIJ(size=size, comm=comm) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix m, k = 10, 3 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, k)) B_np = np.random.randint(low=0, high=6, size=(k, k)) A = create_petsc_matrix(A_np) B = create_petsc_matrix_non_partitioned(B_np) # Now perform the multiplication C = A * B The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run [0]PETSC ERROR: to get more information on the crash. [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 : system msg for write_line failure : Bad file descriptor Is there a standard way to achieve this? 
Thanks, Thanos -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at lip6.fr Wed Aug 23 03:47:48 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Wed, 23 Aug 2023 17:47:48 +0900 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices Message-ID: ? > On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis wrote: > ?Hi all, > > I am trying to multiply two Petsc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. > > I have taken the decision to create A as (row-)partitioned matrix and B as a non-partitioned matrix that it is entirely shared by all procs (to avoid unnecessary communication). > > Here is my code: > > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > from numpy.testing import assert_array_almost_equal > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix_non_partitioned(input_array): > """Building a mpi non-partitioned petsc matrix from an array > > Args: > input_array (np array): Input array > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > mpi mat: PETSc matrix > """ > assert len(input_array.shape) == 2 > > m, n = input_array.shape > > matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) > > # Set the values of the matrix > matrix.setValues(range(m), range(n), input_array[:, :], addv=False) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def create_petsc_matrix(input_array, partition_like=None): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (petsc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > petsc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > comm = COMM_WORLD > if partition_like is not None: > local_rows_start, local_rows_end = partition_like.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > > # No parallelization in the columns, set local_cols = None to parallelize > size = ((local_rows, global_rows), (global_cols, global_cols)) > else: > size = ((None, global_rows), (global_cols, global_cols)) > > matrix = PETSc.Mat().createAIJ(size=size, comm=comm) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > m, k = 10, 3 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > B_np = np.random.randint(low=0, high=6, size=(k, k)) > > > A = create_petsc_matrix(A_np) > > B = create_petsc_matrix_non_partitioned(B_np) > > # Now perform the multiplication > C = A * B > > The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: > > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 > : > system msg for write_line failure : Bad file descriptor > > > Is there a standard way to achieve this? Your B is duplicated by all processes? If so, then, call https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a sequential product with B on COMM_SELF, not COMM_WORLD, and use https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with the output. Thanks, Pierre > Thanks, > Thanos -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Wed Aug 23 04:35:32 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 23 Aug 2023 11:35:32 +0200 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices In-Reply-To: References: Message-ID: Thanks for the suggestion Pierre. Yes B is duplicated by all processes. In this case, should B be created as a sequential sparse matrix using COMM_SELF? 
I guess if not, the multiplication of B with the output of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ would not go through, right? Thanks, Thanos > On 23 Aug 2023, at 10:47, Pierre Jolivet wrote: > > ? > >> On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis wrote: >> >> ?Hi all, >> >> I am trying to multiply two Petsc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. >> >> I have taken the decision to create A as (row-)partitioned matrix and B as a non-partitioned matrix that it is entirely shared by all procs (to avoid unnecessary communication). >> >> Here is my code: >> >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> from numpy.testing import assert_array_almost_equal >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> def create_petsc_matrix_non_partitioned(input_array): >> """Building a mpi non-partitioned petsc matrix from an array >> >> Args: >> input_array (np array): Input array >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> mpi mat: PETSc matrix >> """ >> assert len(input_array.shape) == 2 >> >> m, n = input_array.shape >> >> matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) >> >> # Set the values of the matrix >> matrix.setValues(range(m), range(n), input_array[:, :], addv=False) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> def create_petsc_matrix(input_array, partition_like=None): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (petsc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>> >> Returns: >> petsc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> comm = COMM_WORLD >> if partition_like is not None: >> local_rows_start, local_rows_end = partition_like.getOwnershipRange() >> local_rows = local_rows_end - local_rows_start >> >> # No parallelization in the columns, set local_cols = None to parallelize >> size = ((local_rows, global_rows), (global_cols, global_cols)) >> else: >> size = ((None, global_rows), (global_cols, global_cols)) >> >> matrix = PETSc.Mat().createAIJ(size=size, comm=comm) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> m, k = 10, 3 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, k)) >> B_np = np.random.randint(low=0, high=6, size=(k, k)) >> >> >> A = create_petsc_matrix(A_np) >> >> B = create_petsc_matrix_non_partitioned(B_np) >> >> # Now perform the multiplication >> C = A * B >> >> The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: >> >> [0]PETSC ERROR: ------------------------------------------------------------------------ >> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >> [0]PETSC ERROR: to get more information on the crash. >> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 >> : >> system msg for write_line failure : Bad file descriptor >> >> >> Is there a standard way to achieve this? > > Your B is duplicated by all processes? > If so, then, call https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a sequential product with B on COMM_SELF, not COMM_WORLD, and use https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with the output. > > Thanks, > Pierre > >> Thanks, >> Thanos >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Aug 23 05:56:41 2023 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 23 Aug 2023 06:56:41 -0400 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices In-Reply-To: References: Message-ID: On Wed, Aug 23, 2023 at 5:36?AM Thanasis Boutsikakis < thanasis.boutsikakis at corintis.com> wrote: > Thanks for the suggestion Pierre. > > Yes B is duplicated by all processes. 
> > In this case, should B be created as a sequential sparse matrix using > COMM_SELF? > Yes, that is what Pierre said, Mark > I guess if not, the multiplication of B with the output of > https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ would not go > through, right? > > Thanks, > Thanos > > > On 23 Aug 2023, at 10:47, Pierre Jolivet wrote: > > ? > > On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis < > thanasis.boutsikakis at corintis.com> wrote: > > ?Hi all, > > I am trying to multiply two Petsc matrices as C = A * B, where A is a tall > matrix and B is a relatively small matrix. > > I have taken the decision to create A as (row-)partitioned matrix and B as > a non-partitioned matrix that it is entirely shared by all procs (to avoid > unnecessary communication). > > Here is my code: > > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > from numpy.testing import assert_array_almost_equal > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > def create_petsc_matrix_non_partitioned(input_array): > """Building a mpi non-partitioned petsc matrix from an array > > Args: > input_array (np array): Input array > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > mpi mat: PETSc matrix > """ > assert len(input_array.shape) == 2 > > m, n = input_array.shape > > matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) > > # Set the values of the matrix > matrix.setValues(range(m), range(n), input_array[:, :], addv=False) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def create_petsc_matrix(input_array, partition_like=None): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (petsc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > petsc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > comm = COMM_WORLD > if partition_like is not None: > local_rows_start, local_rows_end = partition_like.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > > # No parallelization in the columns, set local_cols = None to parallelize > size = ((local_rows, global_rows), (global_cols, global_cols)) > else: > size = ((None, global_rows), (global_cols, global_cols)) > > matrix = PETSc.Mat().createAIJ(size=size, comm=comm) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > m, k = 10, 3 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > B_np = np.random.randint(low=0, high=6, size=(k, k)) > > > A = create_petsc_matrix(A_np) > > B = create_petsc_matrix_non_partitioned(B_np) > > # Now perform the multiplication > C = A * B > > The problem with this is that there is a mismatch between the local rows > of A (depend on the partitioning) and the global rows of B (3 for all > procs), so that the multiplication cannot happen in parallel. Here is the > error: > > [0]PETSC ERROR: > ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and > https://petsc.org/release/faq/ > [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and > run > [0]PETSC ERROR: to get more information on the crash. > [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is > causing the crash. > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 > : > system msg for write_line failure : Bad file descriptor > > > Is there a standard way to achieve this? > > > Your B is duplicated by all processes? > If so, then, call > https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a > sequential product with B on COMM_SELF, not COMM_WORLD, and use > https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with > the output. > > Thanks, > Pierre > > Thanks, > Thanos > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Wed Aug 23 06:59:07 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Wed, 23 Aug 2023 13:59:07 +0200 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices In-Reply-To: References: Message-ID: Thanks for the clarification Mark. 
I have tried such an implementation but I since I could not find the equivalent of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ for petsc4py, I used A.getLocalSubMatrix to do so, which returns a ?localref? object that I cannot then use to get my local 'seqaij? matrix. I also tried A.getDenseLocalMatrix() but it seems not to be suitable for my case. I couldn?t find an example in the petsc4py source code or something relevant, do you have any ideas? """Experimenting with petsc mat-mat multiplication""" # import pdb import numpy as np from firedrake import COMM_WORLD from firedrake.petsc import PETSc from numpy.testing import assert_array_almost_equal import pdb nproc = COMM_WORLD.size rank = COMM_WORLD.rank def Print(x: str): """Prints the string only on the root process Args: x (str): String to be printed """ PETSc.Sys.Print(x) def create_petsc_matrix_seq(input_array): """Building a sequential petsc matrix from an array Args: input_array (np array): Input array Returns: seq mat: PETSc matrix """ assert len(input_array.shape) == 2 m, n = input_array.shape matrix = PETSc.Mat().createAIJ(size=(m, n), comm=PETSc.COMM_SELF) matrix.setUp() matrix.setValues(range(m), range(n), input_array, addv=False) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def create_petsc_matrix(input_array, partition_like=None): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (petsc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: petsc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape comm = COMM_WORLD if partition_like is not None: local_rows_start, local_rows_end = partition_like.getOwnershipRange() local_rows = local_rows_end - local_rows_start # No parallelization in the columns, set local_cols = None to parallelize size = ((local_rows, global_rows), (global_cols, global_cols)) else: size = ((None, global_rows), (global_cols, global_cols)) matrix = PETSc.Mat().createAIJ(size=size, comm=comm) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix m, k = 10, 3 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, k)) B_np = np.random.randint(low=0, high=6, size=(k, k)) # Create B as a sequential matrix on each process B_seq = create_petsc_matrix_seq(B_np) Print(B_seq.getType()) Print(B_seq.getSizes()) A = create_petsc_matrix(A_np) print("A type:", A.getType()) print("A sizes:", A.getSizes()) print("A local ownership range:", A.getOwnershipRange()) # pdb.set_trace() # Create a local sequential matrix for A using the local submatrix local_rows_start, local_rows_end = A.getOwnershipRange() local_rows = local_rows_end - local_rows_start print("local_rows_start:", local_rows_start) print("local_rows_end:", local_rows_end) print("local_rows:", local_rows) local_A = PETSc.Mat().createAIJ(size=(local_rows, k), 
comm=PETSc.COMM_SELF) # pdb.set_trace() comm = A.getComm() rows = PETSc.IS().createStride(local_rows, first=0, step=1, comm=comm) cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) print("rows indices:", rows.getIndices()) print("cols indices:", cols.getIndices()) # pdb.set_trace() # Create the local to global mapping for rows and columns l2g_rows = PETSc.LGMap().create(rows.getIndices(), comm=comm) l2g_cols = PETSc.LGMap().create(cols.getIndices(), comm=comm) print("l2g_rows type:", type(l2g_rows)) print("l2g_rows:", l2g_rows.view()) print("l2g_rows type:", type(l2g_cols)) print("l2g_cols:", l2g_cols.view()) # pdb.set_trace() # Set the local-to-global mapping for the matrix A.setLGMap(l2g_rows, l2g_cols) # pdb.set_trace() # Now you can get the local submatrix local_A = A.getLocalSubMatrix(rows, cols) # Assembly the matrix to compute the final structure local_A.assemblyBegin() local_A.assemblyEnd() Print(local_A.getType()) Print(local_A.getSizes()) # pdb.set_trace() # Multiply the two matrices local_C = local_A.matMult(B_seq) > On 23 Aug 2023, at 12:56, Mark Adams wrote: > > > > On Wed, Aug 23, 2023 at 5:36?AM Thanasis Boutsikakis > wrote: >> Thanks for the suggestion Pierre. >> >> Yes B is duplicated by all processes. >> >> In this case, should B be created as a sequential sparse matrix using COMM_SELF? > > Yes, that is what Pierre said, > > Mark > >> I guess if not, the multiplication of B with the output of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ would not go through, right? >> >> Thanks, >> Thanos >> >>> On 23 Aug 2023, at 10:47, Pierre Jolivet > wrote: >>> >>> ? >>> >>>> On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis > wrote: >>>> >>>> ?Hi all, >>>> >>>> I am trying to multiply two Petsc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. >>>> >>>> I have taken the decision to create A as (row-)partitioned matrix and B as a non-partitioned matrix that it is entirely shared by all procs (to avoid unnecessary communication). >>>> >>>> Here is my code: >>>> >>>> import numpy as np >>>> from firedrake import COMM_WORLD >>>> from firedrake.petsc import PETSc >>>> from numpy.testing import assert_array_almost_equal >>>> >>>> nproc = COMM_WORLD.size >>>> rank = COMM_WORLD.rank >>>> >>>> def create_petsc_matrix_non_partitioned(input_array): >>>> """Building a mpi non-partitioned petsc matrix from an array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>>> >>>> Returns: >>>> mpi mat: PETSc matrix >>>> """ >>>> assert len(input_array.shape) == 2 >>>> >>>> m, n = input_array.shape >>>> >>>> matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) >>>> >>>> # Set the values of the matrix >>>> matrix.setValues(range(m), range(n), input_array[:, :], addv=False) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>> >>>> def create_petsc_matrix(input_array, partition_like=None): >>>> """Create a PETSc matrix from an input_array >>>> >>>> Args: >>>> input_array (np array): Input array >>>> partition_like (petsc mat, optional): Petsc matrix. Defaults to None. >>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>> >>>> Returns: >>>> petsc mat: PETSc matrix >>>> """ >>>> # Check if input_array is 1D and reshape if necessary >>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>> global_rows, global_cols = input_array.shape >>>> >>>> comm = COMM_WORLD >>>> if partition_like is not None: >>>> local_rows_start, local_rows_end = partition_like.getOwnershipRange() >>>> local_rows = local_rows_end - local_rows_start >>>> >>>> # No parallelization in the columns, set local_cols = None to parallelize >>>> size = ((local_rows, global_rows), (global_cols, global_cols)) >>>> else: >>>> size = ((None, global_rows), (global_cols, global_cols)) >>>> >>>> matrix = PETSc.Mat().createAIJ(size=size, comm=comm) >>>> matrix.setUp() >>>> >>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>> >>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>> # Calculate the correct row in the array for the current process >>>> row_in_array = counter + local_rows_start >>>> matrix.setValues( >>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>> ) >>>> >>>> # Assembly the matrix to compute the final structure >>>> matrix.assemblyBegin() >>>> matrix.assemblyEnd() >>>> >>>> return matrix >>>> >>>> >>>> m, k = 10, 3 >>>> # Generate the random numpy matrices >>>> np.random.seed(0) # sets the seed to 0 >>>> A_np = np.random.randint(low=0, high=6, size=(m, k)) >>>> B_np = np.random.randint(low=0, high=6, size=(k, k)) >>>> >>>> >>>> A = create_petsc_matrix(A_np) >>>> >>>> B = create_petsc_matrix_non_partitioned(B_np) >>>> >>>> # Now perform the multiplication >>>> C = A * B >>>> >>>> The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: >>>> >>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>> [0]PETSC ERROR: to get more information on the crash. >>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 >>>> : >>>> system msg for write_line failure : Bad file descriptor >>>> >>>> >>>> Is there a standard way to achieve this? >>> >>> Your B is duplicated by all processes? >>> If so, then, call https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a sequential product with B on COMM_SELF, not COMM_WORLD, and use https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with the output. >>> >>> Thanks, >>> Pierre >>> >>>> Thanks, >>>> Thanos >>>> >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre.jolivet at lip6.fr Wed Aug 23 07:40:36 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Wed, 23 Aug 2023 21:40:36 +0900 Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices In-Reply-To: References: Message-ID: > On 23 Aug 2023, at 8:59 PM, Thanasis Boutsikakis wrote: > > Thanks for the clarification Mark. > > I have tried such an implementation but I since I could not find the equivalent of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ for petsc4py, I used A.getLocalSubMatrix to do so, which returns a ?localref? object that I cannot then use to get my local 'seqaij? matrix. You really need MatMPIAIJGetLocalMat(). It seems there is a missing petsc4py binding, either add it yourself (and please create a merge request), or post an issue on GitLab and hope that someone adds it. Another way to bypass the issue would be to call MatCreateSubMatrices() with an is_row set to the local rows, and an is_col set to all columns, but this will be less scalable than MatMPIAIJGetLocalMat(). Thanks, Pierre > I also tried A.getDenseLocalMatrix() but it seems not to be suitable for my case. I couldn?t find an example in the petsc4py source code or something relevant, do you have any ideas? > > """Experimenting with petsc mat-mat multiplication""" > # import pdb > > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > from numpy.testing import assert_array_almost_equal > import pdb > > nproc = COMM_WORLD.size > rank = COMM_WORLD.rank > > > def Print(x: str): > """Prints the string only on the root process > > Args: > x (str): String to be printed > """ > PETSc.Sys.Print(x) > > > def create_petsc_matrix_seq(input_array): > """Building a sequential petsc matrix from an array > > Args: > input_array (np array): Input array > > Returns: > seq mat: PETSc matrix > """ > assert len(input_array.shape) == 2 > > m, n = input_array.shape > matrix = PETSc.Mat().createAIJ(size=(m, n), comm=PETSc.COMM_SELF) > matrix.setUp() > > matrix.setValues(range(m), range(n), input_array, addv=False) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def create_petsc_matrix(input_array, partition_like=None): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (petsc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
> > Returns: > petsc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > comm = COMM_WORLD > if partition_like is not None: > local_rows_start, local_rows_end = partition_like.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > > # No parallelization in the columns, set local_cols = None to parallelize > size = ((local_rows, global_rows), (global_cols, global_cols)) > else: > size = ((None, global_rows), (global_cols, global_cols)) > > matrix = PETSc.Mat().createAIJ(size=size, comm=comm) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > m, k = 10, 3 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > B_np = np.random.randint(low=0, high=6, size=(k, k)) > > # Create B as a sequential matrix on each process > B_seq = create_petsc_matrix_seq(B_np) > > Print(B_seq.getType()) > Print(B_seq.getSizes()) > > A = create_petsc_matrix(A_np) > > print("A type:", A.getType()) > print("A sizes:", A.getSizes()) > print("A local ownership range:", A.getOwnershipRange()) > > # pdb.set_trace() > > # Create a local sequential matrix for A using the local submatrix > local_rows_start, local_rows_end = A.getOwnershipRange() > local_rows = local_rows_end - local_rows_start > > print("local_rows_start:", local_rows_start) > print("local_rows_end:", local_rows_end) > print("local_rows:", local_rows) > > local_A = PETSc.Mat().createAIJ(size=(local_rows, k), comm=PETSc.COMM_SELF) > > # pdb.set_trace() > > comm = A.getComm() > rows = PETSc.IS().createStride(local_rows, first=0, step=1, comm=comm) > cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) > > print("rows indices:", rows.getIndices()) > print("cols indices:", cols.getIndices()) > > # pdb.set_trace() > > # Create the local to global mapping for rows and columns > l2g_rows = PETSc.LGMap().create(rows.getIndices(), comm=comm) > l2g_cols = PETSc.LGMap().create(cols.getIndices(), comm=comm) > > print("l2g_rows type:", type(l2g_rows)) > print("l2g_rows:", l2g_rows.view()) > print("l2g_rows type:", type(l2g_cols)) > print("l2g_cols:", l2g_cols.view()) > > # pdb.set_trace() > > # Set the local-to-global mapping for the matrix > A.setLGMap(l2g_rows, l2g_cols) > > # pdb.set_trace() > > # Now you can get the local submatrix > local_A = A.getLocalSubMatrix(rows, cols) > > # Assembly the matrix to compute the final structure > local_A.assemblyBegin() > local_A.assemblyEnd() > > Print(local_A.getType()) > Print(local_A.getSizes()) > > # pdb.set_trace() > > # Multiply the two matrices > local_C = local_A.matMult(B_seq) > > >> On 23 Aug 2023, at 12:56, Mark Adams wrote: >> >> >> >> On Wed, Aug 23, 2023 at 5:36?AM Thanasis Boutsikakis > wrote: >>> Thanks for the suggestion Pierre. >>> >>> Yes B is duplicated by all processes. >>> >>> In this case, should B be created as a sequential sparse matrix using COMM_SELF? 
>> >> Yes, that is what Pierre said, >> >> Mark >> >>> I guess if not, the multiplication of B with the output of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ would not go through, right? >>> >>> Thanks, >>> Thanos >>> >>>> On 23 Aug 2023, at 10:47, Pierre Jolivet > wrote: >>>> >>>> ? >>>> >>>>> On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis > wrote: >>>>> >>>>> ?Hi all, >>>>> >>>>> I am trying to multiply two Petsc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. >>>>> >>>>> I have taken the decision to create A as (row-)partitioned matrix and B as a non-partitioned matrix that it is entirely shared by all procs (to avoid unnecessary communication). >>>>> >>>>> Here is my code: >>>>> >>>>> import numpy as np >>>>> from firedrake import COMM_WORLD >>>>> from firedrake.petsc import PETSc >>>>> from numpy.testing import assert_array_almost_equal >>>>> >>>>> nproc = COMM_WORLD.size >>>>> rank = COMM_WORLD.rank >>>>> >>>>> def create_petsc_matrix_non_partitioned(input_array): >>>>> """Building a mpi non-partitioned petsc matrix from an array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>>>> >>>>> Returns: >>>>> mpi mat: PETSc matrix >>>>> """ >>>>> assert len(input_array.shape) == 2 >>>>> >>>>> m, n = input_array.shape >>>>> >>>>> matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) >>>>> >>>>> # Set the values of the matrix >>>>> matrix.setValues(range(m), range(n), input_array[:, :], addv=False) >>>>> >>>>> # Assembly the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>> >>>>> def create_petsc_matrix(input_array, partition_like=None): >>>>> """Create a PETSc matrix from an input_array >>>>> >>>>> Args: >>>>> input_array (np array): Input array >>>>> partition_like (petsc mat, optional): Petsc matrix. Defaults to None. >>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>> >>>>> Returns: >>>>> petsc mat: PETSc matrix >>>>> """ >>>>> # Check if input_array is 1D and reshape if necessary >>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>> global_rows, global_cols = input_array.shape >>>>> >>>>> comm = COMM_WORLD >>>>> if partition_like is not None: >>>>> local_rows_start, local_rows_end = partition_like.getOwnershipRange() >>>>> local_rows = local_rows_end - local_rows_start >>>>> >>>>> # No parallelization in the columns, set local_cols = None to parallelize >>>>> size = ((local_rows, global_rows), (global_cols, global_cols)) >>>>> else: >>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>> >>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=comm) >>>>> matrix.setUp() >>>>> >>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>> >>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>> # Calculate the correct row in the array for the current process >>>>> row_in_array = counter + local_rows_start >>>>> matrix.setValues( >>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>> ) >>>>> >>>>> # Assembly the matrix to compute the final structure >>>>> matrix.assemblyBegin() >>>>> matrix.assemblyEnd() >>>>> >>>>> return matrix >>>>> >>>>> >>>>> m, k = 10, 3 >>>>> # Generate the random numpy matrices >>>>> np.random.seed(0) # sets the seed to 0 >>>>> A_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>> B_np = np.random.randint(low=0, high=6, size=(k, k)) >>>>> >>>>> >>>>> A = create_petsc_matrix(A_np) >>>>> >>>>> B = create_petsc_matrix_non_partitioned(B_np) >>>>> >>>>> # Now perform the multiplication >>>>> C = A * B >>>>> >>>>> The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: >>>>> >>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>> [0]PETSC ERROR: to get more information on the crash. >>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 >>>>> : >>>>> system msg for write_line failure : Bad file descriptor >>>>> >>>>> >>>>> Is there a standard way to achieve this? >>>> >>>> Your B is duplicated by all processes? >>>> If so, then, call https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a sequential product with B on COMM_SELF, not COMM_WORLD, and use https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with the output. >>>> >>>> Thanks, >>>> Pierre >>>> >>>>> Thanks, >>>>> Thanos >>>>> >>>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... 
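(A rough petsc4py sketch of the less scalable MatCreateSubMatrices() route described above, since MatMPIAIJGetLocalMat() currently has no petsc4py binding. It assumes A is an MPIAIJ matrix and B_seq is the replicated SEQAIJ matrix built on COMM_SELF; the helper name and the row-by-row reassembly at the end, standing in for the missing MatCreateMPIMatConcatenateSeqMat() binding, are illustrative choices rather than code from the thread.)

from petsc4py import PETSc

def matmult_with_replicated_B(A, B_seq):
    """C = A * B, with A distributed (MPIAIJ) and B_seq identical on every rank (SEQAIJ)."""
    comm = A.getComm()
    rstart, rend = A.getOwnershipRange()
    nlocal = rend - rstart
    _, k = A.getSize()       # k = global number of columns of A
    n = B_seq.getSize()[1]   # n = number of columns of B

    # Pull the locally owned rows of A (all columns) into a sequential matrix
    isrow = PETSc.IS().createStride(nlocal, first=rstart, step=1, comm=PETSc.COMM_SELF)
    iscol = PETSc.IS().createStride(k, first=0, step=1, comm=PETSc.COMM_SELF)
    A_local = A.createSubMatrices([isrow], [iscol])[0]

    # Purely local product with the replicated B, no communication involved
    C_local = A_local.matMult(B_seq)

    # Reassemble the distributed result, keeping A's row partitioning
    C = PETSc.Mat().createAIJ(size=((nlocal, None), (None, n)), comm=comm)
    C.setUp()
    for i in range(rstart, rend):
        cols, vals = C_local.getRow(i - rstart)
        C.setValues(i, cols, vals)
    C.assemblyBegin()
    C.assemblyEnd()
    return C

The reassembly loop is roughly what MatCreateMPIMatConcatenateSeqMat() would do in a single call on the C side, so this version trades some scalability for staying entirely within existing petsc4py bindings.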
URL: From laurent.trotignon at cea.fr Wed Aug 23 04:15:43 2023 From: laurent.trotignon at cea.fr (TROTIGNON Laurent) Date: Wed, 23 Aug 2023 09:15:43 +0000 Subject: [petsc-users] Runtime options to activate GPU offloading of PETSc solvers Message-ID: <991c498fda0846e6aaf41d0e2fcf48e3@cea.fr> Hello all, In the online docs of PETSc, I found this paragraph: "PETSc uses a single source programming model where solver back-ends are selected as runtime options and configuration options with no changes to the API. Users should (ideally) never have to change their source code to take advantage of new backend implementations." I am looking for an example of the runtime options that enable GPU offloading of PETSc solvers. I am not sure whether runtime options for GPU offloading are currently available. I am currently using petsc/3.19.2 configured with cuda and nvhpc/22.2. Best regards Laurent -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaibhav_b at ce.iitr.ac.in Wed Aug 23 06:41:51 2023 From: vaibhav_b at ce.iitr.ac.in (VAIBHAV BHANDARI) Date: Wed, 23 Aug 2023 17:11:51 +0530 Subject: [petsc-users] REQUESTING INVITATON FOR SLACK WORKSPACE OF PETSC Message-ID: Dear Sir/Madam, I hope this email finds you well. I am writing to request an invitation to join the PETSc Slack Workspace. As a passionate enthusiast of parallel and scientific computing, I have been following the development and advancements of PETSc with great interest. I have been actively involved in *Topology Optimization* and believe that being a part of the PETSc Slack community would provide me with an avenue to share insights, seek advice, and learn from the experiences of fellow members.
> If possible, I kindly request an invitation to join the PETSc Slack > Workspace. I assure you that I will adhere to all community guidelines and > contribute positively to the discussions. I am excited about the prospect > of connecting with experts and enthusiasts in the field and contributing to > the mutual growth and understanding within the PETSc community. > > Thank you for considering my request. I eagerly await the opportunity to > engage with the PETSc community on Slack. Please feel free to reach out to > me at [vaibhav_b at ce.iitr.ac.in] if you require any further information. > > Looking forward to your positive response. > > Best regards, > Vaibhav Bhandari > Ph.D. STudent > IIT Roorkee > From junchao.zhang at gmail.com Wed Aug 23 10:00:28 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 23 Aug 2023 10:00:28 -0500 Subject: [petsc-users] Runtime options to activate GPU offloading of PETSc solvers In-Reply-To: <991c498fda0846e6aaf41d0e2fcf48e3@cea.fr> References: <991c498fda0846e6aaf41d0e2fcf48e3@cea.fr> Message-ID: For example, src/ksp/ksp/tutorials/ex1.c, which can be tested with these options. "-mat_type aijcusparse -vec_type cuda" enables GPU offloading. test: suffix: 2_aijcusparse requires: cuda args: -pc_type sor -pc_sor_symmetric -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always -mat_type aijcusparse -vec_type cuda args: -ksp_view --Junchao Zhang On Wed, Aug 23, 2023 at 9:27?AM TROTIGNON Laurent wrote: > Hello all, > > In the on line docs of PETSc, I found this paragraph : > > > > ? PETSc uses a single source programming model where solver back-ends are > selected as *runtime* options and configuration options with no changes > to the API. > > Users should (ideally) never have to change their source code to take > advantage of new backend implementations. ? > > > > I am looking for an example of runtime options that enable activation of > GPU offloading of PETSc solvers. > > I am not sure that runtime options for GPU offloading are currently > available ? > > I am currently using petsc/3.19.2 configured with cuda and nvhpc/22.2. > > Best regards > > Laurent > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 23 10:07:17 2023 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 23 Aug 2023 10:07:17 -0500 Subject: [petsc-users] Runtime options to activate GPU offloading of PETSc solvers In-Reply-To: <991c498fda0846e6aaf41d0e2fcf48e3@cea.fr> References: <991c498fda0846e6aaf41d0e2fcf48e3@cea.fr> Message-ID: On Wed, Aug 23, 2023 at 9:27?AM TROTIGNON Laurent wrote: > Hello all, > > In the on line docs of PETSc, I found this paragraph : > > > > ? PETSc uses a single source programming model where solver back-ends are > selected as *runtime* options and configuration options with no changes > to the API. > > Users should (ideally) never have to change their source code to take > advantage of new backend implementations. ? > > > > I am looking for an example of runtime options that enable activation of > GPU offloading of PETSc solvers. > > I am not sure that runtime options for GPU offloading are currently > available ? > > I am currently using petsc/3.19.2 configured with cuda and nvhpc/22.2. > > Here is the FAQ on this: https://petsc.org/main/faq/#doc-faq-gpuhowto The short answer is that GPU offloading is enabled by changing the types of vectors and matrices, so that all the operations are executed on the GPU. 
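(To make the runtime switch concrete: with a CUDA-enabled build, an executable such as the ex1 tutorial above can simply be run with "-mat_type aijcusparse -vec_type cuda" and the assembled matrices and vectors, and hence the solve, move to the GPU. A minimal petsc4py sketch of the same idea follows; it is illustrative, not code from the thread, and assumes petsc4py was built against a CUDA-enabled PETSc.)

from petsc4py import PETSc

# The same two options quoted above; any code that calls
# MatSetFromOptions()/VecSetFromOptions() picks them up at runtime.
opts = PETSc.Options()
opts["mat_type"] = "aijcusparse"
opts["vec_type"] = "cuda"

A = PETSc.Mat().create(PETSc.COMM_WORLD)
A.setSizes([100, 100])
A.setFromOptions()   # becomes MATAIJCUSPARSE instead of MATAIJ
A.setUp()

b = PETSc.Vec().create(PETSc.COMM_WORLD)
b.setSizes(100)
b.setFromOptions()   # becomes VECCUDA instead of the default CPU vector

# The KSP/PC code that later uses A and b needs no changes.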
The solvers just organize those operations, so they need no changes. Thanks, Matt > Best regards > > Laurent > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonas.lundgren at liu.se Wed Aug 23 10:11:38 2023 From: jonas.lundgren at liu.se (Jonas Lundgren) Date: Wed, 23 Aug 2023 15:11:38 +0000 Subject: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE In-Reply-To: <47B4D659-9A52-46F5-B441-B72033A2F5EC@petsc.dev> References: <31167305-9A93-41E0-B349-8A9D07F72D3A@petsc.dev> <6C800943-173D-46A9-890A-A43F0AF1D317@petsc.dev> <47B4D659-9A52-46F5-B441-B72033A2F5EC@petsc.dev> Message-ID: Hi Barry, It works like a charm! I spotted a typo in preonly.c, row 102: -redistribute_ksp_set_initial_guess_nonzero "_set" should be removed, right? Thank you for all your support, Jonas Lundgren Fr?n: Barry Smith Skickat: den 23 augusti 2023 02:36 Till: Jonas Lundgren Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE I have added support in https://gitlab.com/petsc/petsc/-/merge_requests/6834 branch barry/2023-08-22/pcredistribute-initial-guess please let me know if it does not work cleanly for you. Thanks Barry On Aug 21, 2023, at 4:18 PM, Jonas Lundgren > wrote: Thanks, Barry! What solution do you have in mind? I tried a bit myself using SCATTER_FORWARD of the input vector x in PCApply_Redistribute, together with allowing for nonzero initial guess in KSPPREONLY, but that might not be the best solution in a public branch? I guess the big gain is due to the fact that the subsequent solvings of state/adjoint problems are done with similar (but not exactly the same) linear operator, so that they become almost the same problem. On the other hand, the state and adjoint problems are not similar to each other, making the solution to one a very bad initial guess to the other. Again, thank you for your support. Best regards, Jonas Lundgren Fr?n: Barry Smith > Skickat: den 21 augusti 2023 22:04 Till: Jonas Lundgren > Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE Ok, thanks. Definitely more than I expected. It is easy to add the support you requested. I'll push a branch latter today. Barry On Aug 21, 2023, at 3:28 PM, Jonas Lundgren > wrote: Dear Barry, I have tried what you suggested on a (not too large) test example on a bigger cluster that I have access to, using -redistribute_ksp_initial_guess_nonzero 0 and 1, respectively. The average timings during the 5 first (design) iterations are 8.7 s (state) and 7.9 s (adjoint) for the case with zero initial guesses and 5.0 s (state) and 5.7 s (adjoint) for the cases with nonzero initial guesses. These solvings are the bottleneck of my program, accounting for about 60-90% of the total computational time, depending on various parameters. The program is basically consisting of the loop: solve state > solve adjoint > update design > repeat. This is repeated for a couple of hundred iterations. >From my experience, the number of iterations to convergence in each state/adjoint solve will decrease when increasing the (design) iterative counter (i.e. the longer the process has gone on for) IF the initial guess is the solution to the previous solve. 
This is because the design update is smaller in the end of the process than in the beginning, and a smaller design update leads to smaller changes in state/adjoint solution between subsequent (design) iterations. This means that the numbers provided above are on the low side: most likely the savings can be even more in the end of the design process. Best regards, Jonas Lundgren Fr?n: Barry Smith > Skickat: den 21 augusti 2023 20:13 Till: Jonas Lundgren > Kopia: petsc-users at mcs.anl.gov ?mne: Re: [petsc-users] (Sub) KSP initial guess with PCREDISTRIBUTE When you use 2 KSP so that you can use the previous "solution" as the initial guess for the next problem, how much savings do you get? In iterations inside PCREDISTRIBUTE and in time (relative to the entire linear solver time and relative to the entire application run)? You can get this information running with -log_view That is, run the 2 KSP simulation twice, once with the inner KSPSetNonzeroInitialGuess() on and once with it off and compare the times for the two cases. Thanks Barry Using KSPSetNonzeroInitialGuess() requires an extra matrix-vector product and preconditioner application, so I would like to verify that you have a measurable performance improvement with the initial guess. On Aug 21, 2023, at 7:06 AM, Jonas Lundgren via petsc-users > wrote: Dear PETSc users, I have a problem regarding the setting of initial guess to KSP when using PCREDISTRIBUTE as the preconditioner. (The reason to why I use PCREDISTRIBUTE is because I have a lot of fixed DOF in my problem, and PCREDISTRIBUTE successfully reduces the problem size and therefore speeds up the solving). First, some details: - I use a version of PETSc 3.19.1 - The KSP I use is KSPPREONLY, as suggested in the manual pages of PCREDISTRIBUTE:https://petsc.org/release/manualpages/PC/PCREDISTRIBUTE/ - I use KSPBCGSL as sub-KSP. I can perfectly well solve my problem using this as my main KSP, but the performance is much worse than when using it as my sub-KSP (under KSPPREONLY+PCREDISTRIBUTE) due to the amount of fixed DOF - I am first solving a state problem, then an adjoint problem using the same linear operator. - The adjoint vector is used as sensitivity information to update a design. After the design update, the state+adjoint problems are solved again with a slightly updated linear operator. This is done for hundreds of (design) iteration steps - I want the initial guess for the state problem to be the state solution from the previous (design) iteration, and same for the adjoint problem - I am aware of the default way of setting a custom initial guess: KSPSetInitialGuessNonzero(ksp, PETSC_TRUE) together with providing the actual guess in the x vector in the call to KSPSolve(ksp, b, x) The main problem is that PCREDISTRIBUTE internally doesn't use the input solution vector (x) when calling KSPSolve() for the sub-KSP. It zeroes out the solution vector (x) when starting to build x = diag(A)^{-1} b in the beginning of PCApply_Redistribute(), and uses red->x as the solution vector/initial guess when calling KSPSolve(). Therefore, I cannot reach the sub-KSP with an initial guess. Additionally, KSPPREONLY prohibits the use of having a nonzero initial guess (the error message says "it doesn't make sense"). I guess I can remove the line raising this error and recompile the PETSc libraries, but it still won't solve the previously mentioned problem, which seems to be the hard nut to crack. 
So far, I have found out that if I create 2 KSP objects, one each for the state and adjoint problems, it is enough to call KSPSetInitialGuessNonzero(subksp, PETSC_TRUE) on the subksp. It seems as if the variable red->x in PCApply_Redistribute() is kept untouched in memory between calls to the main KSP and is therefore used as a (non-zero) initial guess for the sub-KSP. This has been verified by introducing PetscCall(PetscObjectCompose((PetscObject)pc,"redx",(PetscObject)red->x)); in PCApply_Redistribute(), recompiling the PETSc library, and then inserting a corresponding PetscObjectQuery((PetscObject)pc, "redx", (PetscObject *)&redx); in my own program source file.

However, I would like to create only 1 KSP to be used with both the state and adjoint problems (same linear operator), for memory reasons. When I do this, the initial guesses are mixed up between the two problems: the initial guess for the adjoint problem is the solution to the state problem in the current design iteration, and the initial guess for the state problem is the solution to the adjoint problem in the previous design iteration. These are very bad guesses that increase the time to convergence in each state/adjoint solve.

So, the core of the problem (as far as I can understand) is that I want to control the initial guess red->x in PCApply_Redistribute(). The only solution I can think of is to include a call to PetscObjectQuery() in PCApply_Redistribute() to obtain a vector with the initial guess from my main program. And then I need to keep track of the initial guesses for both of my problems in my main program myself (a minor problem). This is maybe not the neatest way, and I do not know whether this approach affects performance negatively. Maybe one call each to PetscObjectQuery() and PetscObjectCompose() per call to PCApply_Redistribute() is negligible?

Is there another (and maybe simpler) solution to this problem? Maybe I can SCATTER_FORWARD the input x vector in PCApply_Redistribute() before it is zeroed out, together with allowing for a non-zero initial guess in KSPPREONLY?

Any help is welcome!

Best regards,
Jonas Lundgren
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From thanasis.boutsikakis at corintis.com  Thu Aug 24 04:58:05 2023
From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis)
Date: Thu, 24 Aug 2023 11:58:05 +0200
Subject: [petsc-users] Multiplication of partitioned with non-partitioned (sparse) PETSc matrices
In-Reply-To: 
References: 
Message-ID: <27C19714-518A-4F7E-9B47-EA0491460766@corintis.com>

Thanks a lot Pierre! I managed to solve my problem for now using the (less scalable) solution that you provided.

I also opened an issue about the missing MatMPIAIJGetLocalMat() petsc4py binding here https://gitlab.com/petsc/petsc/-/issues/1443 and, if no action is taken, I might take care of it myself as well, soon.

For the sake of completeness, and to help any potential PETSc user that might run into the same issue, here is my final code that works nicely in parallel. 
"""Experimenting with PETSc mat-mat multiplication""" import numpy as np from firedrake import COMM_WORLD from firedrake.petsc import PETSc from numpy.testing import assert_array_almost_equal # import pdb nproc = COMM_WORLD.size rank = COMM_WORLD.rank def Print(x: str): """Prints the string only on the root process Args: x (str): String to be printed """ PETSc.Sys.Print(x) def print_mat_info(mat, name): """Prints the matrix information Args: mat (PETSc mat): PETSc matrix name (string): Name of the matrix """ Print(f"MATRIX {name} [{mat.getSize()[0]}x{mat.getSize()[1]}]") # print(f"For rank {rank} local {name}: {mat.getSizes()}") Print(mat.getType()) mat.view() Print("") COMM_WORLD.Barrier() Print("") def create_petsc_matrix_seq(input_array): """Building a sequential PETSc matrix from an array Args: input_array (np array): Input array Returns: seq mat: PETSc matrix """ assert len(input_array.shape) == 2 m, n = input_array.shape matrix = PETSc.Mat().createAIJ(size=(m, n), comm=PETSc.COMM_SELF) matrix.setUp() matrix.setValues(range(m), range(n), input_array, addv=False) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def create_petsc_matrix(input_array, sparse=True): """Create a PETSc matrix from an input_array Args: input_array (np array): Input array partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. sparse (bool, optional): Toggle for sparese or dense. Defaults to True. Returns: PETSc mat: PETSc matrix """ # Check if input_array is 1D and reshape if necessary assert len(input_array.shape) == 2, "Input array should be 2-dimensional" global_rows, global_cols = input_array.shape size = ((None, global_rows), (global_cols, global_cols)) # Create a sparse or dense matrix based on the 'sparse' argument if sparse: matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) else: matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) matrix.setUp() local_rows_start, local_rows_end = matrix.getOwnershipRange() for counter, i in enumerate(range(local_rows_start, local_rows_end)): # Calculate the correct row in the array for the current process row_in_array = counter + local_rows_start matrix.setValues( i, range(global_cols), input_array[row_in_array, :], addv=False ) # Assembly the matrix to compute the final structure matrix.assemblyBegin() matrix.assemblyEnd() return matrix def get_local_submatrix(A): """Get the local submatrix of A Args: A (mpi PETSc mat): partitioned PETSc matrix Returns: seq mat: PETSc matrix """ local_rows_start, local_rows_end = A.getOwnershipRange() local_rows = local_rows_end - local_rows_start comm = A.getComm() rows = PETSc.IS().createStride( local_rows, first=local_rows_start, step=1, comm=comm ) _, k = A.getSize() # Get the number of columns (k) from A's size cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) # Getting the local submatrix # TODO: To be replaced by MatMPIAIJGetLocalMat() in the future (see petsc-users mailing list). 
There is a missing petsc4py binding, need to add it myself (and please create a merge request) A_local = A.createSubMatrices(rows, cols)[0] return A_local def multiply_sequential_matrices(A_seq, B_seq): """Multiply 2 sequential matrices Args: A_seq (seqaij): local submatrix of A B_seq (seqaij): sequential matrix B Returns: seq mat: PETSc matrix that is the product of A_seq and B_seq """ _, A_seq_cols = A_seq.getSize() B_seq_rows, _ = B_seq.getSize() assert ( A_seq_cols == B_seq_rows ), f"Incompatible matrix sizes for multiplication: {A_seq_cols} != {B_seq_rows}" C_local = A_seq.matMult(B_seq) return C_local def create_global_matrix(C_local, A): """Create the global matrix C from the local submatrix C_local Args: C_local (seqaij): local submatrix of C A (mpi PETSc mat): PETSc matrix A Returns: mpi PETSc mat: partitioned PETSc matrix C """ C_local_rows, C_local_cols = C_local.getSize() local_rows_start, _ = A.getOwnershipRange() m, _ = A.getSize() C = PETSc.Mat().createAIJ( size=((None, m), (C_local_cols, C_local_cols)), comm=COMM_WORLD ) C.setUp() for i in range(C_local_rows): cols, values = C_local.getRow(i) global_row = i + local_rows_start C.setValues(global_row, cols, values) C.assemblyBegin() C.assemblyEnd() return C m, k = 11, 7 # Generate the random numpy matrices np.random.seed(0) # sets the seed to 0 A_np = np.random.randint(low=0, high=6, size=(m, k)) B_np = np.random.randint(low=0, high=6, size=(k, k)) # Create B as a sequential matrix on each process B_seq = create_petsc_matrix_seq(B_np) print_mat_info(B_seq, "B") A = create_petsc_matrix(A_np) print_mat_info(A, "A") # Getting the correct local submatrix to be multiplied by B_seq A_local = get_local_submatrix(A) # Multiplication of 2 sequential matrices C_local = multiply_sequential_matrices(A_local, B_seq) # Creating the global C matrix C = create_global_matrix(C_local, A) print_mat_info(C, "C") # -------------------------------------------- # TEST: Multiplication of 2 numpy matrices # -------------------------------------------- AB_np = np.dot(A_np, B_np) Print(f"MATRIX AB_np [{AB_np.shape[0]}x{AB_np.shape[1]}]") Print(AB_np) # Get the local values from C local_rows_start, local_rows_end = C.getOwnershipRange() C_local = C.getValues(range(local_rows_start, local_rows_end), range(k)) # Assert the correctness of the multiplication for the local subset assert_array_almost_equal(C_local, AB_np[local_rows_start:local_rows_end, :], decimal=5) > On 23 Aug 2023, at 14:40, Pierre Jolivet wrote: > > > >> On 23 Aug 2023, at 8:59 PM, Thanasis Boutsikakis wrote: >> >> Thanks for the clarification Mark. >> >> I have tried such an implementation but I since I could not find the equivalent of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ for petsc4py, I used A.getLocalSubMatrix to do so, which returns a ?localref? object that I cannot then use to get my local 'seqaij? matrix. > > You really need MatMPIAIJGetLocalMat(). > It seems there is a missing petsc4py binding, either add it yourself (and please create a merge request), or post an issue on GitLab and hope that someone adds it. > Another way to bypass the issue would be to call MatCreateSubMatrices() with an is_row set to the local rows, and an is_col set to all columns, but this will be less scalable than MatMPIAIJGetLocalMat(). > > Thanks, > Pierre > >> I also tried A.getDenseLocalMatrix() but it seems not to be suitable for my case. I couldn?t find an example in the petsc4py source code or something relevant, do you have any ideas? 
>> >> """Experimenting with petsc mat-mat multiplication""" >> # import pdb >> >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> from numpy.testing import assert_array_almost_equal >> import pdb >> >> nproc = COMM_WORLD.size >> rank = COMM_WORLD.rank >> >> >> def Print(x: str): >> """Prints the string only on the root process >> >> Args: >> x (str): String to be printed >> """ >> PETSc.Sys.Print(x) >> >> >> def create_petsc_matrix_seq(input_array): >> """Building a sequential petsc matrix from an array >> >> Args: >> input_array (np array): Input array >> >> Returns: >> seq mat: PETSc matrix >> """ >> assert len(input_array.shape) == 2 >> >> m, n = input_array.shape >> matrix = PETSc.Mat().createAIJ(size=(m, n), comm=PETSc.COMM_SELF) >> matrix.setUp() >> >> matrix.setValues(range(m), range(n), input_array, addv=False) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> def create_petsc_matrix(input_array, partition_like=None): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (petsc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> petsc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> comm = COMM_WORLD >> if partition_like is not None: >> local_rows_start, local_rows_end = partition_like.getOwnershipRange() >> local_rows = local_rows_end - local_rows_start >> >> # No parallelization in the columns, set local_cols = None to parallelize >> size = ((local_rows, global_rows), (global_cols, global_cols)) >> else: >> size = ((None, global_rows), (global_cols, global_cols)) >> >> matrix = PETSc.Mat().createAIJ(size=size, comm=comm) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> m, k = 10, 3 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, k)) >> B_np = np.random.randint(low=0, high=6, size=(k, k)) >> >> # Create B as a sequential matrix on each process >> B_seq = create_petsc_matrix_seq(B_np) >> >> Print(B_seq.getType()) >> Print(B_seq.getSizes()) >> >> A = create_petsc_matrix(A_np) >> >> print("A type:", A.getType()) >> print("A sizes:", A.getSizes()) >> print("A local ownership range:", A.getOwnershipRange()) >> >> # pdb.set_trace() >> >> # Create a local sequential matrix for A using the local submatrix >> local_rows_start, local_rows_end = A.getOwnershipRange() >> local_rows = local_rows_end - local_rows_start >> >> print("local_rows_start:", local_rows_start) >> print("local_rows_end:", local_rows_end) >> print("local_rows:", local_rows) >> >> local_A = PETSc.Mat().createAIJ(size=(local_rows, k), comm=PETSc.COMM_SELF) >> >> # pdb.set_trace() >> >> comm = A.getComm() >> rows = 
PETSc.IS().createStride(local_rows, first=0, step=1, comm=comm) >> cols = PETSc.IS().createStride(k, first=0, step=1, comm=comm) >> >> print("rows indices:", rows.getIndices()) >> print("cols indices:", cols.getIndices()) >> >> # pdb.set_trace() >> >> # Create the local to global mapping for rows and columns >> l2g_rows = PETSc.LGMap().create(rows.getIndices(), comm=comm) >> l2g_cols = PETSc.LGMap().create(cols.getIndices(), comm=comm) >> >> print("l2g_rows type:", type(l2g_rows)) >> print("l2g_rows:", l2g_rows.view()) >> print("l2g_rows type:", type(l2g_cols)) >> print("l2g_cols:", l2g_cols.view()) >> >> # pdb.set_trace() >> >> # Set the local-to-global mapping for the matrix >> A.setLGMap(l2g_rows, l2g_cols) >> >> # pdb.set_trace() >> >> # Now you can get the local submatrix >> local_A = A.getLocalSubMatrix(rows, cols) >> >> # Assembly the matrix to compute the final structure >> local_A.assemblyBegin() >> local_A.assemblyEnd() >> >> Print(local_A.getType()) >> Print(local_A.getSizes()) >> >> # pdb.set_trace() >> >> # Multiply the two matrices >> local_C = local_A.matMult(B_seq) >> >> >>> On 23 Aug 2023, at 12:56, Mark Adams wrote: >>> >>> >>> >>> On Wed, Aug 23, 2023 at 5:36?AM Thanasis Boutsikakis > wrote: >>>> Thanks for the suggestion Pierre. >>>> >>>> Yes B is duplicated by all processes. >>>> >>>> In this case, should B be created as a sequential sparse matrix using COMM_SELF? >>> >>> Yes, that is what Pierre said, >>> >>> Mark >>> >>>> I guess if not, the multiplication of B with the output of https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/ would not go through, right? >>>> >>>> Thanks, >>>> Thanos >>>> >>>>> On 23 Aug 2023, at 10:47, Pierre Jolivet > wrote: >>>>> >>>>> ? >>>>> >>>>>> On 23 Aug 2023, at 5:35 PM, Thanasis Boutsikakis > wrote: >>>>>> >>>>>> ?Hi all, >>>>>> >>>>>> I am trying to multiply two Petsc matrices as C = A * B, where A is a tall matrix and B is a relatively small matrix. >>>>>> >>>>>> I have taken the decision to create A as (row-)partitioned matrix and B as a non-partitioned matrix that it is entirely shared by all procs (to avoid unnecessary communication). >>>>>> >>>>>> Here is my code: >>>>>> >>>>>> import numpy as np >>>>>> from firedrake import COMM_WORLD >>>>>> from firedrake.petsc import PETSc >>>>>> from numpy.testing import assert_array_almost_equal >>>>>> >>>>>> nproc = COMM_WORLD.size >>>>>> rank = COMM_WORLD.rank >>>>>> >>>>>> def create_petsc_matrix_non_partitioned(input_array): >>>>>> """Building a mpi non-partitioned petsc matrix from an array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>>>>> >>>>>> Returns: >>>>>> mpi mat: PETSc matrix >>>>>> """ >>>>>> assert len(input_array.shape) == 2 >>>>>> >>>>>> m, n = input_array.shape >>>>>> >>>>>> matrix = PETSc.Mat().createAIJ(size=((m, n), (m, n)), comm=COMM_WORLD) >>>>>> >>>>>> # Set the values of the matrix >>>>>> matrix.setValues(range(m), range(n), input_array[:, :], addv=False) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>> >>>>>> def create_petsc_matrix(input_array, partition_like=None): >>>>>> """Create a PETSc matrix from an input_array >>>>>> >>>>>> Args: >>>>>> input_array (np array): Input array >>>>>> partition_like (petsc mat, optional): Petsc matrix. Defaults to None. >>>>>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. 
>>>>>> >>>>>> Returns: >>>>>> petsc mat: PETSc matrix >>>>>> """ >>>>>> # Check if input_array is 1D and reshape if necessary >>>>>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>>>>> global_rows, global_cols = input_array.shape >>>>>> >>>>>> comm = COMM_WORLD >>>>>> if partition_like is not None: >>>>>> local_rows_start, local_rows_end = partition_like.getOwnershipRange() >>>>>> local_rows = local_rows_end - local_rows_start >>>>>> >>>>>> # No parallelization in the columns, set local_cols = None to parallelize >>>>>> size = ((local_rows, global_rows), (global_cols, global_cols)) >>>>>> else: >>>>>> size = ((None, global_rows), (global_cols, global_cols)) >>>>>> >>>>>> matrix = PETSc.Mat().createAIJ(size=size, comm=comm) >>>>>> matrix.setUp() >>>>>> >>>>>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>>>>> >>>>>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>>>>> # Calculate the correct row in the array for the current process >>>>>> row_in_array = counter + local_rows_start >>>>>> matrix.setValues( >>>>>> i, range(global_cols), input_array[row_in_array, :], addv=False >>>>>> ) >>>>>> >>>>>> # Assembly the matrix to compute the final structure >>>>>> matrix.assemblyBegin() >>>>>> matrix.assemblyEnd() >>>>>> >>>>>> return matrix >>>>>> >>>>>> >>>>>> m, k = 10, 3 >>>>>> # Generate the random numpy matrices >>>>>> np.random.seed(0) # sets the seed to 0 >>>>>> A_np = np.random.randint(low=0, high=6, size=(m, k)) >>>>>> B_np = np.random.randint(low=0, high=6, size=(k, k)) >>>>>> >>>>>> >>>>>> A = create_petsc_matrix(A_np) >>>>>> >>>>>> B = create_petsc_matrix_non_partitioned(B_np) >>>>>> >>>>>> # Now perform the multiplication >>>>>> C = A * B >>>>>> >>>>>> The problem with this is that there is a mismatch between the local rows of A (depend on the partitioning) and the global rows of B (3 for all procs), so that the multiplication cannot happen in parallel. Here is the error: >>>>>> >>>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range >>>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>> [0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/ >>>>>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run >>>>>> [0]PETSC ERROR: to get more information on the crash. >>>>>> [0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash. >>>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59 >>>>>> : >>>>>> system msg for write_line failure : Bad file descriptor >>>>>> >>>>>> >>>>>> Is there a standard way to achieve this? >>>>> >>>>> Your B is duplicated by all processes? >>>>> If so, then, call https://petsc.org/main/manualpages/Mat/MatMPIAIJGetLocalMat/, do a sequential product with B on COMM_SELF, not COMM_WORLD, and use https://petsc.org/main/manualpages/Mat/MatCreateMPIMatConcatenateSeqMat/ with the output. >>>>> >>>>> Thanks, >>>>> Pierre >>>>> >>>>>> Thanks, >>>>>> Thanos >>>>>> >>>>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
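For completeness, the more scalable route Pierre recommends above (MatMPIAIJGetLocalMat(), a sequential product on COMM_SELF, then MatCreateMPIMatConcatenateSeqMat()) would look roughly like the following C sketch. C is used here because the corresponding petsc4py binding was still missing at the time of this thread, and the function name MatMultWithRedundantB is invented for illustration; it assumes A is a row-partitioned MPIAIJ matrix and B_seq is a small SEQAIJ matrix held redundantly on every rank.

-------------------
#include <petscmat.h>

/* Sketch only: C = A * B where A is MPIAIJ and B_seq is a small SEQAIJ
   matrix duplicated on every rank. The local rows of A (diagonal and
   off-diagonal blocks merged) are multiplied by B_seq sequentially, and
   the per-rank results are concatenated back into a parallel matrix with
   one block of rows per rank. */
static PetscErrorCode MatMultWithRedundantB(Mat A, Mat B_seq, Mat *C)
{
  Mat A_loc, C_loc;

  PetscFunctionBeginUser;
  PetscCall(MatMPIAIJGetLocalMat(A, MAT_INITIAL_MATRIX, &A_loc));
  PetscCall(MatMatMult(A_loc, B_seq, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C_loc));
  PetscCall(MatCreateMPIMatConcatenateSeqMat(PetscObjectComm((PetscObject)A), C_loc, PETSC_DECIDE, MAT_INITIAL_MATRIX, C));
  PetscCall(MatDestroy(&C_loc));
  PetscCall(MatDestroy(&A_loc));
  PetscFunctionReturn(PETSC_SUCCESS);
}
-------------------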
URL: From maitri.ksh at gmail.com Thu Aug 24 05:10:00 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Thu, 24 Aug 2023 13:10:00 +0300 Subject: [petsc-users] C++11 related issue Message-ID: I was facing a problem while compiling a code (which earlier, got compiled successfully using the same petsc set up), the problem was related to compilers. I decided to reconfigure petsc but ran into errors which are related to non-compliance of the compiler with 'C++11'. I had faced this issue earlier when I was installing Petsc and I had it resolved by using a newer version of compiler (openmpi-4.1.5). Now, I am trying to use the same compiler (openmpi-4.1.5) to reconfigure petsc but the old issue (related to 'C++11') pops up. I used a code that was available online to check if the present compiler supports C++11, and it shows it does support. I have attached the '*configure.log*' herewith for your reference. Can anyone suggest how to resolve/work-around this issue? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 138002 bytes Desc: not available URL: From knepley at gmail.com Thu Aug 24 05:36:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 24 Aug 2023 06:36:58 -0400 Subject: [petsc-users] C++11 related issue In-Reply-To: References: Message-ID: On Thu, Aug 24, 2023 at 6:10?AM maitri ksh wrote: > I was facing a problem while compiling a code (which earlier, got compiled > successfully using the same petsc set up), the problem was related to > compilers. I decided to reconfigure petsc but ran into errors which are > related to non-compliance of the compiler with 'C++11'. I had faced this > issue earlier when I was installing Petsc and I had it resolved by using a > newer version of compiler (openmpi-4.1.5). > OpenMPI is not a compiler. It is an implementation of MPI that produces compiler wrappers. Your actual compiler appears to be GCC 4.8.2: Output from compiling with -std=c++11 In file included from /usr/include/c++/4.8.2/algorithm:62:0, from /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: /usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of ?_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator*, std::vector > >; _Tp = std::unique_ptr; _Compare = main()::__lambda0]?: /usr/include/c++/4.8.2/bits/stl_algo.h:2296:78: required from ?_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator*, std::vector > >; _Compare = main()::__lambda0]? /usr/include/c++/4.8.2/bits/stl_algo.h:2337:62: required from ?void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator*, std::vector > >; _Size = long int; _Compare = main()::__lambda0]? /usr/include/c++/4.8.2/bits/stl_algo.h:5499:44: required from ?void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = __gnu_cxx::__normal_iterator*, std::vector > >; _Compare = main()::__lambda0]? /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:119: required from here /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: no match for call to ?(main()::__lambda0) (std::unique_ptr&, const std::unique_ptr&)? 
while (__comp(*__first, __pivot)) ^ /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:42: note: candidates are: std::sort(vector.begin(), vector.end(), [](std::unique_ptr &a, std::unique_ptr &b) { return *a < *b; }); ^ In file included from /usr/include/c++/4.8.2/algorithm:62:0, from /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: note: bool (*)(std::unique_ptr&, std::unique_ptr&) while (__comp(*__first, __pivot)) This compiler was released almost 10 years ago and has incomplete support for C++11. > Now, I am trying to use the same compiler (openmpi-4.1.5) to reconfigure > petsc but the old issue (related to 'C++11') pops up. I used a code that > was available online > > to check if the present compiler supports C++11, and it shows it does > support. > You may not have read to the bottom of the answer, but it tells you how to check for complete support for C++11 and this compiler definitely does not have it. Thanks, Matt > I have attached the '*configure.log*' herewith for your reference. > Can anyone suggest how to resolve/work-around this issue? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Thu Aug 24 11:22:25 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Thu, 24 Aug 2023 16:22:25 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: Thank you Matt and Junchao. I've been testing further with nvhpc on summit. You might have an idea on what is going on here. These are my modules: Currently Loaded Modules: 1) lsf-tools/2.0 3) darshan-runtime/3.4.0-lite 5) DefApps 7) spectrum-mpi/10.4.0.3-20210112 9) nsight-systems/2021.3.1.54 2) hsi/5.0.2.p5 4) xalt/1.2.1 6) nvhpc/22.11 8) nsight-compute/2021.2.1 10) cuda/11.7.1 I configured and compiled petsc with these options: ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda without issues. The MPI checks did not go through as this was done in the login node. Then, I started getting (similarly to what I saw with pgi and gcc in summit) ambiguous interface errors related to mpi routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 on the main program (MAIN) using that module, even though the PRIVATE statement has been used in said (TEST_MOD) module. MODULE TEST_MOD ! In this module we use PETSC. USE PETSC !USE MPI IMPLICIT NONE PRIVATE PUBLIC :: TEST1 CONTAINS SUBROUTINE TEST1(A) IMPLICIT NONE REAL, INTENT(INOUT) :: A INTEGER :: IERR A=0. ENDSUBROUTINE TEST1 ENDMODULE TEST_MOD PROGRAM MAIN ! Assume in main we use some MPI_F08 features. USE MPI_F08 USE TEST_MOD, ONLY : TEST1 IMPLICIT NONE INTEGER :: MY_RANK,IERR=0 INTEGER :: PNAMELEN=0 INTEGER :: PROVIDED INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED REAL :: A=0. 
CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR) CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR) CALL TEST1(A) CALL MPI_FINALIZE(IERR) ENDPROGRAM MAIN Leaving the USE PETSC statement in TEST_MOD this is what I get when trying to compile this code: vanellam at login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90 NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34) NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37) 0 inform, 0 warnings, 2 severes, 0 fatal for main Now, if I change USE PETSC by USE MPI in the module TEST_MOD compilation proceeds correctly. If I leave the USE PETSC statement in the module and change to USE MPI the statement in main compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules. My take is that it is related to spectrum-mpi, as I haven't had issues compiling the FDS+PETSc with openmpi in other systems. Well please let me know if you have any ideas on what might be going on. I'll move to polaris and try with mpich too. Thanks! Marcos ________________________________ From: Junchao Zhang Sent: Tuesday, August 22, 2023 5:25 PM To: Matthew Knepley Cc: Vanella, Marcos (Fed) ; PETSc users list ; Guan, Collin X. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Macros, yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Then, check again with nvidia-smi to see if GPU memory is evenly allocated. --Junchao Zhang On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley > wrote: On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). The PETSc configure examples are in the repository: https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads Thanks, Matt I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? Thanks! 
I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :141:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) 
^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :198:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :149:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :208:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(len); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(flg); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(haystack); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(needle); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ fatal error: too many errors emitted, stopping now [-ferror-limit=] 20 errors generated. Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp. 
gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 **************************ERROR************************************* Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 4:17 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. --Junchao Zhang On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? It says in the script it has allocated 2.4GB Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 3:29 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 24 11:40:31 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Aug 2023 12:40:31 -0400 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: Message-ID: <9657E035-F5E1-42C0-B5F1-C569294D6777@petsc.dev> PETSc uses the non-MPI_F08 Fortran modules so I am guessing when you also use the MPI_F08 modules the compiler sees two sets of interfaces for the same functions hence the error. I am not sure if it portable to use PETSc with the F08 Fortran modules in the same program or routine. > On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users wrote: > > Thank you Matt and Junchao. I've been testing further with nvhpc on summit. You might have an idea on what is going on here. > These are my modules: > > Currently Loaded Modules: > 1) lsf-tools/2.0 3) darshan-runtime/3.4.0-lite 5) DefApps 7) spectrum-mpi/10.4.0.3-20210112 9) nsight-systems/2021.3.1.54 > 2) hsi/5.0.2.p5 4) xalt/1.2.1 6) nvhpc/22.11 8) nsight-compute/2021.2.1 10) cuda/11.7.1 > > I configured and compiled petsc with these options: > > ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda > > without issues. The MPI checks did not go through as this was done in the login node. > > Then, I started getting (similarly to what I saw with pgi and gcc in summit) ambiguous interface errors related to mpi routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 on the main program (MAIN) using that module, even though the PRIVATE statement has been used in said (TEST_MOD) module. > > MODULE TEST_MOD > ! In this module we use PETSC. > USE PETSC > !USE MPI > IMPLICIT NONE > PRIVATE > PUBLIC :: TEST1 > > CONTAINS > SUBROUTINE TEST1(A) > IMPLICIT NONE > REAL, INTENT(INOUT) :: A > INTEGER :: IERR > A=0. > ENDSUBROUTINE TEST1 > > ENDMODULE TEST_MOD > > > PROGRAM MAIN > > ! Assume in main we use some MPI_F08 features. > USE MPI_F08 > USE TEST_MOD, ONLY : TEST1 > IMPLICIT NONE > INTEGER :: MY_RANK,IERR=0 > INTEGER :: PNAMELEN=0 > INTEGER :: PROVIDED > INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED > REAL :: A=0. > CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR) > CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR) > CALL TEST1(A) > CALL MPI_FINALIZE(IERR) > > ENDPROGRAM MAIN > > Leaving the USE PETSC statement in TEST_MOD this is what I get when trying to compile this code: > > vanellam at login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90 > NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34) > NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37) > 0 inform, 0 warnings, 2 severes, 0 fatal for main > > Now, if I change USE PETSC by USE MPI in the module TEST_MOD compilation proceeds correctly. If I leave the USE PETSC statement in the module and change to USE MPI the statement in main compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules. 
My take is that it is related to spectrum-mpi, as I haven't had issues compiling the FDS+PETSc with openmpi in other systems. > > Well please let me know if you have any ideas on what might be going on. I'll move to polaris and try with mpich too. > > Thanks! > Marcos > > > From: Junchao Zhang > > Sent: Tuesday, August 22, 2023 5:25 PM > To: Matthew Knepley > > Cc: Vanella, Marcos (Fed) >; PETSc users list >; Guan, Collin X. (Fed) > > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU > > Macros, > yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. > Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. > So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with > srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda > > Then, check again with nvidia-smi to see if GPU memory is evenly allocated. > --Junchao Zhang > > > On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley > wrote: > On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users > wrote: > Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. > > I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). > > The PETSc configure examples are in the repository: > > https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads > > Thanks, > > Matt > > I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? > > Thanks! 
> > I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: > > CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1 : > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: > In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] > THRUST_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' > THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' > # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' > # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :141:6: note: expanded from here > GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." > ^ > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2 : > In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: > In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: > In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: > In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] > CUB_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' > CUB_COMP_DEPR_IMPL(CUB requires at least REQ. 
Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' > # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' > # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :198:6: note: expanded from here > GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here > > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1 : > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: > In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] > THRUST_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' > THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' > # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' > # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :149:6: note: expanded from here > GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
> ^ > In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2 : > In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: > In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: > In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: > In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: > In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: > In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] > CUB_COMPILER_DEPRECATION(Clang 7.0); > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' > CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' > # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) > ^ > /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' > # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) > ^ > :208:6: note: expanded from here > GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
> ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here > > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(len); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(s); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(flg); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(n); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(s); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(n); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(b); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(a); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(b); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(tmp); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(haystack); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(needle); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(tmp); > ^ > /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume' > ; __builtin_assume(t); > ^ > fatal error: too many errors emitted, stopping now [-ferror-limit=] > 20 errors generated. > Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp. 
> gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 > gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 > **************************ERROR************************************* > Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log > Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov > ******************************************************************** > > > > From: Junchao Zhang > > Sent: Monday, August 21, 2023 4:17 PM > To: Vanella, Marcos (Fed) > > Cc: PETSc users list >; Guan, Collin X. (Fed) > > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU > > That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. > > --Junchao Zhang > > > On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: > Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? > It says in the script it has allocated 2.4GB > Best, > Marcos > From: Junchao Zhang > > Sent: Monday, August 21, 2023 3:29 PM > To: Vanella, Marcos (Fed) > > Cc: PETSc users list >; Guan, Collin X. (Fed) > > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU > > Hi, Macros, > If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. > The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. > > Thanks. > > On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: > Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: > > Mon Aug 21 14:36:07 2023 > +---------------------------------------------------------------------------------------+ > | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | > |-----------------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | > | | | MIG M. 
| > |=========================================+======================+======================| > | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | > | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | > | | | N/A | > +-----------------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | > | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | > | | | N/A | > +-----------------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | > | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | > | | | N/A | > +-----------------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | > | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | > | | | N/A | > +-----------------------------------------+----------------------+----------------------+ > > +---------------------------------------------------------------------------------------+ > | Processes: | > | GPU GI CI PID Type Process name GPU Memory | > | ID ID Usage | > |=======================================================================================| > | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | > | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | > +---------------------------------------------------------------------------------------+ > > > You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. 
> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node):
>
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
> #SBATCH -J test
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=gpu
> #SBATCH --ntasks=16
> #SBATCH --ntasks-per-node=8
> #SBATCH --cpus-per-task=1
> #SBATCH --nodes=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:4
>
> export OMP_NUM_THREADS=1
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
>
> srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda
>
> Thank you for the advice,
> Marcos
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From marcos.vanella at nist.gov  Thu Aug 24 13:00:41 2023
From: marcos.vanella at nist.gov (Vanella, Marcos (Fed))
Date: Thu, 24 Aug 2023 18:00:41 +0000
Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
In-Reply-To: <9657E035-F5E1-42C0-B5F1-C569294D6777@petsc.dev>
References: <9657E035-F5E1-42C0-B5F1-C569294D6777@petsc.dev>
Message-ID:

Thank you Barry, I will dial back the MPI_F08 use in our source code and try compiling it. I haven't found much information regarding using MPI and MPI_F08 in different modules other than the following link from several years ago:

https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5

Looks like this has been fixed for openmpi and newer gfortran versions, because I don't have issues with this MPI lib/compiler combination. Same with openmpi/ifort.
What I find quite interesting is this: I assumed the PRIVATE statement in a module would act as a backstop against propagating access to entities not explicitly listed in the module's PUBLIC statement, including entities that belong to other modules made visible upstream through USE. This does not seem to be the case here.

Best,
Marcos

________________________________
From: Barry Smith
Sent: Thursday, August 24, 2023 12:40 PM
To: Vanella, Marcos (Fed)
Cc: PETSc users list; Guan, Collin X. (Fed)
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

PETSc uses the non-MPI_F08 Fortran modules, so I am guessing that when you also use the MPI_F08 modules the compiler sees two sets of interfaces for the same functions, hence the error. I am not sure if it is portable to use PETSc with the F08 Fortran modules in the same program or routine.

On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users wrote:

Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. You might have an idea on what is going on here.
These are my modules:

Currently Loaded Modules:
  1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps       7) spectrum-mpi/10.4.0.3-20210112   9) nsight-systems/2021.3.1.54
  2) hsi/5.0.2.p5    4) xalt/1.2.1                   6) nvhpc/22.11   8) nsight-compute/2021.2.1         10) cuda/11.7.1

I configured and compiled PETSc with these options:

./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda

without issues. The MPI checks did not go through, as this was done on the login node.

Then I started getting (similarly to what I saw with pgi and gcc on Summit) ambiguous-interface errors related to MPI routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 in the main program (MAIN) that uses that module, even though the PRIVATE statement has been used in that module (TEST_MOD).

MODULE TEST_MOD
! In this module we use PETSC.
USE PETSC
!USE MPI
IMPLICIT NONE
PRIVATE
PUBLIC :: TEST1

CONTAINS
SUBROUTINE TEST1(A)
IMPLICIT NONE
REAL, INTENT(INOUT) :: A
INTEGER :: IERR
A=0.
ENDSUBROUTINE TEST1

ENDMODULE TEST_MOD


PROGRAM MAIN

! Assume in main we use some MPI_F08 features.
USE MPI_F08
USE TEST_MOD, ONLY : TEST1
IMPLICIT NONE
INTEGER :: MY_RANK,IERR=0
INTEGER :: PNAMELEN=0
INTEGER :: PROVIDED
INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
REAL :: A=0.
CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
CALL TEST1(A)
CALL MPI_FINALIZE(IERR)

ENDPROGRAM MAIN

Leaving the USE PETSC statement in TEST_MOD, this is what I get when trying to compile this code:

vanellam at login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)
0 inform, 0 warnings, 2 severes, 0 fatal for main

Now, if I change USE PETSC to USE MPI in the module TEST_MOD, compilation proceeds correctly. If I leave the USE PETSC statement in the module and change the statement in main to USE MPI, compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules together. My take is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with openmpi on other systems.

Well, please let me know if you have any ideas on what might be going on. I'll move to Polaris and try with mpich too.

Thanks!
Marcos

________________________________
From: Junchao Zhang
Sent: Tuesday, August 22, 2023 5:25 PM
To: Matthew Knepley
Cc: Vanella, Marcos (Fed); PETSc users list; Guan, Collin X. (Fed)
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Marcos,
yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use.
Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: the Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, it can instead be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank.
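A wrapper of this kind is typically only a few lines. The sketch below is not the actual set_affinity_gpu_polaris.sh (whose contents are not shown here), just a minimal illustration of the idea, assuming the launcher exports a node-local rank (OpenMPI's OMPI_COMM_WORLD_LOCAL_RANK or Slurm's SLURM_LOCALID) and that there are 4 GPUs per node:

#!/bin/bash
# Hypothetical per-rank GPU affinity wrapper (illustration only, not the
# actual ALCF set_affinity_gpu_polaris.sh).
# Node-local rank, from whichever launcher exported it.
local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}
gpus_per_node=4
# Round-robin the ranks over the GPUs on this node.
export CUDA_VISIBLE_DEVICES=$(( local_rank % gpus_per_node ))
echo "rank ${local_rank} on $(hostname) -> CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
# Hand off to the real executable with its original arguments.
exec "$@"

The wrapper is placed on the srun line in front of the executable, as in the command below.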
So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Then, check again with nvidia-smi to see if GPU memory is evenly allocated. --Junchao Zhang On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley > wrote: On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). The PETSc configure examples are in the repository: https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads Thanks, Matt I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? Thanks! I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :141:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
[... compiler warnings and errors identical to those quoted earlier in the thread, snipped ...]
gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 **************************ERROR************************************* Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 4:17 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. --Junchao Zhang On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? It says in the script it has allocated 2.4GB Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 3:29 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 24 14:07:24 2023 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 24 Aug 2023 15:07:24 -0400 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: References: <9657E035-F5E1-42C0-B5F1-C569294D6777@petsc.dev> Message-ID: <590D1CA0-3F9D-4548-B44B-FE856A088F22@petsc.dev> > On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) wrote: > > Thank you Barry, I will dial back the MPI_F08 use in our source code and try compiling it. I haven't found much information regarding using MPI and MPI_F08 in different modules other than the following link from several years ago: > > https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5 > > Looks like this has been fixed for openmpi and newer gfortran versions because I don't have issues with this MPI lib/compiler combination. Same with openmpi/ifort. > What I find quite interesting is: I assumed the PRIVATE statement in a module should provide a backstop on the access propagation of variables not explicitly stated in the PUBLIC statement in a module, including the ones that belong to other modules upstream visible through USE. This does not seem to be the case here. I agree, you had seemingly inconsistent results with your different tests; it could be bugs in the handling of modules by the Fortran system. > > Best, > Marcos > > > From: Barry Smith > > Sent: Thursday, August 24, 2023 12:40 PM > To: Vanella, Marcos (Fed) > > Cc: PETSc users list >; Guan, Collin X. (Fed) > > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU > > > PETSc uses the non-MPI_F08 Fortran modules so I am guessing when you also use the MPI_F08 modules the compiler sees two sets of interfaces for the same functions hence the error. I am not sure if it portable to use PETSc with the F08 Fortran modules in the same program or routine. > > > > > >> On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users > wrote: >> >> Thank you Matt and Junchao. I've been testing further with nvhpc on summit. You might have an idea on what is going on here. >> These are my modules: >> >> Currently Loaded Modules: >> 1) lsf-tools/2.0 3) darshan-runtime/3.4.0-lite 5) DefApps 7) spectrum-mpi/10.4.0.3-20210112 9) nsight-systems/2021.3.1.54 >> 2) hsi/5.0.2.p5 4) xalt/1.2.1 6) nvhpc/22.11 8) nsight-compute/2021.2.1 10) cuda/11.7.1 >> >> I configured and compiled petsc with these options: >> >> ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda >> >> without issues. The MPI checks did not go through as this was done in the login node. >> >> Then, I started getting (similarly to what I saw with pgi and gcc in summit) ambiguous interface errors related to mpi routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 on the main program (MAIN) using that module, even though the PRIVATE statement has been used in said (TEST_MOD) module. >> >> MODULE TEST_MOD >> ! In this module we use PETSC. 
>> USE PETSC >> !USE MPI >> IMPLICIT NONE >> PRIVATE >> PUBLIC :: TEST1 >> >> CONTAINS >> SUBROUTINE TEST1(A) >> IMPLICIT NONE >> REAL, INTENT(INOUT) :: A >> INTEGER :: IERR >> A=0. >> ENDSUBROUTINE TEST1 >> >> ENDMODULE TEST_MOD >> >> >> PROGRAM MAIN >> >> ! Assume in main we use some MPI_F08 features. >> USE MPI_F08 >> USE TEST_MOD, ONLY : TEST1 >> IMPLICIT NONE >> INTEGER :: MY_RANK,IERR=0 >> INTEGER :: PNAMELEN=0 >> INTEGER :: PROVIDED >> INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED >> REAL :: A=0. >> CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR) >> CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR) >> CALL TEST1(A) >> CALL MPI_FINALIZE(IERR) >> >> ENDPROGRAM MAIN >> >> Leaving the USE PETSC statement in TEST_MOD this is what I get when trying to compile this code: >> >> vanellam at login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90 >> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34) >> NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37) >> 0 inform, 0 warnings, 2 severes, 0 fatal for main >> >> Now, if I change USE PETSC by USE MPI in the module TEST_MOD compilation proceeds correctly. If I leave the USE PETSC statement in the module and change to USE MPI the statement in main compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules. My take is that it is related to spectrum-mpi, as I haven't had issues compiling the FDS+PETSc with openmpi in other systems. >> >> Well please let me know if you have any ideas on what might be going on. I'll move to polaris and try with mpich too. >> >> Thanks! >> Marcos >> >> >> From: Junchao Zhang > >> Sent: Tuesday, August 22, 2023 5:25 PM >> To: Matthew Knepley > >> Cc: Vanella, Marcos (Fed) >; PETSc users list >; Guan, Collin X. (Fed) > >> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU >> >> Macros, >> yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. >> Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. >> So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with >> srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda >> >> Then, check again with nvidia-smi to see if GPU memory is evenly allocated. >> --Junchao Zhang >> >> >> On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley > wrote: >> On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users > wrote: >> Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. 
I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. >> >> I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). >> >> The PETSc configure examples are in the repository: >> >> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads >> >> Thanks, >> >> Matt >> >> I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? >> >> Thanks! >> >> I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: >> >> CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o >> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1 : >> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: >> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: >> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: >> In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: >> In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: >> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: >> In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: >> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] >> THRUST_COMPILER_DEPRECATION(Clang 7.0); >> ^ >> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' >> THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) >> ^ >> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' >> # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) >> ^ >> /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' >> # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) >> ^ >> :141:6: note: expanded from here >> GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
>> [... compiler output identical to that quoted earlier in the thread, snipped ...]
>> gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 >> gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 >> **************************ERROR************************************* >> Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log >> Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov >> ******************************************************************** >> >> >> >> From: Junchao Zhang > >> Sent: Monday, August 21, 2023 4:17 PM >> To: Vanella, Marcos (Fed) > >> Cc: PETSc users list >; Guan, Collin X. (Fed) > >> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU >> >> That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. >> >> --Junchao Zhang >> >> >> On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: >> Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? >> It says in the script it has allocated 2.4GB >> Best, >> Marcos >> From: Junchao Zhang > >> Sent: Monday, August 21, 2023 3:29 PM >> To: Vanella, Marcos (Fed) > >> Cc: PETSc users list >; Guan, Collin X. (Fed) > >> Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU >> >> Hi, Macros, >> If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. >> The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. >> >> Thanks. >> >> On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: >> Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: >> >> Mon Aug 21 14:36:07 2023 >> +---------------------------------------------------------------------------------------+ >> | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | >> |-----------------------------------------+----------------------+----------------------+ >> | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | >> | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | >> | | | MIG M. 
| >> |=========================================+======================+======================| >> | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | >> | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | >> | | | N/A | >> +-----------------------------------------+----------------------+----------------------+ >> | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | >> | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | >> | | | N/A | >> +-----------------------------------------+----------------------+----------------------+ >> | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | >> | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | >> | | | N/A | >> +-----------------------------------------+----------------------+----------------------+ >> | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | >> | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | >> | | | N/A | >> +-----------------------------------------+----------------------+----------------------+ >> >> +---------------------------------------------------------------------------------------+ >> | Processes: | >> | GPU GI CI PID Type Process name GPU Memory | >> | ID ID Usage | >> |=======================================================================================| >> | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | >> | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | >> +---------------------------------------------------------------------------------------+ >> >> >> You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. 
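For reference, a minimal illustrative sketch (not taken from the FDS or PETSc sources, all names are placeholders) of what explicit per-rank GPU selection looks like in code: each rank derives a node-local rank from an MPI-3 shared-memory communicator and picks one device before any CUDA context is created. With a binding like this, or an equivalent per-rank CUDA_VISIBLE_DEVICES setting, nvidia-smi should show the ranks spread evenly over the node's GPUs rather than piling up on GPU 0.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      rank, local_rank, ndev;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* ranks that share a node get consecutive node-local ranks */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &local_rank);
  /* one GPU per rank, round-robin over the devices visible on this node */
  cudaGetDeviceCount(&ndev);
  if (ndev > 0) cudaSetDevice(local_rank % ndev);
  printf("global rank %d (node-local %d) -> GPU %d of %d\n", rank, local_rank, ndev > 0 ? local_rank % ndev : -1, ndev);
  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}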
>> This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): >> >> #!/bin/bash >> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds >> #SBATCH -J test >> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err >> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log >> #SBATCH --partition=gpu >> #SBATCH --ntasks=16 >> #SBATCH --ntasks-per-node=8 >> #SBATCH --cpus-per-task=1 >> #SBATCH --nodes=2 >> #SBATCH --time=01:00:00 >> #SBATCH --gres=gpu:4 >> >> export OMP_NUM_THREADS=1 >> # modules >> module load cuda/11.7 >> module load gcc/11.2.1/toolset >> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 >> >> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc >> >> srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda >> >> Thank you for the advice, >> Marcos >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.vanella at nist.gov Thu Aug 24 16:13:30 2023 From: marcos.vanella at nist.gov (Vanella, Marcos (Fed)) Date: Thu, 24 Aug 2023 21:13:30 +0000 Subject: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU In-Reply-To: <590D1CA0-3F9D-4548-B44B-FE856A088F22@petsc.dev> References: <9657E035-F5E1-42C0-B5F1-C569294D6777@petsc.dev> <590D1CA0-3F9D-4548-B44B-FE856A088F22@petsc.dev> Message-ID: Hi Barry, an update on this. I reverted to using MPI instead of MPI_F08 and could compile the code on Summit with nvhpc and spectrum-mpi. I also moved to Polaris, compiled PETSc with gcc, cray-mpich and cuda and was able to compile FDS + PETSc right off the bat without any source changes (i.e. USE PETSC and USE MPI_F08 were defined in the source). My impression is that there is an underlying issue with spectrum-mpi. I will make tests with both combinations PETSc + USE MPI and PETSc + USE MPI_F08, both with GPU, to see if the fact that we are mixing MPI Fortran versions has an effect on hardware use and timings. Thanks, Marcos ________________________________ From: Barry Smith Sent: Thursday, August 24, 2023 3:07 PM To: Vanella, Marcos (Fed) Cc: PETSc users list ; Guan, Collin X. (Fed) Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU On Aug 24, 2023, at 2:00 PM, Vanella, Marcos (Fed) wrote: Thank you Barry, I will dial back the MPI_F08 use in our source code and try compiling it. I haven't found much information regarding using MPI and MPI_F08 in different modules other than the following link from several years ago: https://users.open-mpi.narkive.com/eCCG36Ni/ompi-fortran-problem-when-mixing-use-mpi-and-use-mpi-f08-with-gfortran-5 Looks like this has been fixed for openmpi and newer gfortran versions because I don't have issues with this MPI lib/compiler combination. Same with openmpi/ifort. What I find quite interesting is: I assumed the PRIVATE statement in a module should provide a backstop on the access propagation of variables not explicitly stated in the PUBLIC statement in a module, including the ones that belong to other modules upstream visible through USE. This does not seem to be the case here. I agree, you had seemingly inconsistent results with your different tests; it could be bugs in the handling of modules by the Fortran system.
Best, Marcos ________________________________ From: Barry Smith > Sent: Thursday, August 24, 2023 12:40 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU PETSc uses the non-MPI_F08 Fortran modules so I am guessing when you also use the MPI_F08 modules the compiler sees two sets of interfaces for the same functions hence the error. I am not sure if it portable to use PETSc with the F08 Fortran modules in the same program or routine. On Aug 24, 2023, at 12:22 PM, Vanella, Marcos (Fed) via petsc-users > wrote: Thank you Matt and Junchao. I've been testing further with nvhpc on summit. You might have an idea on what is going on here. These are my modules: Currently Loaded Modules: 1) lsf-tools/2.0 3) darshan-runtime/3.4.0-lite 5) DefApps 7) spectrum-mpi/10.4.0.3-20210112 9) nsight-systems/2021.3.1.54 2) hsi/5.0.2.p5 4) xalt/1.2.1 6) nvhpc/22.11 8) nsight-compute/2021.2.1 10) cuda/11.7.1 I configured and compiled petsc with these options: ./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda without issues. The MPI checks did not go through as this was done in the login node. Then, I started getting (similarly to what I saw with pgi and gcc in summit) ambiguous interface errors related to mpi routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 on the main program (MAIN) using that module, even though the PRIVATE statement has been used in said (TEST_MOD) module. MODULE TEST_MOD ! In this module we use PETSC. USE PETSC !USE MPI IMPLICIT NONE PRIVATE PUBLIC :: TEST1 CONTAINS SUBROUTINE TEST1(A) IMPLICIT NONE REAL, INTENT(INOUT) :: A INTEGER :: IERR A=0. ENDSUBROUTINE TEST1 ENDMODULE TEST_MOD PROGRAM MAIN ! Assume in main we use some MPI_F08 features. USE MPI_F08 USE TEST_MOD, ONLY : TEST1 IMPLICIT NONE INTEGER :: MY_RANK,IERR=0 INTEGER :: PNAMELEN=0 INTEGER :: PROVIDED INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED REAL :: A=0. CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR) CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR) CALL TEST1(A) CALL MPI_FINALIZE(IERR) ENDPROGRAM MAIN Leaving the USE PETSC statement in TEST_MOD this is what I get when trying to compile this code: vanellam at login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90 NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34) NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37) 0 inform, 0 warnings, 2 severes, 0 fatal for main Now, if I change USE PETSC by USE MPI in the module TEST_MOD compilation proceeds correctly. If I leave the USE PETSC statement in the module and change to USE MPI the statement in main compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules. My take is that it is related to spectrum-mpi, as I haven't had issues compiling the FDS+PETSc with openmpi in other systems. Well please let me know if you have any ideas on what might be going on. I'll move to polaris and try with mpich too. Thanks! 
Marcos ________________________________ From: Junchao Zhang > Sent: Tuesday, August 22, 2023 5:25 PM To: Matthew Knepley > Cc: Vanella, Marcos (Fed) >; PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Macros, yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, make the script on your PATH and then run your job with srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Then, check again with nvidia-smi to see if GPU memory is evenly allocated. --Junchao Zhang On Tue, Aug 22, 2023 at 3:03?PM Matthew Knepley > wrote: On Tue, Aug 22, 2023 at 2:54?PM Vanella, Marcos (Fed) via petsc-users > wrote: Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES does not provide information about which MPI process is associated to which GPU in the node in our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+Petsc in summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). The PETSc configure examples are in the repository: https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-summit-opt.py?ref_type=heads Thanks, Matt I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? Thanks! I configured the library --with-cuda and when compiling I get a compilation error with CUDAC: CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. 
[-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :141:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :198:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." 
^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44: In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532: In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27: /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] THRUST_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION' THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL' # define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0' # define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :149:6: note: expanded from here GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2: In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19: In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277: In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35: In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36: In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19: In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36: /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages] CUB_COMPILER_DEPRECATION(Clang 7.0); ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION' CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.) 
^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL' # define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg) ^ /sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0' # define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr) ^ :208:6: note: expanded from here GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message." ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h(68): warning #1835-D: attribute "warn_unused_result" does not apply here /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:55:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:78:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:107:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(len); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:144:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:150:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:198:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(flg); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:249:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:251:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(s); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:291:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(n); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:330:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:333:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:334:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:367:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(a); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:368:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(b); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:369:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:403:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(haystack); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:404:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(needle); ^ /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:405:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(tmp); ^ 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscstring.h:437:3: error: use of undeclared identifier '__builtin_assume' ; __builtin_assume(t); ^ fatal error: too many errors emitted, stopping now [-ferror-limit=] 20 errors generated. Error while processing /tmp/tmpxft_0001add6_00000000-6_curand2.cudafe1.cpp. gmake[3]: *** [gmakefile:209: arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o] Error 1 gmake[2]: *** [/autofs/nccs-svm1_home1/vanellam/Software/petsc/lib/petsc/conf/rules.doc:28: libs] Error 2 **************************ERROR************************************* Error during compile, check arch-linux-opt-xl/lib/petsc/conf/make.log Send it and arch-linux-opt-xl/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 4:17 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU That is a good question. Looking at https://slurm.schedmd.com/gres.html#GPU_Management, I was wondering if you can share the output of your job so we can search CUDA_VISIBLE_DEVICES and see how GPUs were allocated. --Junchao Zhang On Mon, Aug 21, 2023 at 2:38?PM Vanella, Marcos (Fed) > wrote: Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes meshes but only working on 2 of them? It says in the script it has allocated 2.4GB Best, Marcos ________________________________ From: Junchao Zhang > Sent: Monday, August 21, 2023 3:29 PM To: Vanella, Marcos (Fed) > Cc: PETSc users list >; Guan, Collin X. (Fed) > Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in MPI implementation). So your job script and output are all good. Thanks. On Mon, Aug 21, 2023 at 2:00?PM Vanella, Marcos (Fed) > wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s: Mon Aug 21 14:36:07 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. 
| |=========================================+======================+======================| | 0 Tesla V100-SXM2-16GB On | 00000004:04:00.0 Off | 0 | | N/A 34C P0 63W / 300W | 2488MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 Tesla V100-SXM2-16GB On | 00000004:05:00.0 Off | 0 | | N/A 38C P0 56W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 Tesla V100-SXM2-16GB On | 00000035:03:00.0 Off | 0 | | N/A 35C P0 52W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 Tesla V100-SXM2-16GB On | 00000035:04:00.0 Off | 0 | | N/A 38C P0 53W / 300W | 638MiB / 16384MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 214626 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214630 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 0 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 0 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 308MiB | | 1 N/A N/A 214627 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 1 N/A N/A 214631 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214628 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 2 N/A N/A 214632 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214629 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | | 3 N/A N/A 214633 C ...d/ompi_gnu_linux/fds_ompi_gnu_linux 318MiB | +---------------------------------------------------------------------------------------+ You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1,2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node): #!/bin/bash # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds #SBATCH -J test #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log #SBATCH --partition=gpu #SBATCH --ntasks=16 #SBATCH --ntasks-per-node=8 #SBATCH --cpus-per-task=1 #SBATCH --nodes=2 #SBATCH --time=01:00:00 #SBATCH --gres=gpu:4 export OMP_NUM_THREADS=1 # modules module load cuda/11.7 module load gcc/11.2.1/toolset module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7 cd /home/mnv/Firemodels_fork/fds/Issues/PETSc srun -N 2 -n 16 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda Thank you for the advice, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From maitri.ksh at gmail.com Fri Aug 25 06:04:30 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Fri, 25 Aug 2023 14:04:30 +0300 Subject: [petsc-users] C++11 related issue In-Reply-To: References: Message-ID: Thanks Matt for the clarification, I used gcc11.2 instead and the petsc-configuration worked out fine. But now, I am facing an issue while doing the 'make check' of the libraries. The error points to some unbalanced or mismatch of double-quote (") /usr/bin/bash: -c: line 6: unexpected EOF while looking for matching `"' /usr/bin/bash: -c: line 7: syntax error: unexpected end of file The error seems to be related to a shell script, I am unable to debug the error, I looked for instances where 'bash -c' is being used, I couldn't locate any. How should I proceed? On Thu, Aug 24, 2023 at 1:37?PM Matthew Knepley wrote: > On Thu, Aug 24, 2023 at 6:10?AM maitri ksh wrote: > >> I was facing a problem while compiling a code (which earlier, got >> compiled successfully using the same petsc set up), the problem was related >> to compilers. I decided to reconfigure petsc but ran into errors which are >> related to non-compliance of the compiler with 'C++11'. I had faced this >> issue earlier when I was installing Petsc and I had it resolved by using a >> newer version of compiler (openmpi-4.1.5). >> > > OpenMPI is not a compiler. It is an implementation of MPI that produces > compiler wrappers. Your actual compiler appears to be GCC 4.8.2: > > Output from compiling with -std=c++11 > In file included from /usr/include/c++/4.8.2/algorithm:62:0, > from > /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: > /usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of > ?_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, > _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = > __gnu_cxx::__normal_iterator*, > std::vector > >; _Tp = std::unique_ptr; > _Compare = main()::__lambda0]?: > /usr/include/c++/4.8.2/bits/stl_algo.h:2296:78: required from > ?_RandomAccessIterator > std::__unguarded_partition_pivot(_RandomAccessIterator, > _RandomAccessIterator, _Compare) [with _RandomAccessIterator = > __gnu_cxx::__normal_iterator*, > std::vector > >; _Compare = main()::__lambda0]? > /usr/include/c++/4.8.2/bits/stl_algo.h:2337:62: required from ?void > std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, > _Compare) [with _RandomAccessIterator = > __gnu_cxx::__normal_iterator*, > std::vector > >; _Size = long int; _Compare = > main()::__lambda0]? > /usr/include/c++/4.8.2/bits/stl_algo.h:5499:44: required from ?void > std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = > __gnu_cxx::__normal_iterator*, > std::vector > >; _Compare = main()::__lambda0]? > /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:119: required > from here > /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: no match for call > to ?(main()::__lambda0) (std::unique_ptr&, const > std::unique_ptr&)? 
> while (__comp(*__first, __pivot)) > ^ > /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:42: note: > candidates are: > std::sort(vector.begin(), vector.end(), [](std::unique_ptr &a, > std::unique_ptr &b) { return *a < *b; }); > ^ > In file included from /usr/include/c++/4.8.2/algorithm:62:0, > from > /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: > /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: note: bool > (*)(std::unique_ptr&, std::unique_ptr&) > while (__comp(*__first, __pivot)) > > This compiler was released almost 10 years ago and has incomplete support > for C++11. > > >> Now, I am trying to use the same compiler (openmpi-4.1.5) to reconfigure >> petsc but the old issue (related to 'C++11') pops up. I used a code that >> was available online >> >> to check if the present compiler supports C++11, and it shows it does >> support. >> > > You may not have read to the bottom of the answer, but it tells you how to > check for complete support for C++11 and this compiler definitely does not > have it. > > Thanks, > > Matt > > >> I have attached the '*configure.log*' herewith for your reference. >> Can anyone suggest how to resolve/work-around this issue? >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 25 07:25:58 2023 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 25 Aug 2023 08:25:58 -0400 Subject: [petsc-users] C++11 related issue In-Reply-To: References: Message-ID: On Fri, Aug 25, 2023 at 7:05?AM maitri ksh wrote: > Thanks Matt for the clarification, I used gcc11.2 instead and the > petsc-configuration worked out fine. But now, I am facing an issue while > doing the 'make check' of the libraries. The error points to some > unbalanced or mismatch of double-quote (") > > /usr/bin/bash: -c: line 6: unexpected EOF while looking for matching `"' > /usr/bin/bash: -c: line 7: syntax error: unexpected end of file > > The error seems to be related to a shell script, I am unable to debug the > error, I looked for instances where 'bash -c' is being used, I couldn't > locate any. How should I proceed? > You can do the check by hand: cd $PETSC_DIR cd src/snes/tutorials make ex5 ./ex5 -snes_monitor Thanks, Matt > On Thu, Aug 24, 2023 at 1:37?PM Matthew Knepley wrote: > >> On Thu, Aug 24, 2023 at 6:10?AM maitri ksh wrote: >> >>> I was facing a problem while compiling a code (which earlier, got >>> compiled successfully using the same petsc set up), the problem was related >>> to compilers. I decided to reconfigure petsc but ran into errors which are >>> related to non-compliance of the compiler with 'C++11'. I had faced this >>> issue earlier when I was installing Petsc and I had it resolved by using a >>> newer version of compiler (openmpi-4.1.5). >>> >> >> OpenMPI is not a compiler. It is an implementation of MPI that produces >> compiler wrappers.
Your actual compiler appears to be GCC 4.8.2: >> >> Output from compiling with -std=c++11 >> In file included from /usr/include/c++/4.8.2/algorithm:62:0, >> from >> /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: >> /usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of >> ?_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, >> _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = >> __gnu_cxx::__normal_iterator*, >> std::vector > >; _Tp = std::unique_ptr; >> _Compare = main()::__lambda0]?: >> /usr/include/c++/4.8.2/bits/stl_algo.h:2296:78: required from >> ?_RandomAccessIterator >> std::__unguarded_partition_pivot(_RandomAccessIterator, >> _RandomAccessIterator, _Compare) [with _RandomAccessIterator = >> __gnu_cxx::__normal_iterator*, >> std::vector > >; _Compare = main()::__lambda0]? >> /usr/include/c++/4.8.2/bits/stl_algo.h:2337:62: required from ?void >> std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, >> _Compare) [with _RandomAccessIterator = >> __gnu_cxx::__normal_iterator*, >> std::vector > >; _Size = long int; _Compare = >> main()::__lambda0]? >> /usr/include/c++/4.8.2/bits/stl_algo.h:5499:44: required from ?void >> std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = >> __gnu_cxx::__normal_iterator*, >> std::vector > >; _Compare = main()::__lambda0]? >> /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:119: required >> from here >> /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: no match for call >> to ?(main()::__lambda0) (std::unique_ptr&, const >> std::unique_ptr&)? >> while (__comp(*__first, __pivot)) >> ^ >> /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:58:42: note: >> candidates are: >> std::sort(vector.begin(), vector.end(), [](std::unique_ptr &a, >> std::unique_ptr &b) { return *a < *b; }); >> ^ >> In file included from /usr/include/c++/4.8.2/algorithm:62:0, >> from >> /tmp/petsc-hl_0r720/config.setCompilers/conftest.cc:9: >> /usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: note: bool >> (*)(std::unique_ptr&, std::unique_ptr&) >> while (__comp(*__first, __pivot)) >> >> This compiler was released almost 10 years ago and has incomplete support >> for C++11. >> >> >>> Now, I am trying to use the same compiler (openmpi-4.1.5) to reconfigure >>> petsc but the old issue (related to 'C++11') pops up. I used a code that >>> was available online >>> >>> to check if the present compiler supports C++11, and it shows it does >>> support. >>> >> >> You may not have read to the bottom of the answer, but it tells you how >> to check for complete support for C++11 and this compiler definitely does >> not have it. >> >> Thanks, >> >> Matt >> >> >>> I have attached the '*configure.log*' herewith for your reference. >>> Can anyone suggest how to resolve/work-around this issue? >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carljohanthore at gmail.com Sat Aug 26 08:09:11 2023 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Sat, 26 Aug 2023 15:09:11 +0200 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ Message-ID: Hi, I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. According to the manual "[t]he fieldsplit preconditioner cannot currently be used with the MATBAIJ or MATSBAIJ data formats if the blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in fieldsplit.c at /* extract the A01 and A10 matrices */ ilink = jac->head; PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. >From this I gather that it is in general NOT possible to use PCFIELDSPLIT with MATSBAIJ even with block size 1? Kind regards, Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at lip6.fr Sat Aug 26 08:50:05 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Sat, 26 Aug 2023 22:50:05 +0900 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: Message-ID: (Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball? Thanks, Pierre > On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore wrote: > ? > Hi, > > I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. According to the manual > "[t]he fieldsplit preconditioner cannot currently be used with the MATBAIJ or MATSBAIJ data > formats if the blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can > use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" > from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in fieldsplit.c at > > /* extract the A01 and A10 matrices */ > ilink = jac->head; > PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); > if (jac->offdiag_use_amat) { > PetscCall(MatCreateSubMatrix(pc->mat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); > } else { > PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); > } > > This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. > From this I gather that it is in general NOT possible to use PCFIELDSPLIT with MATSBAIJ even > with block size 1? 
> > Kind regards, > Carl-Johan From pierre.jolivet at lip6.fr Sat Aug 26 09:35:31 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Sat, 26 Aug 2023 23:35:31 +0900 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: Message-ID: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> > On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore wrote: > > "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. > I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" > > Ok, would be great if you could look at this! I don't need to stick to any particular branch. > > Do you think MATNEST could be an alternative here? Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue. I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ It is ugly code on top of ugly code, so I didn?t try to get it integrated and just used the branch locally, and then moved to some other stuff. I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry). Thanks, Pierre > My matrix is > [A00 A01; > A01^t A11] > so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for > A00 and A11 if it's possible to combine matrix types which the manual seems to imply. > > Kind regards > Carl-Johan > > >> On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore wrote: >> ? >> Hi, >> >> I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. >> According to the manual "[t]he fieldsplit preconditioner cannot >> currently be used with the MATBAIJ or MATSBAIJ data formats if the >> blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" >> from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in >> fieldsplit.c at >> >> /* extract the A01 and A10 matrices */ ilink = jac->head; >> PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if >> (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, >> ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { >> PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, >> MAT_INITIAL_MATRIX, &jac->B)); } >> >> This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. >> From this I gather that it is in general NOT possible to use >> PCFIELDSPLIT with MATSBAIJ even with block size 1? >> >> Kind regards, >> Carl-Johan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at lip6.fr Sat Aug 26 10:20:47 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Sun, 27 Aug 2023 00:20:47 +0900 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> Message-ID: <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> > On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore wrote: > > ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue.? > I?m not sure I follow. 
Does PCFIELDSPLIT extract further submatrices from these blocks, or is there > somewhere else in the code that things will go wrong? Ah, no, you are right, in that case it should work. > For the MATNEST I was thinking to get some savings from the block-symmetry at least > even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them would just be a > (pretty big) bonus. > > ?I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling > right now so everything gets done more slowly, sorry).? > Sound great, Thanks again! The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. I need to add a new code path in MatCreateRedundantMatrix() to make sure the resulting Mat is indeed SBAIJ, but that is orthogonal to the PCFIELDSPLIT issue. The branch should be usable in its current state. Thanks, Pierre > > From: Pierre Jolivet > Sent: Saturday, August 26, 2023 4:36 PM > To: Carl-Johan Thore > Cc: Carl-Johan Thore ; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ > > > > > On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: > > "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. > I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" > > Ok, would be great if you could look at this! I don't need to stick to any particular branch. > > Do you think MATNEST could be an alternative here? > > Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue. > I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ > It is ugly code on top of ugly code, so I didn?t try to get it integrated and just used the branch locally, and then moved to some other stuff. > I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry). > > Thanks, > Pierre > > > My matrix is > [A00 A01; > A01^t A11] > so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for > A00 and A11 if it's possible to combine matrix types which the manual seems to imply. > > Kind regards > Carl-Johan > > > > On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: > ? > Hi, > > I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. > According to the manual "[t]he fieldsplit preconditioner cannot > currently be used with the MATBAIJ or MATSBAIJ data formats if the > blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" > from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in > fieldsplit.c at > > /* extract the A01 and A10 matrices */ ilink = jac->head; > PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if > (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, > ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { > PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, > MAT_INITIAL_MATRIX, &jac->B)); } > > This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. 
> From this I gather that it is in general NOT possible to use > PCFIELDSPLIT with MATSBAIJ even with block size 1? > > Kind regards, > Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Sat Aug 26 11:27:27 2023 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Sat, 26 Aug 2023 16:27:27 +0000 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> Message-ID: I would suggest avoiding using SBAIJ matrices, at least in the phase of application code development. We implemented SBAIJ for saving storage, not computational efficiency. SBAIJ does not have as many support as AIJ. After your code works for AIJ, then you may consider taking advantage of smaller storage of SBAIJ (could at cost of communication overhead). Hong ________________________________ From: petsc-users on behalf of Pierre Jolivet via petsc-users Sent: Saturday, August 26, 2023 10:20 AM To: Carl-Johan Thore Cc: petsc-users Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore wrote: ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue.? I?m not sure I follow. Does PCFIELDSPLIT extract further submatrices from these blocks, or is there somewhere else in the code that things will go wrong? Ah, no, you are right, in that case it should work. For the MATNEST I was thinking to get some savings from the block-symmetry at least even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them would just be a (pretty big) bonus. ?I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry).? Sound great, Thanks again! The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. I need to add a new code path in MatCreateRedundantMatrix() to make sure the resulting Mat is indeed SBAIJ, but that is orthogonal to the PCFIELDSPLIT issue. The branch should be usable in its current state. Thanks, Pierre From: Pierre Jolivet Sent: Saturday, August 26, 2023 4:36 PM To: Carl-Johan Thore Cc: Carl-Johan Thore ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" Ok, would be great if you could look at this! I don't need to stick to any particular branch. Do you think MATNEST could be an alternative here? Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue. I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ It is ugly code on top of ugly code, so I didn?t try to get it integrated and just used the branch locally, and then moved to some other stuff. 
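To make Hong's suggestion above (develop with AIJ, consider SBAIJ only once the solver works) concrete, here is a small sketch with placeholder names, not code from this thread, of keeping the storage format a run-time decision:

#include <petscmat.h>

/* Sketch: default to AIJ during development; -mat_type mpisbaij can be tried
   later from the command line without touching the code. Placeholder routine. */
static PetscErrorCode CreateSystemMatrix(MPI_Comm comm, PetscInt n, Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(*A, MATAIJ));   /* development default */
  PetscCall(MatSetFromOptions(*A));    /* allow -mat_type sbaij later */
  /* lets an assembly loop that sets both triangles keep working if the
     format becomes SBAIJ; for formats that store both triangles this flag
     should simply have no effect */
  PetscCall(MatSetOption(*A, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE));
  PetscCall(MatSetUp(*A));
  PetscFunctionReturn(PETSC_SUCCESS);
}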
I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry). Thanks, Pierre My matrix is [A00 A01; A01^t A11] so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for A00 and A11 if it's possible to combine matrix types which the manual seems to imply. Kind regards Carl-Johan On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: ? Hi, I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. According to the manual "[t]he fieldsplit preconditioner cannot currently be used with the MATBAIJ or MATSBAIJ data formats if the blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in fieldsplit.c at /* extract the A01 and A10 matrices */ ilink = jac->head; PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. From this I gather that it is in general NOT possible to use PCFIELDSPLIT with MATSBAIJ even with block size 1? Kind regards, Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl-johan.thore at liu.se Sat Aug 26 09:16:00 2023 From: carl-johan.thore at liu.se (Carl-Johan Thore) Date: Sat, 26 Aug 2023 14:16:00 +0000 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: Message-ID: "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" Ok, would be great if you could look at this! I don't need to stick to any particular branch. Do you think MATNEST could be an alternative here? My matrix is [A00 A01; A01^t A11] so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for A00 and A11 if it's possible to combine matrix types which the manual seems to imply. Kind regards Carl-Johan > On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore wrote: > ? > Hi, > > I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. > According to the manual "[t]he fieldsplit preconditioner cannot > currently be used with the MATBAIJ or MATSBAIJ data formats if the > blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" > from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in > fieldsplit.c at > > /* extract the A01 and A10 matrices */ ilink = jac->head; > PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if > (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, > ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { > PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, > MAT_INITIAL_MATRIX, &jac->B)); } > > This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. 
> From this I gather that it is in general NOT possible to use > PCFIELDSPLIT with MATSBAIJ even with block size 1? > > Kind regards, > Carl-Johan From maitri.ksh at gmail.com Sun Aug 27 07:29:56 2023 From: maitri.ksh at gmail.com (maitri ksh) Date: Sun, 27 Aug 2023 15:29:56 +0300 Subject: [petsc-users] unable to distribute rows of a matrix across processors while loading Message-ID: Hi, I am using MatSetSizes() followed by MatLoad() to distribute the rows of a *sparse* matrix (*480000x480000*) across the processors. But it seems like the entire matrix is getting loaded in each of the processors instead of distributing it. What am I missing here? *code snippet:* Mat Js; MatType type = MATMPIAIJ; PetscViewer viewerJ; PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Js.dat", FILE_MODE_READ, &viewerJ)); PetscCall(MatCreate(PETSC_COMM_WORLD, &Js)); PetscCall(MatSetSizes(Js, PETSC_DECIDE, PETSC_DECIDE, N, N)); PetscCall(MatSetType(Js, type)); PetscCall(MatLoad(Js, viewerJ)); PetscCall(PetscViewerDestroy(&viewerJ)); PetscCall(MatGetLocalSize(Js, &m, &n)); PetscCall(MatGetSize(Js, &M, &N)); PetscPrintf(PETSC_COMM_WORLD, "Js,Local rows: %d, Local columns: %d\n", m, n); *Output *of 'mpiexec -n 4 ./check': Js,Local rows: 480000, Local columns: 480000 Js,Local rows: 480000, Local columns: 480000 Js,Local rows: 480000, Local columns: 480000 Js,Local rows: 480000, Local columns: 480000 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Aug 27 08:04:51 2023 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 27 Aug 2023 08:04:51 -0500 Subject: [petsc-users] unable to distribute rows of a matrix across processors while loading In-Reply-To: References: Message-ID: On Sun, Aug 27, 2023 at 8:30?AM maitri ksh wrote: > Hi, > I am using MatSetSizes() followed by MatLoad() to distribute the rows of a > *sparse* matrix (*480000x480000*) across the processors. But it seems > like the entire matrix is getting loaded in each of the processors instead > of distributing it. What am I missing here? > > *code snippet:* > Mat Js; > MatType type = MATMPIAIJ; > PetscViewer viewerJ; > PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Js.dat", > FILE_MODE_READ, &viewerJ)); > PetscCall(MatCreate(PETSC_COMM_WORLD, &Js)); > PetscCall(MatSetSizes(Js, PETSC_DECIDE, PETSC_DECIDE, N, N)); > PetscCall(MatSetType(Js, type)); > PetscCall(MatLoad(Js, viewerJ)); > PetscCall(PetscViewerDestroy(&viewerJ)); > PetscCall(MatGetLocalSize(Js, &m, &n)); > PetscCall(MatGetSize(Js, &M, &N)); > PetscPrintf(PETSC_COMM_WORLD, "Js,Local rows: %d, Local columns: > %d\n", m, n); > If this was really PETSC_COMM_WORLD, then this print statement would only output once, but you have 4 lines. Therefore, MPI is messed up. I am guessing you used an 'mpirun' or 'mpiexec' that is from a different MPI, and therefore the launch did not work correctly. If you had PETSc install MPI, then the correct one is in ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec Thanks, Matt > > *Output *of 'mpiexec -n 4 ./check': > Js,Local rows: 480000, Local columns: 480000 > Js,Local rows: 480000, Local columns: 480000 > Js,Local rows: 480000, Local columns: 480000 > Js,Local rows: 480000, Local columns: 480000 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
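A self-contained way to check this (a sketch assuming the same Js.dat file; not code from the original message): if the size printed below is 1 on every process when launched with "mpiexec -n 4", the launcher belongs to a different MPI than the one PETSc was built with, and each rank will indeed end up loading the whole matrix. With a matching launcher, for example ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec as Matt suggests, each rank should report size 4 and a distinct slice of rows.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         Js;
  PetscViewer viewer;
  PetscMPIInt rank, size;
  PetscInt    rstart, rend;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);
  /* size = 1 on every process under "mpiexec -n 4" means the mpiexec does not match PETSc's MPI */
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d of %d\n", rank, size));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Js.dat", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &Js));
  PetscCall(MatSetType(Js, MATMPIAIJ));
  PetscCall(MatLoad(Js, viewer)); /* global sizes are read from the binary file */
  PetscCall(PetscViewerDestroy(&viewer));

  /* Each rank should own a distinct, contiguous slice of rows */
  PetscCall(MatGetOwnershipRange(Js, &rstart, &rend));
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] owns rows %" PetscInt_FMT " to %" PetscInt_FMT "\n", rank, rstart, rend));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));

  PetscCall(MatDestroy(&Js));
  PetscCall(PetscFinalize());
  return 0;
}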
URL: From carljohanthore at gmail.com Mon Aug 28 04:50:31 2023 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Mon, 28 Aug 2023 11:50:31 +0200 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> Message-ID: I've tried the new files, and with them, PCFIELDSPLIT now gets set up without crashes (but the setup is significantly slower than for MATAIJ) Unfortunately I still get errors later in the process: [0]PETSC ERROR: Null argument, when expecting valid pointer [0]PETSC ERROR: Null Pointer: Parameter # 1 [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-1023-ga6d78fcba1d GIT Date: 2023-08-22 20:32:33 -0400 [0]PETSC ERROR: Configure options -f --with-fortran-bindings=0 --with-cuda --with-cusp --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai [0]PETSC ERROR: #1 PetscObjectQuery() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/sys/objects/inherit.c:742 [0]PETSC ERROR: #2 MatCreateSubMatrix_MPISBAIJ() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/impls/sbaij/mpi/mpisbaij.c:1414 [0]PETSC ERROR: #3 MatCreateSubMatrix() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/interface/matrix.c:8476 [0]PETSC ERROR: #4 PCSetUp_FieldSplit() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/impls/fieldsplit/fieldsplit.c:826 [0]PETSC ERROR: #5 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/interface/precon.c:1069 [0]PETSC ERROR: #6 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/ksp/interface/itfunc.c:415 The code I'm running here works without any problems for MATAIJ. To run it with MATSBAIJ I've simply used the command-line option -dm_mat_type sbaij Kind regards, Carl-Johan On Sat, Aug 26, 2023 at 5:21?PM Pierre Jolivet via petsc-users < petsc-users at mcs.anl.gov> wrote: > > > On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore > wrote: > > ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with > the same issue.? > I?m not sure I follow. Does PCFIELDSPLIT extract further submatrices from > these blocks, or is there > somewhere else in the code that things will go wrong? > > > Ah, no, you are right, in that case it should work. > > For the MATNEST I was thinking to get some savings from the block-symmetry > at least > even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them > would just be a > (pretty big) bonus. > > ?I?ll rebase on top of main and try to get it integrated if it could be > useful to you (but I?m traveling > right now so everything gets done more slowly, sorry).? > Sound great, Thanks again! > > > The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. > I need to add a new code path in MatCreateRedundantMatrix() to make sure > the resulting Mat is indeed SBAIJ, but that is orthogonal to the > PCFIELDSPLIT issue. > The branch should be usable in its current state. 
> > Thanks, > Pierre > > > *From:* Pierre Jolivet > *Sent:* Saturday, August 26, 2023 4:36 PM > *To:* Carl-Johan Thore > *Cc:* Carl-Johan Thore ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ > > > > > On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: > > "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to > retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get > the A01 and A10 blocks in PCFIELDSPLIT. > I have a branch that fixes this, but I haven?t rebased in a while (and I?m > AFK right now), would you want me to rebase and give it a go, or must you > stick to a release tarball?" > > Ok, would be great if you could look at this! I don't need to stick to any > particular branch. > > Do you think MATNEST could be an alternative here? > > > Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with > the same issue. > I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was > crashing but I think it was thoroughly fixed in > https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ > It is ugly code on top of ugly code, so I didn?t try to get it integrated > and just used the branch locally, and then moved to some other stuff. > I?ll rebase on top of main and try to get it integrated if it could be > useful to you (but I?m traveling right now so everything gets done more > slowly, sorry). > > Thanks, > Pierre > > > My matrix is > [A00 A01; > A01^t A11] > so perhaps with MATNEST I can make use of the block-symmetry at least, and > then use MATSBAIJ for > A00 and A11 if it's possible to combine matrix types which the manual > seems to imply. > > Kind regards > Carl-Johan > > > > On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: > ? > Hi, > > I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. > According to the manual "[t]he fieldsplit preconditioner cannot > currently be used with the MATBAIJ or MATSBAIJ data formats if the > blocksize is larger than 1". Since my blocksize is exactly 1 it would seem > that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For > symmetric format, iscol must equal isrow" > from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in > fieldsplit.c at > > /* extract the A01 and A10 matrices */ ilink = jac->head; > PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if > (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, > ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { > PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, > MAT_INITIAL_MATRIX, &jac->B)); } > > This, since my A01 and A10 are not square, seems to explain why iscol is > not equal to isrow. > From this I gather that it is in general NOT possible to use > PCFIELDSPLIT with MATSBAIJ even with block size 1? > > Kind regards, > Carl-Johan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carljohanthore at gmail.com Mon Aug 28 04:52:49 2023 From: carljohanthore at gmail.com (Carl-Johan Thore) Date: Mon, 28 Aug 2023 11:52:49 +0200 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> Message-ID: " I would suggest avoiding using SBAIJ matrices, at least in the phase of application code development. We implemented SBAIJ for saving storage, not computational efficiency. SBAIJ does not have as many support as AIJ. 
After your code works for AIJ, then you may consider taking advantage of smaller storage of SBAIJ (could at cost of communication overhead)." Thanks for the advice. Our code is well-tested for AIJ. Is SBAIJ inherently significantly slower than AIJ, or is it just that it's not so much used and thus not as developed? On Sat, Aug 26, 2023 at 6:27?PM Zhang, Hong via petsc-users < petsc-users at mcs.anl.gov> wrote: > I would suggest avoiding using SBAIJ matrices, at least in the phase of > application code development. We implemented SBAIJ for saving storage, > not computational efficiency. SBAIJ does not have as many support as AIJ. > After your code works for AIJ, then you may consider taking advantage of > smaller storage of SBAIJ (could at cost of communication overhead). > Hong > ------------------------------ > *From:* petsc-users on behalf of Pierre > Jolivet via petsc-users > *Sent:* Saturday, August 26, 2023 10:20 AM > *To:* Carl-Johan Thore > *Cc:* petsc-users > *Subject:* Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ > > > > On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore > wrote: > > ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with > the same issue.? > I?m not sure I follow. Does PCFIELDSPLIT extract further submatrices from > these blocks, or is there > somewhere else in the code that things will go wrong? > > > Ah, no, you are right, in that case it should work. > > For the MATNEST I was thinking to get some savings from the block-symmetry > at least > even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them > would just be a > (pretty big) bonus. > > ?I?ll rebase on top of main and try to get it integrated if it could be > useful to you (but I?m traveling > right now so everything gets done more slowly, sorry).? > Sound great, Thanks again! > > > The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. > I need to add a new code path in MatCreateRedundantMatrix() to make sure > the resulting Mat is indeed SBAIJ, but that is orthogonal to the > PCFIELDSPLIT issue. > The branch should be usable in its current state. > > Thanks, > Pierre > > > *From:* Pierre Jolivet > *Sent:* Saturday, August 26, 2023 4:36 PM > *To:* Carl-Johan Thore > *Cc:* Carl-Johan Thore ; petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ > > > > > On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: > > "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to > retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get > the A01 and A10 blocks in PCFIELDSPLIT. > I have a branch that fixes this, but I haven?t rebased in a while (and I?m > AFK right now), would you want me to rebase and give it a go, or must you > stick to a release tarball?" > > Ok, would be great if you could look at this! I don't need to stick to any > particular branch. > > Do you think MATNEST could be an alternative here? > > > Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with > the same issue. > I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was > crashing but I think it was thoroughly fixed in > https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ > It is ugly code on top of ugly code, so I didn?t try to get it integrated > and just used the branch locally, and then moved to some other stuff. 
> I'll rebase on top of main and try to get it integrated if it could be > useful to you (but I'm traveling right now so everything gets done more > slowly, sorry). > > Thanks, > Pierre > > > My matrix is > [A00 A01; > A01^t A11] > so perhaps with MATNEST I can make use of the block-symmetry at least, and > then use MATSBAIJ for > A00 and A11 if it's possible to combine matrix types which the manual > seems to imply. > > Kind regards > Carl-Johan > > > > On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: > > Hi, > > I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. > According to the manual "[t]he fieldsplit preconditioner cannot > currently be used with the MATBAIJ or MATSBAIJ data formats if the > blocksize is larger than 1". Since my blocksize is exactly 1 it would seem > that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For > symmetric format, iscol must equal isrow" > from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in > fieldsplit.c at > > /* extract the A01 and A10 matrices */ ilink = jac->head; > PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if > (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, > ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { > PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, > MAT_INITIAL_MATRIX, &jac->B)); } > > This, since my A01 and A10 are not square, seems to explain why iscol is > not equal to isrow. > From this I gather that it is in general NOT possible to use > PCFIELDSPLIT with MATSBAIJ even with block size 1? > > Kind regards, > Carl-Johan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.jolivet at lip6.fr Mon Aug 28 05:12:03 2023 From: pierre.jolivet at lip6.fr (Pierre Jolivet) Date: Mon, 28 Aug 2023 19:12:03 +0900 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> Message-ID: > On 28 Aug 2023, at 6:50 PM, Carl-Johan Thore wrote: > > I've tried the new files, and with them, PCFIELDSPLIT now gets set up without crashes (but the setup is significantly slower than for MATAIJ) I'll be back from Japan at the end of this week, my schedule is too packed to get anything done in the meantime. But I'll let you know when things are working properly (last I checked, I think it was working, but I may have forgotten about a corner case or two). But yes, though one would expect things to be faster and less memory intensive with SBAIJ, it's unfortunately not always the case. 
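As a rough illustration of the MATNEST idea raised earlier in the thread (the helper function and variable names are hypothetical, not anyone's actual code): the (1,0) block can be an implicit transpose of A01, so the block symmetry of [A00 A01; A01^t A11] is exploited without ever storing A10, independently of whether A00 and A11 end up as AIJ or SBAIJ.

#include <petscmat.h>

/* Sketch: build the 2x2 block operator [A00 A01; A01^T A11] as a MATNEST,
   using an implicit transpose for the (1,0) block so it is never stored;
   MatNestGetISs() can later recover the row index sets for PCFieldSplitSetIS(). */
static PetscErrorCode BuildNest(Mat A00, Mat A01, Mat A11, Mat *K)
{
  Mat A10, blocks[4];

  PetscFunctionBeginUser;
  PetscCall(MatCreateTranspose(A01, &A10)); /* matrix-free view of A01^T */
  blocks[0] = A00; blocks[1] = A01;
  blocks[2] = A10; blocks[3] = A11;
  PetscCall(MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, K));
  PetscCall(MatDestroy(&A10)); /* the nest keeps its own reference */
  PetscFunctionReturn(0);
}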
Thanks, Pierre > Unfortunately I still get errors later in the process: > > [0]PETSC ERROR: Null argument, when expecting valid pointer > [0]PETSC ERROR: Null Pointer: Parameter # 1 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-1023-ga6d78fcba1d GIT Date: 2023-08-22 20:32:33 -0400 > [0]PETSC ERROR: Configure options -f --with-fortran-bindings=0 --with-cuda --with-cusp --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai > [0]PETSC ERROR: #1 PetscObjectQuery() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/sys/objects/inherit.c:742 > [0]PETSC ERROR: #2 MatCreateSubMatrix_MPISBAIJ() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/impls/sbaij/mpi/mpisbaij.c:1414 > [0]PETSC ERROR: #3 MatCreateSubMatrix() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/interface/matrix.c:8476 > [0]PETSC ERROR: #4 PCSetUp_FieldSplit() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/impls/fieldsplit/fieldsplit.c:826 > [0]PETSC ERROR: #5 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/interface/precon.c:1069 > [0]PETSC ERROR: #6 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/ksp/interface/itfunc.c:415 > > The code I'm running here works without any problems for MATAIJ. To run it with MATSBAIJ I've simply used the command-line option > -dm_mat_type sbaij > > > Kind regards, > Carl-Johan > > > On Sat, Aug 26, 2023 at 5:21?PM Pierre Jolivet via petsc-users > wrote: >> >> >>> On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore > wrote: >>> >>> ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue.? >>> I?m not sure I follow. Does PCFIELDSPLIT extract further submatrices from these blocks, or is there >>> somewhere else in the code that things will go wrong? >> >> Ah, no, you are right, in that case it should work. >> >>> For the MATNEST I was thinking to get some savings from the block-symmetry at least >>> even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them would just be a >>> (pretty big) bonus. >>> >>> ?I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling >>> right now so everything gets done more slowly, sorry).? >>> Sound great, Thanks again! >> >> The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. >> I need to add a new code path in MatCreateRedundantMatrix() to make sure the resulting Mat is indeed SBAIJ, but that is orthogonal to the PCFIELDSPLIT issue. >> The branch should be usable in its current state. >> >> Thanks, >> Pierre >> >>> >>> From: Pierre Jolivet > >>> Sent: Saturday, August 26, 2023 4:36 PM >>> To: Carl-Johan Thore > >>> Cc: Carl-Johan Thore >; petsc-users at mcs.anl.gov >>> Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ >>> >>> >>> >>> >>> On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: >>> >>> "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. >>> I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" >>> >>> Ok, would be great if you could look at this! I don't need to stick to any particular branch. >>> >>> Do you think MATNEST could be an alternative here? >>> >>> Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue. 
>>> I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ >>> It is ugly code on top of ugly code, so I didn?t try to get it integrated and just used the branch locally, and then moved to some other stuff. >>> I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry). >>> >>> Thanks, >>> Pierre >>> >>> >>> My matrix is >>> [A00 A01; >>> A01^t A11] >>> so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for >>> A00 and A11 if it's possible to combine matrix types which the manual seems to imply. >>> >>> Kind regards >>> Carl-Johan >>> >>> >>> >>> On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: >>> ? >>> Hi, >>> >>> I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. >>> According to the manual "[t]he fieldsplit preconditioner cannot >>> currently be used with the MATBAIJ or MATSBAIJ data formats if the >>> blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" >>> from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in >>> fieldsplit.c at >>> >>> /* extract the A01 and A10 matrices */ ilink = jac->head; >>> PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if >>> (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, >>> ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { >>> PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, >>> MAT_INITIAL_MATRIX, &jac->B)); } >>> >>> This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. >>> From this I gather that it is in general NOT possible to use >>> PCFIELDSPLIT with MATSBAIJ even with block size 1? >>> >>> Kind regards, >>> Carl-Johan >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Aug 28 09:35:31 2023 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Mon, 28 Aug 2023 14:35:31 +0000 Subject: [petsc-users] PCFIELDSPLIT with MATSBAIJ In-Reply-To: References: <68909B9F-A693-46FD-97B0-23AA5531E814@lip6.fr> <009EB5B4-E65F-4486-8045-6CA5E20DBD94@lip6.fr> Message-ID: Carl-Johan, ________________________________ Thanks for the advice. Our code is well-tested for AIJ. Is SBAIJ inherently significantly slower than AIJ, or is it just that it's not so much used and thus not as developed? SBAIJ only stores upper half triangular part of matrix. When it needs a lower triangular part of entry, it has to jump around searching for that entry (column search instead of row accessing in AIJ), causing overhead for data-accessing. In parallel computation, it leads to extra inter-processor communication. Hong On Sat, Aug 26, 2023 at 6:27?PM Zhang, Hong via petsc-users > wrote: I would suggest avoiding using SBAIJ matrices, at least in the phase of application code development. We implemented SBAIJ for saving storage, not computational efficiency. SBAIJ does not have as many support as AIJ. After your code works for AIJ, then you may consider taking advantage of smaller storage of SBAIJ (could at cost of communication overhead). 
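To illustrate the storage convention described above (a generic sketch, not code from this thread): with MATSBAIJ only the upper triangle, that is entries with column index >= row index, is inserted and stored; the mirrored lower-triangular entries exist only implicitly, which is where both the memory saving and the extra lookup/communication cost come from.

#include <petscmat.h>

/* Sketch: assemble a 1D Laplacian-like MATSBAIJ (block size 1), inserting
   only the upper-triangular entries; the (i+1, i) entries are implied by
   symmetry and never stored. */
static PetscErrorCode AssembleSBAIJ(PetscInt n, Mat *A)
{
  PetscInt i, rstart, rend;

  PetscFunctionBeginUser;
  PetscCall(MatCreate(PETSC_COMM_WORLD, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(*A, MATMPISBAIJ));
  PetscCall(MatSetUp(*A));
  PetscCall(MatGetOwnershipRange(*A, &rstart, &rend));
  for (i = rstart; i < rend; i++) {
    PetscCall(MatSetValue(*A, i, i, 2.0, INSERT_VALUES));
    if (i + 1 < n) PetscCall(MatSetValue(*A, i, i + 1, -1.0, INSERT_VALUES)); /* upper triangle only */
  }
  PetscCall(MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(0);
}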
Hong ________________________________ From: petsc-users > on behalf of Pierre Jolivet via petsc-users > Sent: Saturday, August 26, 2023 10:20 AM To: Carl-Johan Thore > Cc: petsc-users > Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore > wrote: ?Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue.? I?m not sure I follow. Does PCFIELDSPLIT extract further submatrices from these blocks, or is there somewhere else in the code that things will go wrong? Ah, no, you are right, in that case it should work. For the MATNEST I was thinking to get some savings from the block-symmetry at least even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them would just be a (pretty big) bonus. ?I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry).? Sound great, Thanks again! The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841. I need to add a new code path in MatCreateRedundantMatrix() to make sure the resulting Mat is indeed SBAIJ, but that is orthogonal to the PCFIELDSPLIT issue. The branch should be usable in its current state. Thanks, Pierre From: Pierre Jolivet > Sent: Saturday, August 26, 2023 4:36 PM To: Carl-Johan Thore > Cc: Carl-Johan Thore >; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore > wrote: "(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT. I have a branch that fixes this, but I haven?t rebased in a while (and I?m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?" Ok, would be great if you could look at this! I don't need to stick to any particular branch. Do you think MATNEST could be an alternative here? Well, your A00 and A11 will possibly be SBAIJ also, so you?ll end up with the same issue. I?m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/ It is ugly code on top of ugly code, so I didn?t try to get it integrated and just used the branch locally, and then moved to some other stuff. I?ll rebase on top of main and try to get it integrated if it could be useful to you (but I?m traveling right now so everything gets done more slowly, sorry). Thanks, Pierre My matrix is [A00 A01; A01^t A11] so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for A00 and A11 if it's possible to combine matrix types which the manual seems to imply. Kind regards Carl-Johan On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore > wrote: ? Hi, I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4. According to the manual "[t]he fieldsplit preconditioner cannot currently be used with the MATBAIJ or MATSBAIJ data formats if the blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow" from MatCreateSubMatrix_MPISBAIJ. 
Tracing backwards one ends up in fieldsplit.c at /* extract the A01 and A10 matrices */ ilink = jac->head; PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if (jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else { PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow. From this I gather that it is in general NOT possible to use PCFIELDSPLIT with MATSBAIJ even with block size 1? Kind regards, Carl-Johan -------------- next part -------------- An HTML attachment was scrubbed... URL: From inturu.srinivas2020 at vitstudent.ac.in Mon Aug 28 10:59:57 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Mon, 28 Aug 2023 21:29:57 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB Message-ID: Hello, I want to build PETSc with MATLAB for working on the simulation using IBAMR open software. While building the PETSc, using the following export PETSC_DIR=$PWD export PETSC_ARCH=linux-debug ./configure \ --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ --with-debugging=1 \ --download-hypre=1 \ --download-fblaslapack=1 \ --with-x=0 \ --with-matlab-dir=/usr/local/MATLAB/R2020b/ --with-matlab-engine=1 --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ make -j4 make -j4 test I got the following error CLINKER linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test /usr/bin/ld: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in function `EvaluateResidual': /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: undefined reference to `PetscMatlabEnginePut' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: undefined reference to `PetscMatlabEngineGet' /usr/bin/ld: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in function `EvaluateJacobian': /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: undefined reference to `PetscMatlabEnginePut' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: undefined reference to `PetscMatlabEngineGet' /usr/bin/ld: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in function `TaoPounders': /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: undefined reference to `PetscMatlabEngineGet' /usr/bin/ld: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in function `main': /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: undefined reference to `PetscMatlabEngineCreate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: undefined reference to `PetscMatlabEngineGetArray' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: undefined reference to `PetscMatlabEngineGetArray' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: undefined reference to `PetscMatlabEngineGetArray' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: undefined reference to `PetscMatlabEngineGetArray' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: undefined reference to `PetscMatlabEngineGetArray' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: undefined reference to `PetscMatlabEngineEvaluate' /usr/bin/ld: /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: undefined reference to `PetscMatlabEngineDestroy' collect2: error: ld returned 1 exit status make: *** [gmakefile.test:185: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] Error 1 make: *** Waiting for unfinished jobs.... Please help me to solve this issue Thank you Srinivas -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 28 11:13:29 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Aug 2023 11:13:29 -0500 (CDT) Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: References: Message-ID: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> https://ibamr.github.io/linux says petsc-3.17 Here you are using 3.13 Can you retry with petsc-3.17.5? Satish On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > Hello, > > I want to build PETSc with MATLAB for working on the simulation using IBAMR > open software. 
While building the PETSc, using the following > > export PETSC_DIR=$PWD > export PETSC_ARCH=linux-debug > ./configure \ > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > --with-debugging=1 \ > --download-hypre=1 \ > --download-fblaslapack=1 \ > --with-x=0 \ > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > --with-matlab-engine=1 > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > make -j4 > make -j4 test > > I got the following error > CLINKER linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > /usr/bin/ld: > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > function `EvaluateResidual': > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > undefined reference to `PetscMatlabEnginePut' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > undefined reference to `PetscMatlabEngineGet' > /usr/bin/ld: > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > function `EvaluateJacobian': > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > undefined reference to `PetscMatlabEnginePut' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > undefined reference to `PetscMatlabEngineGet' > /usr/bin/ld: > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > function `TaoPounders': > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > undefined reference to `PetscMatlabEngineGet' > /usr/bin/ld: > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > function `main': > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > undefined reference to `PetscMatlabEngineCreate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > undefined reference to `PetscMatlabEngineGetArray' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > undefined reference to `PetscMatlabEngineGetArray' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > undefined reference to `PetscMatlabEngineGetArray' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > undefined reference to `PetscMatlabEngineGetArray' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > undefined reference to `PetscMatlabEngineGetArray' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > undefined reference to `PetscMatlabEngineEvaluate' > /usr/bin/ld: > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > undefined reference to `PetscMatlabEngineDestroy' > > collect2: error: ld returned 1 exit status > make: *** [gmakefile.test:185: > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] Error 1 > make: *** Waiting for unfinished jobs.... > > Please help me to solve this issue > > Thank you > Srinivas > > From inturu.srinivas2020 at vitstudent.ac.in Mon Aug 28 11:14:35 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Mon, 28 Aug 2023 21:44:35 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> Message-ID: I will try it. On Mon, Aug 28, 2023, 21:43 Satish Balay wrote: > https://ibamr.github.io/linux says petsc-3.17 > > Here you are using 3.13 > > Can you retry with petsc-3.17.5? > > Satish > > On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > Hello, > > > > I want to build PETSc with MATLAB for working on the simulation using > IBAMR > > open software. While building the PETSc, using the following > > > > export PETSC_DIR=$PWD > > export PETSC_ARCH=linux-debug > > ./configure \ > > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > --with-debugging=1 \ > > --download-hypre=1 \ > > --download-fblaslapack=1 \ > > --with-x=0 \ > > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > --with-matlab-engine=1 > > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > > > make -j4 > > make -j4 test > > > > I got the following error > > CLINKER > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateResidual': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateJacobian': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `TaoPounders': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > 
function `main': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > undefined reference to `PetscMatlabEngineCreate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > undefined reference to `PetscMatlabEngineDestroy' > > > > collect2: error: ld returned 1 exit status > > make: *** [gmakefile.test:185: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > Error 1 > > make: *** Waiting for unfinished jobs.... > > > > Please help me to solve this issue > > > > Thank you > > Srinivas > > > > > > -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 28 11:15:33 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 28 Aug 2023 11:15:33 -0500 (CDT) Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> Message-ID: <0c625f65-591a-e26f-6f3e-bde3e0b29ca5@mcs.anl.gov> Also - the instructions don't say if matlab is required. So perhaps you might want to try an install without matlab - and see if you are able to get IBAMR working. Satish On Mon, 28 Aug 2023, Satish Balay via petsc-users wrote: > https://ibamr.github.io/linux says petsc-3.17 > > Here you are using 3.13 > > Can you retry with petsc-3.17.5? 
> > Satish > > On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > Hello, > > > > I want to build PETSc with MATLAB for working on the simulation using IBAMR > > open software. While building the PETSc, using the following > > > > export PETSC_DIR=$PWD > > export PETSC_ARCH=linux-debug > > ./configure \ > > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > --with-debugging=1 \ > > --download-hypre=1 \ > > --download-fblaslapack=1 \ > > --with-x=0 \ > > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > --with-matlab-engine=1 > > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > > > make -j4 > > make -j4 test > > > > I got the following error > > CLINKER linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateResidual': > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateJacobian': > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `TaoPounders': > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `main': > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > undefined reference to `PetscMatlabEngineCreate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > undefined reference to `PetscMatlabEngineGetArray' > > 
/usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > undefined reference to `PetscMatlabEngineDestroy' > > > > collect2: error: ld returned 1 exit status > > make: *** [gmakefile.test:185: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] Error 1 > > make: *** Waiting for unfinished jobs.... > > > > Please help me to solve this issue > > > > Thank you > > Srinivas > > > > > From inturu.srinivas2020 at vitstudent.ac.in Mon Aug 28 11:21:18 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Mon, 28 Aug 2023 21:51:18 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <0c625f65-591a-e26f-6f3e-bde3e0b29ca5@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> <0c625f65-591a-e26f-6f3e-bde3e0b29ca5@mcs.anl.gov> Message-ID: For the past 6 months,I am working on IBAMR without MATLAB. Now for one application it is recommended to build PETSc with MATLAB as mentioned in the following link https://github.com/IBAMR/cfd-mpc-wecs. On Mon, Aug 28, 2023, 21:45 Satish Balay wrote: > Also - the instructions don't say if matlab is required. > > So perhaps you might want to try an install without matlab - and see if > you are able to get IBAMR working. > > Satish > > On Mon, 28 Aug 2023, Satish Balay via petsc-users wrote: > > > https://ibamr.github.io/linux says petsc-3.17 > > > > Here you are using 3.13 > > > > Can you retry with petsc-3.17.5? > > > > Satish > > > > On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > > > Hello, > > > > > > I want to build PETSc with MATLAB for working on the simulation using > IBAMR > > > open software. 
While building the PETSc, using the following > > > > > > export PETSC_DIR=$PWD > > > export PETSC_ARCH=linux-debug > > > ./configure \ > > > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > > --with-debugging=1 \ > > > --download-hypre=1 \ > > > --download-fblaslapack=1 \ > > > --with-x=0 \ > > > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > > --with-matlab-engine=1 > > > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > > > > > make -j4 > > > make -j4 test > > > > > > I got the following error > > > CLINKER > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > > /usr/bin/ld: > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: > in > > > function `EvaluateResidual': > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > > undefined reference to `PetscMatlabEnginePut' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > > undefined reference to `PetscMatlabEngineGet' > > > /usr/bin/ld: > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: > in > > > function `EvaluateJacobian': > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > > undefined reference to `PetscMatlabEnginePut' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > > undefined reference to `PetscMatlabEngineGet' > > > /usr/bin/ld: > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: > in > > > function `TaoPounders': > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > > undefined reference to `PetscMatlabEngineGet' > > > /usr/bin/ld: > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: > in > > > function `main': > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > > undefined reference to `PetscMatlabEngineCreate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > > undefined reference to `PetscMatlabEngineGetArray' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > > undefined reference to `PetscMatlabEngineGetArray' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > > undefined reference to `PetscMatlabEngineGetArray' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > > undefined reference to `PetscMatlabEngineGetArray' > > > /usr/bin/ld: > > > > 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > > undefined reference to `PetscMatlabEngineGetArray' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > > undefined reference to `PetscMatlabEngineEvaluate' > > > /usr/bin/ld: > > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > > undefined reference to `PetscMatlabEngineDestroy' > > > > > > collect2: error: ld returned 1 exit status > > > make: *** [gmakefile.test:185: > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > Error 1 > > > make: *** Waiting for unfinished jobs.... > > > > > > Please help me to solve this issue > > > > > > Thank you > > > Srinivas > > > > > > > > > > -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From inturu.srinivas2020 at vitstudent.ac.in Mon Aug 28 23:49:34 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Tue, 29 Aug 2023 10:19:34 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> Message-ID: I tried with petsc-3.17.5. During building of libmesh, the error shows petsc was not found On Mon, Aug 28, 2023 at 9:43?PM Satish Balay wrote: > https://ibamr.github.io/linux says petsc-3.17 > > Here you are using 3.13 > > Can you retry with petsc-3.17.5? > > Satish > > On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > Hello, > > > > I want to build PETSc with MATLAB for working on the simulation using > IBAMR > > open software. 
While building the PETSc, using the following > > > > export PETSC_DIR=$PWD > > export PETSC_ARCH=linux-debug > > ./configure \ > > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > --with-debugging=1 \ > > --download-hypre=1 \ > > --download-fblaslapack=1 \ > > --with-x=0 \ > > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > --with-matlab-engine=1 > > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > > > make -j4 > > make -j4 test > > > > I got the following error > > CLINKER > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateResidual': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `EvaluateJacobian': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > undefined reference to `PetscMatlabEnginePut' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `TaoPounders': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > undefined reference to `PetscMatlabEngineGet' > > /usr/bin/ld: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > function `main': > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > undefined reference to `PetscMatlabEngineCreate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > undefined reference to `PetscMatlabEngineGetArray' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > undefined reference to `PetscMatlabEngineGetArray' > > 
/usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > undefined reference to `PetscMatlabEngineEvaluate' > > /usr/bin/ld: > > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > undefined reference to `PetscMatlabEngineDestroy' > > > > collect2: error: ld returned 1 exit status > > make: *** [gmakefile.test:185: > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > Error 1 > > make: *** Waiting for unfinished jobs.... > > > > Please help me to solve this issue > > > > Thank you > > Srinivas > > > > > > -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: From inturu.srinivas2020 at vitstudent.ac.in Mon Aug 28 23:52:17 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Tue, 29 Aug 2023 10:22:17 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> Message-ID: I am sharing the make.log file while building petsc-3.13.4 with Matlab. Please find the attachment and do the needful. On Tue, Aug 29, 2023 at 10:19?AM INTURU SRINIVAS 20PHD0548 < inturu.srinivas2020 at vitstudent.ac.in> wrote: > I tried with petsc-3.17.5. During building of libmesh, the error shows > petsc was not found > > On Mon, Aug 28, 2023 at 9:43?PM Satish Balay wrote: > >> https://ibamr.github.io/linux says petsc-3.17 >> >> Here you are using 3.13 >> >> Can you retry with petsc-3.17.5? >> >> Satish >> >> On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: >> >> > Hello, >> > >> > I want to build PETSc with MATLAB for working on the simulation using >> IBAMR >> > open software. 
While building the PETSc, using the following >> > >> > export PETSC_DIR=$PWD >> > export PETSC_ARCH=linux-debug >> > ./configure \ >> > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ >> > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ >> > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ >> > --with-debugging=1 \ >> > --download-hypre=1 \ >> > --download-fblaslapack=1 \ >> > --with-x=0 \ >> > --with-matlab-dir=/usr/local/MATLAB/R2020b/ >> > --with-matlab-engine=1 >> > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ >> > >> > make -j4 >> > make -j4 test >> > >> > I got the following error >> > CLINKER >> linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test >> > /usr/bin/ld: >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in >> > function `EvaluateResidual': >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: >> > undefined reference to `PetscMatlabEnginePut' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: >> > undefined reference to `PetscMatlabEngineGet' >> > /usr/bin/ld: >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in >> > function `EvaluateJacobian': >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: >> > undefined reference to `PetscMatlabEnginePut' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: >> > undefined reference to `PetscMatlabEngineGet' >> > /usr/bin/ld: >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in >> > function `TaoPounders': >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: >> > undefined reference to `PetscMatlabEngineGet' >> > /usr/bin/ld: >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in >> > function `main': >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: >> > undefined reference to `PetscMatlabEngineCreate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: >> > undefined reference to `PetscMatlabEngineGetArray' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: >> > undefined reference to `PetscMatlabEngineGetArray' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: >> > undefined reference to `PetscMatlabEngineGetArray' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: >> > undefined reference to `PetscMatlabEngineGetArray' >> > /usr/bin/ld: >> > >> 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: >> > undefined reference to `PetscMatlabEngineGetArray' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: >> > undefined reference to `PetscMatlabEngineEvaluate' >> > /usr/bin/ld: >> > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: >> > undefined reference to `PetscMatlabEngineDestroy' >> > >> > collect2: error: ld returned 1 exit status >> > make: *** [gmakefile.test:185: >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] >> Error 1 >> > make: *** Waiting for unfinished jobs.... >> > >> > Please help me to solve this issue >> > >> > Thank you >> > Srinivas >> > >> > >> >> -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 97721 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Aug 29 09:01:18 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Aug 2023 09:01:18 -0500 (CDT) Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> Message-ID: <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> Send configure.log, make.log from both petsc-3.13 and 3.17 [or 3.19]. [you can gzip them to make the logs friendly to mailing list - or send them to petsc-maint] And does test suite work with 3.17? [or 3.19?] Satish On Tue, 29 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > I am sharing the make.log file while building petsc-3.13.4 with Matlab. > Please find the attachment and do the needful. > > On Tue, Aug 29, 2023 at 10:19?AM INTURU SRINIVAS 20PHD0548 < > inturu.srinivas2020 at vitstudent.ac.in> wrote: > > > I tried with petsc-3.17.5. During building of libmesh, the error shows > > petsc was not found > > > > On Mon, Aug 28, 2023 at 9:43?PM Satish Balay wrote: > > > >> https://ibamr.github.io/linux says petsc-3.17 > >> > >> Here you are using 3.13 > >> > >> Can you retry with petsc-3.17.5? > >> > >> Satish > >> > >> On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > >> > >> > Hello, > >> > > >> > I want to build PETSc with MATLAB for working on the simulation using > >> IBAMR > >> > open software. 
While building the PETSc, using the following > >> > > >> > export PETSC_DIR=$PWD > >> > export PETSC_ARCH=linux-debug > >> > ./configure \ > >> > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > >> > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > >> > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > >> > --with-debugging=1 \ > >> > --download-hypre=1 \ > >> > --download-fblaslapack=1 \ > >> > --with-x=0 \ > >> > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > >> > --with-matlab-engine=1 > >> > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > >> > > >> > make -j4 > >> > make -j4 test > >> > > >> > I got the following error > >> > CLINKER > >> linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > >> > /usr/bin/ld: > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > >> > function `EvaluateResidual': > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > >> > undefined reference to `PetscMatlabEnginePut' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > >> > undefined reference to `PetscMatlabEngineEvaluate' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > >> > undefined reference to `PetscMatlabEngineGet' > >> > /usr/bin/ld: > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > >> > function `EvaluateJacobian': > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > >> > undefined reference to `PetscMatlabEnginePut' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > >> > undefined reference to `PetscMatlabEngineEvaluate' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > >> > undefined reference to `PetscMatlabEngineGet' > >> > /usr/bin/ld: > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > >> > function `TaoPounders': > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > >> > undefined reference to `PetscMatlabEngineGet' > >> > /usr/bin/ld: > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > >> > function `main': > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > >> > undefined reference to `PetscMatlabEngineCreate' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > >> > undefined reference to `PetscMatlabEngineEvaluate' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > >> > undefined reference to `PetscMatlabEngineEvaluate' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > >> > undefined reference to `PetscMatlabEngineGetArray' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > >> > undefined reference to `PetscMatlabEngineGetArray' > >> > /usr/bin/ld: > >> > > >> /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > >> > undefined reference to `PetscMatlabEngineGetArray' > >> > /usr/bin/ld: > >> > > >> 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146:
> >> > undefined reference to `PetscMatlabEngineGetArray'
> >> > /usr/bin/ld:
> >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148:
> >> > undefined reference to `PetscMatlabEngineGetArray'
> >> > /usr/bin/ld:
> >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154:
> >> > undefined reference to `PetscMatlabEngineEvaluate'
> >> > /usr/bin/ld:
> >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157:
> >> > undefined reference to `PetscMatlabEngineEvaluate'
> >> > /usr/bin/ld:
> >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158:
> >> > undefined reference to `PetscMatlabEngineDestroy'
> >> >
> >> > collect2: error: ld returned 1 exit status
> >> > make: *** [gmakefile.test:185: linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] Error 1
> >> > make: *** Waiting for unfinished jobs....
> >> >
> >> > Please help me to solve this issue
> >> >
> >> > Thank you
> >> > Srinivas
> >> >
> >>
> >
>

From thanasis.boutsikakis at corintis.com  Tue Aug 29 11:50:36 2023
From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis)
Date: Tue, 29 Aug 2023 18:50:36 +0200
Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix
Message-ID: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com>

Hi all, I have the following code that orthogonalizes a PETSc matrix. The problem is that this implementation requires that the PETSc matrix is dense, otherwise it fails at bv.SetFromOptions(). Hence the assert in orthogonality().

What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail)

Thanks!

"""Experimenting with matrix orthogonalization"""

import contextlib
import sys
import time
import numpy as np
from firedrake import COMM_WORLD
from firedrake.petsc import PETSc

import slepc4py

slepc4py.init(sys.argv)
from slepc4py import SLEPc

from numpy.testing import assert_array_almost_equal

EPSILON_USER = 1e-4
EPS = sys.float_info.epsilon


def Print(message: str):
    """Print function that prints only on rank 0 with color

    Args:
        message (str): message to be printed
    """
    PETSc.Sys.Print(message)


def create_petsc_matrix(input_array, sparse=True):
    """Create a PETSc matrix from an input_array

    Args:
        input_array (np array): Input array
        partition_like (PETSc mat, optional): Petsc matrix. Defaults to None.
        sparse (bool, optional): Toggle for sparse or dense. Defaults to True.

    Returns:
        PETSc mat: PETSc matrix
    """
    # Check if input_array is 1D and reshape if necessary
    assert len(input_array.shape) == 2, "Input array should be 2-dimensional"
    global_rows, global_cols = input_array.shape

    size = ((None, global_rows), (global_cols, global_cols))

    # Create a sparse or dense matrix based on the 'sparse' argument
    if sparse:
        matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
    else:
        matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD)
    matrix.setUp()

    local_rows_start, local_rows_end = matrix.getOwnershipRange()

    for counter, i in enumerate(range(local_rows_start, local_rows_end)):
        # Calculate the correct row in the array for the current process
        row_in_array = counter + local_rows_start
        matrix.setValues(
            i, range(global_cols), input_array[row_in_array, :], addv=False
        )

    # Assemble the matrix to compute the final structure
    matrix.assemblyBegin()
    matrix.assemblyEnd()

    return matrix


def orthogonality(A):  # sourcery skip: avoid-builtin-shadow
    """Checking and correcting orthogonality

    Args:
        A (PETSc.Mat): Matrix of size [m x k].

    Returns:
        PETSc.Mat: Matrix of size [m x k].
    """
    # Check if the matrix is dense
    mat_type = A.getType()
    assert mat_type in (
        "seqdense",
        "mpidense",
    ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix."

    m, k = A.getSize()

    Phi1 = A.getColumnVector(0)
    Phi2 = A.getColumnVector(k - 1)

    # Compute dot product using PETSc function
    dot_product = Phi1.dot(Phi2)

    if abs(dot_product) > min(EPSILON_USER, EPS * m):
        Print("  Matrix is not orthogonal")

        # Block type can be GS, CHOL, SVQB, TSQR, TSQRCHOL
        _type = SLEPc.BV().OrthogBlockType.GS

        bv = SLEPc.BV().createFromMat(A)
        bv.setFromOptions()
        bv.setOrthogonalization(_type)
        bv.orthogonalize()

        A = bv.createMat()

        Print("  Matrix successfully orthogonalized")

        # Assemble the matrix to compute the final structure
        if not A.assembled:
            A.assemblyBegin()
            A.assemblyEnd()
    else:
        Print("  Matrix is orthogonal")

    return A


# --------------------------------------------
# EXP: Orthogonalization of an mpi PETSc matrix
# --------------------------------------------

m, k = 11, 7
# Generate the random numpy matrices
np.random.seed(0)  # sets the seed to 0
A_np = np.random.randint(low=0, high=6, size=(m, k))

A = create_petsc_matrix(A_np, sparse=False)

A_orthogonal = orthogonality(A)

# --------------------------------------------
# TEST: Orthogonalization of a numpy matrix
# --------------------------------------------
# Generate A_np_orthogonal
A_np_orthogonal, _ = np.linalg.qr(A_np)

# Get the local values from A_orthogonal
local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange()
A_orthogonal_local = A_orthogonal.getValues(
    range(local_rows_start, local_rows_end), range(k)
)

# Assert the correctness of the multiplication for the local subset
assert_array_almost_equal(
    np.abs(A_orthogonal_local),
    np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]),
    decimal=5,
)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From inturu.srinivas2020 at vitstudent.ac.in Tue Aug 29 12:10:45 2023 From: inturu.srinivas2020 at vitstudent.ac.in (INTURU SRINIVAS 20PHD0548) Date: Tue, 29 Aug 2023 22:40:45 +0530 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> Message-ID: I am sharing the log files while building petsc3.13.4 with matlab and also the log file while building libmesh with petsc3.17.5 and matlab. Building petsc 3.17.5 with matlab was done successfully. But libmesh is not able to find the petsc Please find the attachments. On Tue, Aug 29, 2023 at 7:31?PM Satish Balay wrote: > Send configure.log, make.log from both petsc-3.13 and 3.17 [or 3.19]. > > [you can gzip them to make the logs friendly to mailing list - or send > them to petsc-maint] > > And does test suite work with 3.17? [or 3.19?] > > Satish > > On Tue, 29 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > I am sharing the make.log file while building petsc-3.13.4 with Matlab. > > Please find the attachment and do the needful. > > > > On Tue, Aug 29, 2023 at 10:19?AM INTURU SRINIVAS 20PHD0548 < > > inturu.srinivas2020 at vitstudent.ac.in> wrote: > > > > > I tried with petsc-3.17.5. During building of libmesh, the error shows > > > petsc was not found > > > > > > On Mon, Aug 28, 2023 at 9:43?PM Satish Balay > wrote: > > > > > >> https://ibamr.github.io/linux says petsc-3.17 > > >> > > >> Here you are using 3.13 > > >> > > >> Can you retry with petsc-3.17.5? > > >> > > >> Satish > > >> > > >> On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > >> > > >> > Hello, > > >> > > > >> > I want to build PETSc with MATLAB for working on the simulation > using > > >> IBAMR > > >> > open software. 
While building the PETSc, using the following > > >> > > > >> > export PETSC_DIR=$PWD > > >> > export PETSC_ARCH=linux-debug > > >> > ./configure \ > > >> > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > >> > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > >> > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > >> > --with-debugging=1 \ > > >> > --download-hypre=1 \ > > >> > --download-fblaslapack=1 \ > > >> > --with-x=0 \ > > >> > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > >> > --with-matlab-engine=1 > > >> > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > >> > > > >> > make -j4 > > >> > make -j4 test > > >> > > > >> > I got the following error > > >> > CLINKER > > >> linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `EvaluateResidual': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > >> > undefined reference to `PetscMatlabEnginePut' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `EvaluateJacobian': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > >> > undefined reference to `PetscMatlabEnginePut' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `TaoPounders': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `main': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > >> > undefined reference to `PetscMatlabEngineCreate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > >> > undefined reference to `PetscMatlabEngineDestroy' > > >> > > > >> > collect2: error: ld returned 1 exit status > > >> > make: *** [gmakefile.test:185: > > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > > >> Error 1 > > >> > make: *** Waiting for unfinished jobs.... > > >> > > > >> > Please help me to solve this issue > > >> > > > >> > Thank you > > >> > Srinivas > > >> > > > >> > > > >> > > >> > > > > > -- **Disclaimer:* This message was sent from Vellore Institute of Technology. ? The contents of this email may contain legally protected confidential or privileged information of ?Vellore Institute of Technology?.? If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. If you have received this email in error, please promptly notify the sender by reply email and delete the original email and any backup copies without reading them.* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 3.13.4_make.log.gz Type: application/gzip Size: 12577 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 3.17.5_config.log.gz Type: application/gzip Size: 19502 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 3.13.4_configure.log.gz Type: application/gzip Size: 106072 bytes Desc: not available URL: From knepley at gmail.com Tue Aug 29 12:07:46 2023 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 29 Aug 2023 12:07:46 -0500 Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> Message-ID: On Tue, Aug 29, 2023 at 9:08?AM Satish Balay via petsc-users < petsc-users at mcs.anl.gov> wrote: > Send configure.log, make.log from both petsc-3.13 and 3.17 [or 3.19]. > > [you can gzip them to make the logs friendly to mailing list - or send > them to petsc-maint] > > And does test suite work with 3.17? [or 3.19?] > David Wells is working on this. The change is that petscversion.h now includes petscconf.h which means you need all the include flags, but Libmesh does not get the flags right. 
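One quick way to see which include flags a given PETSc build actually expects is to read them back out of the generated petscvariables file. The sketch below is illustrative only: it assumes PETSC_DIR and PETSC_ARCH are set as in the configure steps quoted above, and it assumes the file records an include entry such as PETSC_CC_INCLUDES (the exact variable names can differ between PETSc versions).

import os

# Hypothetical helper: print the include-related variables recorded by configure.
# Assumes $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/petscvariables exists, which is
# the same file referenced later in this thread.
petsc_dir = os.environ["PETSC_DIR"]
petsc_arch = os.environ.get("PETSC_ARCH", "")
conf = os.path.join(petsc_dir, petsc_arch, "lib", "petsc", "conf", "petscvariables")

with open(conf) as f:
    for line in f:
        name, sep, value = line.partition("=")
        if sep and "INCLUDE" in name:  # e.g. PETSC_CC_INCLUDES, if present
            print(name.strip(), "=", value.strip())
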
Thanks, Matt > Satish > > On Tue, 29 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > I am sharing the make.log file while building petsc-3.13.4 with Matlab. > > Please find the attachment and do the needful. > > > > On Tue, Aug 29, 2023 at 10:19?AM INTURU SRINIVAS 20PHD0548 < > > inturu.srinivas2020 at vitstudent.ac.in> wrote: > > > > > I tried with petsc-3.17.5. During building of libmesh, the error shows > > > petsc was not found > > > > > > On Mon, Aug 28, 2023 at 9:43?PM Satish Balay > wrote: > > > > > >> https://ibamr.github.io/linux says petsc-3.17 > > >> > > >> Here you are using 3.13 > > >> > > >> Can you retry with petsc-3.17.5? > > >> > > >> Satish > > >> > > >> On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > >> > > >> > Hello, > > >> > > > >> > I want to build PETSc with MATLAB for working on the simulation > using > > >> IBAMR > > >> > open software. While building the PETSc, using the following > > >> > > > >> > export PETSC_DIR=$PWD > > >> > export PETSC_ARCH=linux-debug > > >> > ./configure \ > > >> > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > >> > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > >> > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > >> > --with-debugging=1 \ > > >> > --download-hypre=1 \ > > >> > --download-fblaslapack=1 \ > > >> > --with-x=0 \ > > >> > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > >> > --with-matlab-engine=1 > > >> > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > >> > > > >> > make -j4 > > >> > make -j4 test > > >> > > > >> > I got the following error > > >> > CLINKER > > >> linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `EvaluateResidual': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > >> > undefined reference to `PetscMatlabEnginePut' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `EvaluateJacobian': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > >> > undefined reference to `PetscMatlabEnginePut' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `TaoPounders': > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > >> > undefined reference to `PetscMatlabEngineGet' > > >> > /usr/bin/ld: > > >> > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > >> > function `main': > > >> > > > >> > 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > >> > undefined reference to `PetscMatlabEngineCreate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > >> > undefined reference to `PetscMatlabEngineGetArray' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > >> > /usr/bin/ld: > > >> > > > >> > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > >> > undefined reference to `PetscMatlabEngineDestroy' > > >> > > > >> > collect2: error: ld returned 1 exit status > > >> > make: *** [gmakefile.test:185: > > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > > >> Error 1 > > >> > make: *** Waiting for unfinished jobs.... > > >> > > > >> > Please help me to solve this issue > > >> > > > >> > Thank you > > >> > Srinivas > > >> > > > >> > > > >> > > >> > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 29 12:10:42 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Aug 2023 13:10:42 -0400 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> Message-ID: Are the nonzero structures of all the rows related? If they are, one could devise a routine to take advantage of this relationship, but if the nonzero structures of each row are "randomly" different from all the other rows, then it is difficult to see how one can take advantage of the sparsity. > On Aug 29, 2023, at 12:50 PM, Thanasis Boutsikakis wrote: > > Hi all, I have the following code that orthogonalizes a PETSc matrix. 
The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). > > What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) > > Thanks! > > """Experimenting with matrix orthogonalization""" > > import contextlib > import sys > import time > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > > import slepc4py > > slepc4py.init(sys.argv) > from slepc4py import SLEPc > > from numpy.testing import assert_array_almost_equal > > EPSILON_USER = 1e-4 > EPS = sys.float_info.epsilon > > > def Print(message: str): > """Print function that prints only on rank 0 with color > > Args: > message (str): message to be printed > """ > PETSc.Sys.Print(message) > > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def orthogonality(A): # sourcery skip: avoid-builtin-shadow > """Checking and correcting orthogonality > > Args: > A (PETSc.Mat): Matrix of size [m x k]. > > Returns: > PETSc.Mat: Matrix of size [m x k]. > """ > # Check if the matrix is dense > mat_type = A.getType() > assert mat_type in ( > "seqdense", > "mpidense", > ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
> > m, k = A.getSize() > > Phi1 = A.getColumnVector(0) > Phi2 = A.getColumnVector(k - 1) > > # Compute dot product using PETSc function > dot_product = Phi1.dot(Phi2) > > if abs(dot_product) > min(EPSILON_USER, EPS * m): > Print(" Matrix is not orthogonal") > > # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL > _type = SLEPc.BV().OrthogBlockType.GS > > bv = SLEPc.BV().createFromMat(A) > bv.setFromOptions() > bv.setOrthogonalization(_type) > bv.orthogonalize() > > A = bv.createMat() > > Print(" Matrix successfully orthogonalized") > > # # Assembly the matrix to compute the final structure > if not A.assembled: > A.assemblyBegin() > A.assemblyEnd() > else: > Print(" Matrix is orthogonal") > > return A > > > # -------------------------------------------- > # EXP: Orthogonalization of an mpi PETSc matrix > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > > A = create_petsc_matrix(A_np, sparse=False) > > A_orthogonal = orthogonality(A) > > # -------------------------------------------- > # TEST: Orthogonalization of a numpy matrix > # -------------------------------------------- > # Generate A_np_orthogonal > A_np_orthogonal, _ = np.linalg.qr(A_np) > > # Get the local values from A_orthogonal > local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() > A_orthogonal_local = A_orthogonal.getValues( > range(local_rows_start, local_rows_end), range(k) > ) > > # Assert the correctness of the multiplication for the local subset > assert_array_almost_equal( > np.abs(A_orthogonal_local), > np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), > decimal=5, > ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Tue Aug 29 12:13:05 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Tue, 29 Aug 2023 19:13:05 +0200 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> Message-ID: The result of bv.orthogonalize() is most probably a dense matrix, and the result replaces the input matrix, that's why the input matrix is required to be dense. You can simply do this: bv = SLEPc.BV().createFromMat(A.convert('dense')) Jose > El 29 ago 2023, a las 18:50, Thanasis Boutsikakis escribi?: > > Hi all, I have the following code that orthogonalizes a PETSc matrix. The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). > > What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) > > Thanks! 
> > """Experimenting with matrix orthogonalization""" > > import contextlib > import sys > import time > import numpy as np > from firedrake import COMM_WORLD > from firedrake.petsc import PETSc > > import slepc4py > > slepc4py.init(sys.argv) > from slepc4py import SLEPc > > from numpy.testing import assert_array_almost_equal > > EPSILON_USER = 1e-4 > EPS = sys.float_info.epsilon > > > def Print(message: str): > """Print function that prints only on rank 0 with color > > Args: > message (str): message to be printed > """ > PETSc.Sys.Print(message) > > > def create_petsc_matrix(input_array, sparse=True): > """Create a PETSc matrix from an input_array > > Args: > input_array (np array): Input array > partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. > sparse (bool, optional): Toggle for sparese or dense. Defaults to True. > > Returns: > PETSc mat: PETSc matrix > """ > # Check if input_array is 1D and reshape if necessary > assert len(input_array.shape) == 2, "Input array should be 2-dimensional" > global_rows, global_cols = input_array.shape > > size = ((None, global_rows), (global_cols, global_cols)) > > # Create a sparse or dense matrix based on the 'sparse' argument > if sparse: > matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) > else: > matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) > matrix.setUp() > > local_rows_start, local_rows_end = matrix.getOwnershipRange() > > for counter, i in enumerate(range(local_rows_start, local_rows_end)): > # Calculate the correct row in the array for the current process > row_in_array = counter + local_rows_start > matrix.setValues( > i, range(global_cols), input_array[row_in_array, :], addv=False > ) > > # Assembly the matrix to compute the final structure > matrix.assemblyBegin() > matrix.assemblyEnd() > > return matrix > > > def orthogonality(A): # sourcery skip: avoid-builtin-shadow > """Checking and correcting orthogonality > > Args: > A (PETSc.Mat): Matrix of size [m x k]. > > Returns: > PETSc.Mat: Matrix of size [m x k]. > """ > # Check if the matrix is dense > mat_type = A.getType() > assert mat_type in ( > "seqdense", > "mpidense", > ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
> > m, k = A.getSize() > > Phi1 = A.getColumnVector(0) > Phi2 = A.getColumnVector(k - 1) > > # Compute dot product using PETSc function > dot_product = Phi1.dot(Phi2) > > if abs(dot_product) > min(EPSILON_USER, EPS * m): > Print(" Matrix is not orthogonal") > > # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL > _type = SLEPc.BV().OrthogBlockType.GS > > bv = SLEPc.BV().createFromMat(A) > bv.setFromOptions() > bv.setOrthogonalization(_type) > bv.orthogonalize() > > A = bv.createMat() > > Print(" Matrix successfully orthogonalized") > > # # Assembly the matrix to compute the final structure > if not A.assembled: > A.assemblyBegin() > A.assemblyEnd() > else: > Print(" Matrix is orthogonal") > > return A > > > # -------------------------------------------- > # EXP: Orthogonalization of an mpi PETSc matrix > # -------------------------------------------- > > m, k = 11, 7 > # Generate the random numpy matrices > np.random.seed(0) # sets the seed to 0 > A_np = np.random.randint(low=0, high=6, size=(m, k)) > > A = create_petsc_matrix(A_np, sparse=False) > > A_orthogonal = orthogonality(A) > > # -------------------------------------------- > # TEST: Orthogonalization of a numpy matrix > # -------------------------------------------- > # Generate A_np_orthogonal > A_np_orthogonal, _ = np.linalg.qr(A_np) > > # Get the local values from A_orthogonal > local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() > A_orthogonal_local = A_orthogonal.getValues( > range(local_rows_start, local_rows_end), range(k) > ) > > # Assert the correctness of the multiplication for the local subset > assert_array_almost_equal( > np.abs(A_orthogonal_local), > np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), > decimal=5, > ) From jed at jedbrown.org Tue Aug 29 12:17:23 2023 From: jed at jedbrown.org (Jed Brown) Date: Tue, 29 Aug 2023 11:17:23 -0600 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> Message-ID: <87o7ipix98.fsf@jedbrown.org> Suitesparse includes a sparse QR algorithm. The main issue is that (even with pivoting) the R factor has the same nonzero structure as a Cholesky factor of A^T A, which is generally much denser than a factor of A, and this degraded sparsity impacts Q as well. I wonder if someone would like to contribute a sparse QR to PETSc. It could have a default implementation via Cholesky QR and the ability to call SPQR from Suitesparse. Barry Smith writes: > Are the nonzero structures of all the rows related? If they are, one could devise a routine to take advantage of this relationship, but if the nonzero structures of each row are "randomly" different from all the other rows, then it is difficult to see how one can take advantage of the sparsity. > > > >> On Aug 29, 2023, at 12:50 PM, Thanasis Boutsikakis wrote: >> >> Hi all, I have the following code that orthogonalizes a PETSc matrix. The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). >> >> What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) >> >> Thanks! 
>> >> """Experimenting with matrix orthogonalization""" >> >> import contextlib >> import sys >> import time >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> >> import slepc4py >> >> slepc4py.init(sys.argv) >> from slepc4py import SLEPc >> >> from numpy.testing import assert_array_almost_equal >> >> EPSILON_USER = 1e-4 >> EPS = sys.float_info.epsilon >> >> >> def Print(message: str): >> """Print function that prints only on rank 0 with color >> >> Args: >> message (str): message to be printed >> """ >> PETSc.Sys.Print(message) >> >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> def orthogonality(A): # sourcery skip: avoid-builtin-shadow >> """Checking and correcting orthogonality >> >> Args: >> A (PETSc.Mat): Matrix of size [m x k]. >> >> Returns: >> PETSc.Mat: Matrix of size [m x k]. >> """ >> # Check if the matrix is dense >> mat_type = A.getType() >> assert mat_type in ( >> "seqdense", >> "mpidense", >> ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
>> >> m, k = A.getSize() >> >> Phi1 = A.getColumnVector(0) >> Phi2 = A.getColumnVector(k - 1) >> >> # Compute dot product using PETSc function >> dot_product = Phi1.dot(Phi2) >> >> if abs(dot_product) > min(EPSILON_USER, EPS * m): >> Print(" Matrix is not orthogonal") >> >> # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL >> _type = SLEPc.BV().OrthogBlockType.GS >> >> bv = SLEPc.BV().createFromMat(A) >> bv.setFromOptions() >> bv.setOrthogonalization(_type) >> bv.orthogonalize() >> >> A = bv.createMat() >> >> Print(" Matrix successfully orthogonalized") >> >> # # Assembly the matrix to compute the final structure >> if not A.assembled: >> A.assemblyBegin() >> A.assemblyEnd() >> else: >> Print(" Matrix is orthogonal") >> >> return A >> >> >> # -------------------------------------------- >> # EXP: Orthogonalization of an mpi PETSc matrix >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> A = create_petsc_matrix(A_np, sparse=False) >> >> A_orthogonal = orthogonality(A) >> >> # -------------------------------------------- >> # TEST: Orthogonalization of a numpy matrix >> # -------------------------------------------- >> # Generate A_np_orthogonal >> A_np_orthogonal, _ = np.linalg.qr(A_np) >> >> # Get the local values from A_orthogonal >> local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() >> A_orthogonal_local = A_orthogonal.getValues( >> range(local_rows_start, local_rows_end), range(k) >> ) >> >> # Assert the correctness of the multiplication for the local subset >> assert_array_almost_equal( >> np.abs(A_orthogonal_local), >> np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), >> decimal=5, >> ) From balay at mcs.anl.gov Tue Aug 29 12:26:29 2023 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 29 Aug 2023 12:26:29 -0500 (CDT) Subject: [petsc-users] Error while building PETSc with MATLAB In-Reply-To: References: <07d75222-fe7f-3025-2f1e-1b8059f7bb7d@mcs.anl.gov> <47c457d8-6b74-dee2-f061-9962e50b133f@mcs.anl.gov> Message-ID: <361a87cb-a5c7-3145-06d4-e6dd81c10277@mcs.anl.gov> Well - you sent in libmesh log not petsc's configure.log/make.log for petsc-3.17 Anyway - with petsc-3.13 - you have: >>>> Matlab: Includes: -I/usr/local/MATLAB/R2020b/extern/include /usr/local/MATLAB/R2020b MatlabEngine: Library: -Wl,-rpath,/usr/local/MATLAB/R2020b/sys/os/glnxa64:/usr/local/MATLAB/R2020b/bin/glnxa64:/usr/local/MATLAB/R2020b/extern/lib/glnxa64 -L/usr/local/MATLAB/R2020b/bin/glnxa64 -L/usr/local/MATLAB/R2020b/extern/lib/glnxa64 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc Language used to compile PETSc: C <<<<< With petsc-3.19 (and matlab-R2022a) - we are seeing: https://gitlab.com/petsc/petsc/-/jobs/4904566768 >>> Matlab: Includes: -I/nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a/extern/include Libraries: -Wl,-rpath,/nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a/bin/glnxa64 -L/nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a/bin/glnxa64 -leng -lmex -lmx -lmat Executable: /nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a mex: /nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a/bin/mex matlab: /nfs/gce/software/custom/linux-ubuntu22.04-x86_64/matlab/R2022a/bin/matlab -glnxa64 <<< I.e "-lut -licudata -licui18n -licuuc" are not preset here. This might be a change wrt newer matlab versions. 
You can: - edit /home/vit/sfw/petsc/3.13.4/linux-opt/lib/petsc/conf/petscvariables and remove all occurrences of "-lut -licudata -licui18n -licuuc" - now run 'make all' in '/home/vit/sfw/petsc/3.13.4' And see if the build works now. Satish On Tue, 29 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > I am sharing the log files while building petsc3.13.4 with matlab and also > the log file while building libmesh with petsc3.17.5 and matlab. Building > petsc 3.17.5 with matlab was done successfully. But libmesh is not able to > find the petsc > Please find the attachments. > > On Tue, Aug 29, 2023 at 7:31?PM Satish Balay wrote: > > > Send configure.log, make.log from both petsc-3.13 and 3.17 [or 3.19]. > > > > [you can gzip them to make the logs friendly to mailing list - or send > > them to petsc-maint] > > > > And does test suite work with 3.17? [or 3.19?] > > > > Satish > > > > On Tue, 29 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > > > I am sharing the make.log file while building petsc-3.13.4 with Matlab. > > > Please find the attachment and do the needful. > > > > > > On Tue, Aug 29, 2023 at 10:19?AM INTURU SRINIVAS 20PHD0548 < > > > inturu.srinivas2020 at vitstudent.ac.in> wrote: > > > > > > > I tried with petsc-3.17.5. During building of libmesh, the error shows > > > > petsc was not found > > > > > > > > On Mon, Aug 28, 2023 at 9:43?PM Satish Balay > > wrote: > > > > > > > >> https://ibamr.github.io/linux says petsc-3.17 > > > >> > > > >> Here you are using 3.13 > > > >> > > > >> Can you retry with petsc-3.17.5? > > > >> > > > >> Satish > > > >> > > > >> On Mon, 28 Aug 2023, INTURU SRINIVAS 20PHD0548 via petsc-users wrote: > > > >> > > > >> > Hello, > > > >> > > > > >> > I want to build PETSc with MATLAB for working on the simulation > > using > > > >> IBAMR > > > >> > open software. 
While building the PETSc, using the following > > > >> > > > > >> > export PETSC_DIR=$PWD > > > >> > export PETSC_ARCH=linux-debug > > > >> > ./configure \ > > > >> > --CC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicc \ > > > >> > --CXX=$HOME/sfw/linux/openmpi/4.1.4/bin/mpicxx \ > > > >> > --FC=$HOME/sfw/linux/openmpi/4.1.4/bin/mpif90 \ > > > >> > --with-debugging=1 \ > > > >> > --download-hypre=1 \ > > > >> > --download-fblaslapack=1 \ > > > >> > --with-x=0 \ > > > >> > --with-matlab-dir=/usr/local/MATLAB/R2020b/ > > > >> > --with-matlab-engine=1 > > > >> > --with-matlab-engine-dir=/usr/local/MATLAB/R2020b/extern/engines/ > > > >> > > > > >> > make -j4 > > > >> > make -j4 test > > > >> > > > > >> > I got the following error > > > >> > CLINKER > > > >> linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test > > > >> > /usr/bin/ld: > > > >> > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > > >> > function `EvaluateResidual': > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:32: > > > >> > undefined reference to `PetscMatlabEnginePut' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:33: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:35: > > > >> > undefined reference to `PetscMatlabEngineGet' > > > >> > /usr/bin/ld: > > > >> > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > > >> > function `EvaluateJacobian': > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:46: > > > >> > undefined reference to `PetscMatlabEnginePut' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:47: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:49: > > > >> > undefined reference to `PetscMatlabEngineGet' > > > >> > /usr/bin/ld: > > > >> > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > > >> > function `TaoPounders': > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:75: > > > >> > undefined reference to `PetscMatlabEngineGet' > > > >> > /usr/bin/ld: > > > >> > > > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test.o: in > > > >> > function `main': > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:126: > > > >> > undefined reference to `PetscMatlabEngineCreate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:127: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:139: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:140: > > > >> > undefined reference to `PetscMatlabEngineGetArray' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > 
/home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:142: > > > >> > undefined reference to `PetscMatlabEngineGetArray' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:144: > > > >> > undefined reference to `PetscMatlabEngineGetArray' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:146: > > > >> > undefined reference to `PetscMatlabEngineGetArray' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:148: > > > >> > undefined reference to `PetscMatlabEngineGetArray' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:154: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:157: > > > >> > undefined reference to `PetscMatlabEngineEvaluate' > > > >> > /usr/bin/ld: > > > >> > > > > >> > > /home/vit/sfw/petsc/3.13.4/src/tao/leastsquares/tutorials/matlab/matlab_ls_test.c:158: > > > >> > undefined reference to `PetscMatlabEngineDestroy' > > > >> > > > > >> > collect2: error: ld returned 1 exit status > > > >> > make: *** [gmakefile.test:185: > > > >> > linux-debug/tests/tao/leastsquares/tutorials/matlab/matlab_ls_test] > > > >> Error 1 > > > >> > make: *** Waiting for unfinished jobs.... > > > >> > > > > >> > Please help me to solve this issue > > > >> > > > > >> > Thank you > > > >> > Srinivas > > > >> > > > > >> > > > > >> > > > >> > > > > > > > > > > From bsmith at petsc.dev Tue Aug 29 12:29:35 2023 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 29 Aug 2023 13:29:35 -0400 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: <87o7ipix98.fsf@jedbrown.org> References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> <87o7ipix98.fsf@jedbrown.org> Message-ID: <3D7EF56E-7DCF-48ED-A734-D9F0E4661430@petsc.dev> Ah, there is https://petsc.org/release/manualpages/Mat/MATSOLVERSPQR/#matsolverspqr See also https://petsc.org/release/manualpages/Mat/MatGetFactor/#matgetfactor and https://petsc.org/release/manualpages/Mat/MatQRFactorSymbolic/ > On Aug 29, 2023, at 1:17 PM, Jed Brown wrote: > > Suitesparse includes a sparse QR algorithm. The main issue is that (even with pivoting) the R factor has the same nonzero structure as a Cholesky factor of A^T A, which is generally much denser than a factor of A, and this degraded sparsity impacts Q as well. > > I wonder if someone would like to contribute a sparse QR to PETSc. It could have a default implementation via Cholesky QR and the ability to call SPQR from Suitesparse. > > Barry Smith writes: > >> Are the nonzero structures of all the rows related? If they are, one could devise a routine to take advantage of this relationship, but if the nonzero structures of each row are "randomly" different from all the other rows, then it is difficult to see how one can take advantage of the sparsity. >> >> >> >>> On Aug 29, 2023, at 12:50 PM, Thanasis Boutsikakis wrote: >>> >>> Hi all, I have the following code that orthogonalizes a PETSc matrix. The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). 
>>> >>> What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) >>> >>> Thanks! >>> >>> """Experimenting with matrix orthogonalization""" >>> >>> import contextlib >>> import sys >>> import time >>> import numpy as np >>> from firedrake import COMM_WORLD >>> from firedrake.petsc import PETSc >>> >>> import slepc4py >>> >>> slepc4py.init(sys.argv) >>> from slepc4py import SLEPc >>> >>> from numpy.testing import assert_array_almost_equal >>> >>> EPSILON_USER = 1e-4 >>> EPS = sys.float_info.epsilon >>> >>> >>> def Print(message: str): >>> """Print function that prints only on rank 0 with color >>> >>> Args: >>> message (str): message to be printed >>> """ >>> PETSc.Sys.Print(message) >>> >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> >>> def orthogonality(A): # sourcery skip: avoid-builtin-shadow >>> """Checking and correcting orthogonality >>> >>> Args: >>> A (PETSc.Mat): Matrix of size [m x k]. >>> >>> Returns: >>> PETSc.Mat: Matrix of size [m x k]. >>> """ >>> # Check if the matrix is dense >>> mat_type = A.getType() >>> assert mat_type in ( >>> "seqdense", >>> "mpidense", >>> ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
>>> >>> m, k = A.getSize() >>> >>> Phi1 = A.getColumnVector(0) >>> Phi2 = A.getColumnVector(k - 1) >>> >>> # Compute dot product using PETSc function >>> dot_product = Phi1.dot(Phi2) >>> >>> if abs(dot_product) > min(EPSILON_USER, EPS * m): >>> Print(" Matrix is not orthogonal") >>> >>> # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL >>> _type = SLEPc.BV().OrthogBlockType.GS >>> >>> bv = SLEPc.BV().createFromMat(A) >>> bv.setFromOptions() >>> bv.setOrthogonalization(_type) >>> bv.orthogonalize() >>> >>> A = bv.createMat() >>> >>> Print(" Matrix successfully orthogonalized") >>> >>> # # Assembly the matrix to compute the final structure >>> if not A.assembled: >>> A.assemblyBegin() >>> A.assemblyEnd() >>> else: >>> Print(" Matrix is orthogonal") >>> >>> return A >>> >>> >>> # -------------------------------------------- >>> # EXP: Orthogonalization of an mpi PETSc matrix >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> A_orthogonal = orthogonality(A) >>> >>> # -------------------------------------------- >>> # TEST: Orthogonalization of a numpy matrix >>> # -------------------------------------------- >>> # Generate A_np_orthogonal >>> A_np_orthogonal, _ = np.linalg.qr(A_np) >>> >>> # Get the local values from A_orthogonal >>> local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() >>> A_orthogonal_local = A_orthogonal.getValues( >>> range(local_rows_start, local_rows_end), range(k) >>> ) >>> >>> # Assert the correctness of the multiplication for the local subset >>> assert_array_almost_equal( >>> np.abs(A_orthogonal_local), >>> np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), >>> decimal=5, >>> ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From thanasis.boutsikakis at corintis.com Tue Aug 29 15:46:01 2023 From: thanasis.boutsikakis at corintis.com (Thanasis Boutsikakis) Date: Tue, 29 Aug 2023 22:46:01 +0200 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> Message-ID: Thanks Jose, This works indeed. However, I was under the impression that this conversion might be very costly for big matrices with low sparsity and it would scale with the number of non-zero values. Do you have any idea of the efficiency of this operation? Thanks > On 29 Aug 2023, at 19:13, Jose E. Roman wrote: > > The result of bv.orthogonalize() is most probably a dense matrix, and the result replaces the input matrix, that's why the input matrix is required to be dense. > > You can simply do this: > > bv = SLEPc.BV().createFromMat(A.convert('dense')) > > Jose > >> El 29 ago 2023, a las 18:50, Thanasis Boutsikakis escribi?: >> >> Hi all, I have the following code that orthogonalizes a PETSc matrix. The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). >> >> What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) >> >> Thanks! 
>> >> """Experimenting with matrix orthogonalization""" >> >> import contextlib >> import sys >> import time >> import numpy as np >> from firedrake import COMM_WORLD >> from firedrake.petsc import PETSc >> >> import slepc4py >> >> slepc4py.init(sys.argv) >> from slepc4py import SLEPc >> >> from numpy.testing import assert_array_almost_equal >> >> EPSILON_USER = 1e-4 >> EPS = sys.float_info.epsilon >> >> >> def Print(message: str): >> """Print function that prints only on rank 0 with color >> >> Args: >> message (str): message to be printed >> """ >> PETSc.Sys.Print(message) >> >> >> def create_petsc_matrix(input_array, sparse=True): >> """Create a PETSc matrix from an input_array >> >> Args: >> input_array (np array): Input array >> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >> >> Returns: >> PETSc mat: PETSc matrix >> """ >> # Check if input_array is 1D and reshape if necessary >> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >> global_rows, global_cols = input_array.shape >> >> size = ((None, global_rows), (global_cols, global_cols)) >> >> # Create a sparse or dense matrix based on the 'sparse' argument >> if sparse: >> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >> else: >> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >> matrix.setUp() >> >> local_rows_start, local_rows_end = matrix.getOwnershipRange() >> >> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >> # Calculate the correct row in the array for the current process >> row_in_array = counter + local_rows_start >> matrix.setValues( >> i, range(global_cols), input_array[row_in_array, :], addv=False >> ) >> >> # Assembly the matrix to compute the final structure >> matrix.assemblyBegin() >> matrix.assemblyEnd() >> >> return matrix >> >> >> def orthogonality(A): # sourcery skip: avoid-builtin-shadow >> """Checking and correcting orthogonality >> >> Args: >> A (PETSc.Mat): Matrix of size [m x k]. >> >> Returns: >> PETSc.Mat: Matrix of size [m x k]. >> """ >> # Check if the matrix is dense >> mat_type = A.getType() >> assert mat_type in ( >> "seqdense", >> "mpidense", >> ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
>> >> m, k = A.getSize() >> >> Phi1 = A.getColumnVector(0) >> Phi2 = A.getColumnVector(k - 1) >> >> # Compute dot product using PETSc function >> dot_product = Phi1.dot(Phi2) >> >> if abs(dot_product) > min(EPSILON_USER, EPS * m): >> Print(" Matrix is not orthogonal") >> >> # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL >> _type = SLEPc.BV().OrthogBlockType.GS >> >> bv = SLEPc.BV().createFromMat(A) >> bv.setFromOptions() >> bv.setOrthogonalization(_type) >> bv.orthogonalize() >> >> A = bv.createMat() >> >> Print(" Matrix successfully orthogonalized") >> >> # # Assembly the matrix to compute the final structure >> if not A.assembled: >> A.assemblyBegin() >> A.assemblyEnd() >> else: >> Print(" Matrix is orthogonal") >> >> return A >> >> >> # -------------------------------------------- >> # EXP: Orthogonalization of an mpi PETSc matrix >> # -------------------------------------------- >> >> m, k = 11, 7 >> # Generate the random numpy matrices >> np.random.seed(0) # sets the seed to 0 >> A_np = np.random.randint(low=0, high=6, size=(m, k)) >> >> A = create_petsc_matrix(A_np, sparse=False) >> >> A_orthogonal = orthogonality(A) >> >> # -------------------------------------------- >> # TEST: Orthogonalization of a numpy matrix >> # -------------------------------------------- >> # Generate A_np_orthogonal >> A_np_orthogonal, _ = np.linalg.qr(A_np) >> >> # Get the local values from A_orthogonal >> local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() >> A_orthogonal_local = A_orthogonal.getValues( >> range(local_rows_start, local_rows_end), range(k) >> ) >> >> # Assert the correctness of the multiplication for the local subset >> assert_array_almost_equal( >> np.abs(A_orthogonal_local), >> np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), >> decimal=5, >> ) > From jroman at dsic.upv.es Wed Aug 30 02:17:12 2023 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 30 Aug 2023 09:17:12 +0200 Subject: [petsc-users] Orthogonalization of a (sparse) PETSc matrix In-Reply-To: References: <814F2E55-A62D-4799-B118-7DC706F1A65A@corintis.com> Message-ID: <987DA5F8-DC3F-4199-828E-17D239DEC442@dsic.upv.es> The conversion from MATAIJ to MATDENSE should be very cheap, see https://gitlab.com/petsc/petsc/-/blob/main/src/mat/impls/dense/seq/dense.c?ref_type=heads#L172 The matrix copy hidden inside createFromMat() is likely more expensive. I am currently working on a modification of BV that will be included in version 3.20 if everything goes well - then I think I can allow passing a sparse matrix to createFromMat() and do the conversion internally, avoiding the matrix copy. Jose > El 29 ago 2023, a las 22:46, Thanasis Boutsikakis escribi?: > > Thanks Jose, > > This works indeed. However, I was under the impression that this conversion might be very costly for big matrices with low sparsity and it would scale with the number of non-zero values. > > Do you have any idea of the efficiency of this operation? > > Thanks > >> On 29 Aug 2023, at 19:13, Jose E. Roman wrote: >> >> The result of bv.orthogonalize() is most probably a dense matrix, and the result replaces the input matrix, that's why the input matrix is required to be dense. >> >> You can simply do this: >> >> bv = SLEPc.BV().createFromMat(A.convert('dense')) >> >> Jose >> >>> El 29 ago 2023, a las 18:50, Thanasis Boutsikakis escribi?: >>> >>> Hi all, I have the following code that orthogonalizes a PETSc matrix. 
The problem is that this implementation requires that the PETSc matrix is dense, otherwise, it fails at bv.SetFromOptions(). Hence the assert in orthogonality(). >>> >>> What could I do in order to be able to orthogonalize sparse matrices as well? Could I convert it efficiently? (I tried to no avail) >>> >>> Thanks! >>> >>> """Experimenting with matrix orthogonalization""" >>> >>> import contextlib >>> import sys >>> import time >>> import numpy as np >>> from firedrake import COMM_WORLD >>> from firedrake.petsc import PETSc >>> >>> import slepc4py >>> >>> slepc4py.init(sys.argv) >>> from slepc4py import SLEPc >>> >>> from numpy.testing import assert_array_almost_equal >>> >>> EPSILON_USER = 1e-4 >>> EPS = sys.float_info.epsilon >>> >>> >>> def Print(message: str): >>> """Print function that prints only on rank 0 with color >>> >>> Args: >>> message (str): message to be printed >>> """ >>> PETSc.Sys.Print(message) >>> >>> >>> def create_petsc_matrix(input_array, sparse=True): >>> """Create a PETSc matrix from an input_array >>> >>> Args: >>> input_array (np array): Input array >>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None. >>> sparse (bool, optional): Toggle for sparese or dense. Defaults to True. >>> >>> Returns: >>> PETSc mat: PETSc matrix >>> """ >>> # Check if input_array is 1D and reshape if necessary >>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional" >>> global_rows, global_cols = input_array.shape >>> >>> size = ((None, global_rows), (global_cols, global_cols)) >>> >>> # Create a sparse or dense matrix based on the 'sparse' argument >>> if sparse: >>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD) >>> else: >>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD) >>> matrix.setUp() >>> >>> local_rows_start, local_rows_end = matrix.getOwnershipRange() >>> >>> for counter, i in enumerate(range(local_rows_start, local_rows_end)): >>> # Calculate the correct row in the array for the current process >>> row_in_array = counter + local_rows_start >>> matrix.setValues( >>> i, range(global_cols), input_array[row_in_array, :], addv=False >>> ) >>> >>> # Assembly the matrix to compute the final structure >>> matrix.assemblyBegin() >>> matrix.assemblyEnd() >>> >>> return matrix >>> >>> >>> def orthogonality(A): # sourcery skip: avoid-builtin-shadow >>> """Checking and correcting orthogonality >>> >>> Args: >>> A (PETSc.Mat): Matrix of size [m x k]. >>> >>> Returns: >>> PETSc.Mat: Matrix of size [m x k]. >>> """ >>> # Check if the matrix is dense >>> mat_type = A.getType() >>> assert mat_type in ( >>> "seqdense", >>> "mpidense", >>> ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a dense matrix." 
>>> >>> m, k = A.getSize() >>> >>> Phi1 = A.getColumnVector(0) >>> Phi2 = A.getColumnVector(k - 1) >>> >>> # Compute dot product using PETSc function >>> dot_product = Phi1.dot(Phi2) >>> >>> if abs(dot_product) > min(EPSILON_USER, EPS * m): >>> Print(" Matrix is not orthogonal") >>> >>> # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL >>> _type = SLEPc.BV().OrthogBlockType.GS >>> >>> bv = SLEPc.BV().createFromMat(A) >>> bv.setFromOptions() >>> bv.setOrthogonalization(_type) >>> bv.orthogonalize() >>> >>> A = bv.createMat() >>> >>> Print(" Matrix successfully orthogonalized") >>> >>> # # Assembly the matrix to compute the final structure >>> if not A.assembled: >>> A.assemblyBegin() >>> A.assemblyEnd() >>> else: >>> Print(" Matrix is orthogonal") >>> >>> return A >>> >>> >>> # -------------------------------------------- >>> # EXP: Orthogonalization of an mpi PETSc matrix >>> # -------------------------------------------- >>> >>> m, k = 11, 7 >>> # Generate the random numpy matrices >>> np.random.seed(0) # sets the seed to 0 >>> A_np = np.random.randint(low=0, high=6, size=(m, k)) >>> >>> A = create_petsc_matrix(A_np, sparse=False) >>> >>> A_orthogonal = orthogonality(A) >>> >>> # -------------------------------------------- >>> # TEST: Orthogonalization of a numpy matrix >>> # -------------------------------------------- >>> # Generate A_np_orthogonal >>> A_np_orthogonal, _ = np.linalg.qr(A_np) >>> >>> # Get the local values from A_orthogonal >>> local_rows_start, local_rows_end = A_orthogonal.getOwnershipRange() >>> A_orthogonal_local = A_orthogonal.getValues( >>> range(local_rows_start, local_rows_end), range(k) >>> ) >>> >>> # Assert the correctness of the multiplication for the local subset >>> assert_array_almost_equal( >>> np.abs(A_orthogonal_local), >>> np.abs(A_np_orthogonal[local_rows_start:local_rows_end, :]), >>> decimal=5, >>> ) >> > From ramoni.zsedano at gmail.com Wed Aug 30 15:41:28 2023 From: ramoni.zsedano at gmail.com (Ramoni Z. Sedano Azevedo) Date: Wed, 30 Aug 2023 17:41:28 -0300 Subject: [petsc-users] Error using GPU in Fortran code Message-ID: Hello, I'm executing a code in Fortran using PETSc with MPI via CPU and I would like to execute it using GPU. PETSc is configured as follows: ./configure \ --prefix=${PWD}/installdir \ --with-fortran \ --with-fortran-kernels=true \ --with-cuda \ --download-fblaslapack \ --with-scalar-type=complex \ --with-precision=double \ --with-debugging=0 \ --with-x=0 \ --with-gnu-compilers=1 \ --with-cc=mpicc \ --with-cxx=mpicxx \ --with-fc=mpif90 \ --with-make-exec=make The parameters for using MPI on CPU are: mpirun -np $ntasks ./${executable} \ -A_mat_type mpiaij \ -P_mat_type mpiaij \ -em_ksp_monitor_true_residual \ -em_ksp_type bcgs \ -em_pc_type bjacobi \ -em_sub_pc_type ilu \ -em_sub_pc_factor_levels 3 \ -em_sub_pc_factor_fill 6 \ < ./Parameters.inp Code output: Solving for Hz fields bnorm 3.7727507818834821E-005 xnorm 2.3407405211699372E-016 Residual norms for em_ solve. 
0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 converged reason 2 total number of relaxations 18 ======================================== The parameters for GPU usage are: mpirun -np $ntasks ./${executable} \ -A_mat_type aijcusparse \ -P_mat_type aijcusparse \ -vec_type cuda \ -use_gpu_aware_mpi 0 \ -em_ksp_monitor_true_residual \ -em_ksp_type bcgs \ -em_pc_type bjacobi \ -em_sub_pc_type ilu \ -em_sub_pc_factor_levels 3 \ -em_sub_pc_factor_fill 6 \ < ./Parameters.inp Code output: Solving for Hz fields bnorm 3.7727507818834821E-005 xnorm 2.3407405211699372E-016 Residual norms for em_ solve. 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 converged reason 3 total number of relaxations 1 ======================================== Clearly the code running on GPU is not converging correctly. Has anyone experienced this problem? Sincerely, Ramoni Z. S. Azevedo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Wed Aug 30 16:47:33 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 30 Aug 2023 16:47:33 -0500 Subject: [petsc-users] Error using GPU in Fortran code In-Reply-To: References: Message-ID: Hi, Ramoni Do you have a reproducible example? Usually it is because the cpu and gpu are out of synchronization. It could be a user's problem or petsc's. Thanks. --Junchao Zhang On Wed, Aug 30, 2023 at 4:13?PM Ramoni Z. Sedano Azevedo < ramoni.zsedano at gmail.com> wrote: > Hello, > > I'm executing a code in Fortran using PETSc with MPI via CPU and I would > like to execute it using GPU. > PETSc is configured as follows: > ./configure \ > --prefix=${PWD}/installdir \ > --with-fortran \ > --with-fortran-kernels=true \ > --with-cuda \ > --download-fblaslapack \ > --with-scalar-type=complex \ > --with-precision=double \ > --with-debugging=0 \ > --with-x=0 \ > --with-gnu-compilers=1 \ > --with-cc=mpicc \ > --with-cxx=mpicxx \ > --with-fc=mpif90 \ > --with-make-exec=make > > The parameters for using MPI on CPU are: > mpirun -np $ntasks ./${executable} \ > -A_mat_type mpiaij \ > -P_mat_type mpiaij \ > -em_ksp_monitor_true_residual \ > -em_ksp_type bcgs \ > -em_pc_type bjacobi \ > -em_sub_pc_type ilu \ > -em_sub_pc_factor_levels 3 \ > -em_sub_pc_factor_fill 6 \ > < ./Parameters.inp > > Code output: > Solving for Hz fields > bnorm 3.7727507818834821E-005 > xnorm 2.3407405211699372E-016 > Residual norms for em_ solve. > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm > 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm > 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm > 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm > 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm > 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm > 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm > 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm > 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 > 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm > 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 > 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm > 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 > 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm > 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 > 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm > 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 > 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm > 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 > 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm > 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 > 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm > 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 > 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm > 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 > 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm > 1.284864871601e-05 
||r(i)||/||b|| 3.405644702988e-01 > 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm > 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 > 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm > 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 > converged reason 2 > total number of relaxations 18 > ======================================== > > The parameters for GPU usage are: > mpirun -np $ntasks ./${executable} \ > -A_mat_type aijcusparse \ > -P_mat_type aijcusparse \ > -vec_type cuda \ > -use_gpu_aware_mpi 0 \ > -em_ksp_monitor_true_residual \ > -em_ksp_type bcgs \ > -em_pc_type bjacobi \ > -em_sub_pc_type ilu \ > -em_sub_pc_factor_levels 3 \ > -em_sub_pc_factor_fill 6 \ > < ./Parameters.inp > > Code output: > Solving for Hz fields > bnorm 3.7727507818834821E-005 > xnorm 2.3407405211699372E-016 > Residual norms for em_ solve. > 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm > 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm > 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > converged reason 3 > total number of relaxations 1 > ======================================== > > Clearly the code running on GPU is not converging correctly. > Has anyone experienced this problem? > > Sincerely, > Ramoni Z. S. Azevedo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 30 20:45:40 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Aug 2023 21:45:40 -0400 Subject: [petsc-users] Error using GPU in Fortran code In-Reply-To: References: Message-ID: <23DF1891-7980-42D8-A9BB-CAD848060B6E@petsc.dev> What convergence do you get without the GPU matrix and vector operations? Can you try the GPU run with -ksp_type gmres -ksp_pc_side right ? For certain problems, ILU can produce catastrophically bad preconditioners. Barry > On Aug 30, 2023, at 4:41 PM, Ramoni Z. Sedano Azevedo wrote: > > Hello, > > I'm executing a code in Fortran using PETSc with MPI via CPU and I would like to execute it using GPU. > PETSc is configured as follows: > ./configure \ > --prefix=${PWD}/installdir \ > --with-fortran \ > --with-fortran-kernels=true \ > --with-cuda \ > --download-fblaslapack \ > --with-scalar-type=complex \ > --with-precision=double \ > --with-debugging=0 \ > --with-x=0 \ > --with-gnu-compilers=1 \ > --with-cc=mpicc \ > --with-cxx=mpicxx \ > --with-fc=mpif90 \ > --with-make-exec=make > > The parameters for using MPI on CPU are: > mpirun -np $ntasks ./${executable} \ > -A_mat_type mpiaij \ > -P_mat_type mpiaij \ > -em_ksp_monitor_true_residual \ > -em_ksp_type bcgs \ > -em_pc_type bjacobi \ > -em_sub_pc_type ilu \ > -em_sub_pc_factor_levels 3 \ > -em_sub_pc_factor_fill 6 \ > < ./Parameters.inp > > Code output: > Solving for Hz fields > bnorm 3.7727507818834821E-005 > xnorm 2.3407405211699372E-016 > Residual norms for em_ solve. 
> 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 > 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 > 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 > 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 > 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 > 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 > 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 > 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 > 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 > 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01 > 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 > 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 > converged reason 2 > total number of relaxations 18 > ======================================== > > The parameters for GPU usage are: > mpirun -np $ntasks ./${executable} \ > -A_mat_type aijcusparse \ > -P_mat_type aijcusparse \ > -vec_type cuda \ > -use_gpu_aware_mpi 0 \ > -em_ksp_monitor_true_residual \ > -em_ksp_type bcgs \ > -em_pc_type bjacobi \ > -em_sub_pc_type ilu \ > -em_sub_pc_factor_levels 3 \ > -em_sub_pc_factor_fill 6 \ > < ./Parameters.inp > > Code output: > Solving for Hz fields > bnorm 3.7727507818834821E-005 > xnorm 2.3407405211699372E-016 > Residual norms for em_ solve. > 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > converged reason 3 > total number of relaxations 1 > ======================================== > > Clearly the code running on GPU is not converging correctly. > Has anyone experienced this problem? > > Sincerely, > Ramoni Z. S. 
Azevedo > From junchao.zhang at gmail.com Wed Aug 30 20:51:57 2023 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 30 Aug 2023 20:51:57 -0500 Subject: [petsc-users] Error using GPU in Fortran code In-Reply-To: <23DF1891-7980-42D8-A9BB-CAD848060B6E@petsc.dev> References: <23DF1891-7980-42D8-A9BB-CAD848060B6E@petsc.dev> Message-ID: On Wed, Aug 30, 2023 at 8:46?PM Barry Smith wrote: > > What convergence do you get without the GPU matrix and vector > operations? Barry, that was in the original email > > > Can you try the GPU run with -ksp_type gmres -ksp_pc_side right ? > > For certain problems, ILU can produce catastrophically bad > preconditioners. > Barry > > > > > On Aug 30, 2023, at 4:41 PM, Ramoni Z. Sedano Azevedo < > ramoni.zsedano at gmail.com> wrote: > > > > Hello, > > > > I'm executing a code in Fortran using PETSc with MPI via CPU and I would > like to execute it using GPU. > > PETSc is configured as follows: > > ./configure \ > > --prefix=${PWD}/installdir \ > > --with-fortran \ > > --with-fortran-kernels=true \ > > --with-cuda \ > > --download-fblaslapack \ > > --with-scalar-type=complex \ > > --with-precision=double \ > > --with-debugging=0 \ > > --with-x=0 \ > > --with-gnu-compilers=1 \ > > --with-cc=mpicc \ > > --with-cxx=mpicxx \ > > --with-fc=mpif90 \ > > --with-make-exec=make > > > > The parameters for using MPI on CPU are: > > mpirun -np $ntasks ./${executable} \ > > -A_mat_type mpiaij \ > > -P_mat_type mpiaij \ > > -em_ksp_monitor_true_residual \ > > -em_ksp_type bcgs \ > > -em_pc_type bjacobi \ > > -em_sub_pc_type ilu \ > > -em_sub_pc_factor_levels 3 \ > > -em_sub_pc_factor_fill 6 \ > > < ./Parameters.inp > > > > Code output: > > Solving for Hz fields > > bnorm 3.7727507818834821E-005 > > xnorm 2.3407405211699372E-016 > > Residual norms for em_ solve. 
> > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm > 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 > > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm > 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 > > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm > 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 > > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm > 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 > > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm > 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 > > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm > 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 > > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm > 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 > > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm > 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 > > 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm > 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 > > 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm > 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 > > 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm > 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 > > 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm > 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 > > 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm > 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 > > 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm > 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 > > 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm > 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 > > 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm > 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 > > 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm > 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01 > > 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm > 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 > > 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm > 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 > > converged reason 2 > > total number of relaxations 18 > > ======================================== > > > > The parameters for GPU usage are: > > mpirun -np $ntasks ./${executable} \ > > -A_mat_type aijcusparse \ > > -P_mat_type aijcusparse \ > > -vec_type cuda \ > > -use_gpu_aware_mpi 0 \ > > -em_ksp_monitor_true_residual \ > > -em_ksp_type bcgs \ > > -em_pc_type bjacobi \ > > -em_sub_pc_type ilu \ > > -em_sub_pc_factor_levels 3 \ > > -em_sub_pc_factor_fill 6 \ > > < ./Parameters.inp > > > > Code output: > > Solving for Hz fields > > bnorm 3.7727507818834821E-005 > > xnorm 2.3407405211699372E-016 > > Residual norms for em_ solve. > > 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm > 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm > 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 > > converged reason 3 > > total number of relaxations 1 > > ======================================== > > > > Clearly the code running on GPU is not converging correctly. > > Has anyone experienced this problem? > > > > Sincerely, > > Ramoni Z. S. 
Azevedo > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 30 22:01:54 2023 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 30 Aug 2023 23:01:54 -0400 Subject: [petsc-users] Error using GPU in Fortran code In-Reply-To: References: <23DF1891-7980-42D8-A9BB-CAD848060B6E@petsc.dev> Message-ID: <9BE1DD32-9A78-462A-A6B3-7DD45E337DF4@petsc.dev> Yikes, sorry I missed that the first run was CPU and the second GPU. The run on the CPU is indicative of a very bad preconditioner. It doesn't really converge. When the true residual norm jumps by a factor of 10^3 at the first iteration, this means the ILU preconditioner is just not appropriate or reasonable. The "convergence" of the preconditioned residual norm is meaningless. >> > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 >> > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 >> > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 >> > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 >> > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 >> > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 >> > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 >> > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 I won't worry about the GPU behavior (it is just due to slightly different numerical computations on the GPU and not surprising.) You need to use a different preconditioner, even on the CPU. > On Aug 30, 2023, at 9:51 PM, Junchao Zhang wrote: > > > > > On Wed, Aug 30, 2023 at 8:46?PM Barry Smith > wrote: >> >> What convergence do you get without the GPU matrix and vector operations? > Barry, that was in the original email >> >> >> Can you try the GPU run with -ksp_type gmres -ksp_pc_side right ? >> >> For certain problems, ILU can produce catastrophically bad preconditioners. >> Barry >> >> >> >> > On Aug 30, 2023, at 4:41 PM, Ramoni Z. Sedano Azevedo > wrote: >> > >> > Hello, >> > >> > I'm executing a code in Fortran using PETSc with MPI via CPU and I would like to execute it using GPU. >> > PETSc is configured as follows: >> > ./configure \ >> > --prefix=${PWD}/installdir \ >> > --with-fortran \ >> > --with-fortran-kernels=true \ >> > --with-cuda \ >> > --download-fblaslapack \ >> > --with-scalar-type=complex \ >> > --with-precision=double \ >> > --with-debugging=0 \ >> > --with-x=0 \ >> > --with-gnu-compilers=1 \ >> > --with-cc=mpicc \ >> > --with-cxx=mpicxx \ >> > --with-fc=mpif90 \ >> > --with-make-exec=make >> > >> > The parameters for using MPI on CPU are: >> > mpirun -np $ntasks ./${executable} \ >> > -A_mat_type mpiaij \ >> > -P_mat_type mpiaij \ >> > -em_ksp_monitor_true_residual \ >> > -em_ksp_type bcgs \ >> > -em_pc_type bjacobi \ >> > -em_sub_pc_type ilu \ >> > -em_sub_pc_factor_levels 3 \ >> > -em_sub_pc_factor_fill 6 \ >> > < ./Parameters.inp >> > >> > Code output: >> > Solving for Hz fields >> > bnorm 3.7727507818834821E-005 >> > xnorm 2.3407405211699372E-016 >> > Residual norms for em_ solve. 
>> > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 >> > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 >> > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 >> > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 >> > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 >> > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 >> > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 >> > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 >> > 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 >> > 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 >> > 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 >> > 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 >> > 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 >> > 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 >> > 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 >> > 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 >> > 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01 >> > 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 >> > 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 >> > converged reason 2 >> > total number of relaxations 18 >> > ======================================== >> > >> > The parameters for GPU usage are: >> > mpirun -np $ntasks ./${executable} \ >> > -A_mat_type aijcusparse \ >> > -P_mat_type aijcusparse \ >> > -vec_type cuda \ >> > -use_gpu_aware_mpi 0 \ >> > -em_ksp_monitor_true_residual \ >> > -em_ksp_type bcgs \ >> > -em_pc_type bjacobi \ >> > -em_sub_pc_type ilu \ >> > -em_sub_pc_factor_levels 3 \ >> > -em_sub_pc_factor_fill 6 \ >> > < ./Parameters.inp >> > >> > Code output: >> > Solving for Hz fields >> > bnorm 3.7727507818834821E-005 >> > xnorm 2.3407405211699372E-016 >> > Residual norms for em_ solve. >> > 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 >> > 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 >> > converged reason 3 >> > total number of relaxations 1 >> > ======================================== >> > >> > Clearly the code running on GPU is not converging correctly. >> > Has anyone experienced this problem? >> > >> > Sincerely, >> > Ramoni Z. 
S. Azevedo >> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramoni.zsedano at gmail.com Thu Aug 31 12:21:03 2023 From: ramoni.zsedano at gmail.com (Ramoni Z. Sedano Azevedo) Date: Thu, 31 Aug 2023 14:21:03 -0300 Subject: [petsc-users] Error using GPU in Fortran code In-Reply-To: <9BE1DD32-9A78-462A-A6B3-7DD45E337DF4@petsc.dev> References: <23DF1891-7980-42D8-A9BB-CAD848060B6E@petsc.dev> <9BE1DD32-9A78-462A-A6B3-7DD45E337DF4@petsc.dev> Message-ID: Thank you all for the answers. I've just started in a group where the code has been running for some time on the CPU and we started trying to run it on the GPU to see a processing gain. I'm going to talk here about the points you've already raised. Thank you very much! Em qui., 31 de ago. de 2023 ?s 00:02, Barry Smith escreveu: > > Yikes, sorry I missed that the first run was CPU and the second GPU. > > The run on the CPU is indicative of a very bad preconditioner. It > doesn't really converge. When the true residual norm jumps by a factor of > 10^3 at the first iteration, this means the ILU preconditioner is just not > appropriate or reasonable. The "convergence" of the preconditioned residual > norm is meaningless. > > > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm >> 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 >> > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm >> 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 >> > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm >> 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 >> > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm >> 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 >> > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm >> 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 >> > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm >> 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 >> > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm >> 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 >> > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm >> 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 > > > I won't worry about the GPU behavior (it is just due to slightly > different numerical computations on the GPU and not surprising.) > > You need to use a different preconditioner, even on the CPU. > > > On Aug 30, 2023, at 9:51 PM, Junchao Zhang > wrote: > > > > > On Wed, Aug 30, 2023 at 8:46?PM Barry Smith wrote: > >> >> What convergence do you get without the GPU matrix and vector >> operations? > > Barry, that was in the original email > >> >> >> Can you try the GPU run with -ksp_type gmres -ksp_pc_side right ? >> >> For certain problems, ILU can produce catastrophically bad >> preconditioners. >> Barry >> >> >> >> > On Aug 30, 2023, at 4:41 PM, Ramoni Z. Sedano Azevedo < >> ramoni.zsedano at gmail.com> wrote: >> > >> > Hello, >> > >> > I'm executing a code in Fortran using PETSc with MPI via CPU and I >> would like to execute it using GPU. 
>> > PETSc is configured as follows: >> > ./configure \ >> > --prefix=${PWD}/installdir \ >> > --with-fortran \ >> > --with-fortran-kernels=true \ >> > --with-cuda \ >> > --download-fblaslapack \ >> > --with-scalar-type=complex \ >> > --with-precision=double \ >> > --with-debugging=0 \ >> > --with-x=0 \ >> > --with-gnu-compilers=1 \ >> > --with-cc=mpicc \ >> > --with-cxx=mpicxx \ >> > --with-fc=mpif90 \ >> > --with-make-exec=make >> > >> > The parameters for using MPI on CPU are: >> > mpirun -np $ntasks ./${executable} \ >> > -A_mat_type mpiaij \ >> > -P_mat_type mpiaij \ >> > -em_ksp_monitor_true_residual \ >> > -em_ksp_type bcgs \ >> > -em_pc_type bjacobi \ >> > -em_sub_pc_type ilu \ >> > -em_sub_pc_factor_levels 3 \ >> > -em_sub_pc_factor_fill 6 \ >> > < ./Parameters.inp >> > >> > Code output: >> > Solving for Hz fields >> > bnorm 3.7727507818834821E-005 >> > xnorm 2.3407405211699372E-016 >> > Residual norms for em_ solve. >> > 0 KSP preconditioned resid norm 1.236208833927e-08 true resid norm >> 1.413045088306e-03 ||r(i)||/||b|| 3.745397377137e+01 >> > 1 KSP preconditioned resid norm 1.664973208594e-10 true resid norm >> 3.463939828700e+00 ||r(i)||/||b|| 9.181470043910e+04 >> > 2 KSP preconditioned resid norm 8.366983092820e-14 true resid norm >> 9.171051852915e-02 ||r(i)||/||b|| 2.430866066466e+03 >> > 3 KSP preconditioned resid norm 1.386354386207e-14 true resid norm >> 1.905770367881e-02 ||r(i)||/||b|| 5.051408052270e+02 >> > 4 KSP preconditioned resid norm 4.635883581096e-15 true resid norm >> 7.285180695640e-03 ||r(i)||/||b|| 1.930999717931e+02 >> > 5 KSP preconditioned resid norm 1.974093227402e-15 true resid norm >> 2.953370060898e-03 ||r(i)||/||b|| 7.828161020018e+01 >> > 6 KSP preconditioned resid norm 1.182781787023e-15 true resid norm >> 2.288756945462e-03 ||r(i)||/||b|| 6.066546871987e+01 >> > 7 KSP preconditioned resid norm 6.221244366707e-16 true resid norm >> 1.263339414861e-03 ||r(i)||/||b|| 3.348589631014e+01 >> > 8 KSP preconditioned resid norm 3.800488678870e-16 true resid norm >> 9.015738978063e-04 ||r(i)||/||b|| 2.389699054959e+01 >> > 9 KSP preconditioned resid norm 2.498733213989e-16 true resid norm >> 7.194509577987e-04 ||r(i)||/||b|| 1.906966559396e+01 >> > 10 KSP preconditioned resid norm 1.563017112250e-16 true resid norm >> 5.055208317846e-04 ||r(i)||/||b|| 1.339926385310e+01 >> > 11 KSP preconditioned resid norm 8.733803057628e-17 true resid norm >> 3.171941303660e-04 ||r(i)||/||b|| 8.407502872682e+00 >> > 12 KSP preconditioned resid norm 4.907010803529e-17 true resid norm >> 1.868311755294e-04 ||r(i)||/||b|| 4.952120782177e+00 >> > 13 KSP preconditioned resid norm 2.214070343700e-17 true resid norm >> 8.760421740830e-05 ||r(i)||/||b|| 2.322025028236e+00 >> > 14 KSP preconditioned resid norm 1.333171674446e-17 true resid norm >> 5.984548368534e-05 ||r(i)||/||b|| 1.586255948119e+00 >> > 15 KSP preconditioned resid norm 7.696778066646e-18 true resid norm >> 3.786809196913e-05 ||r(i)||/||b|| 1.003726303656e+00 >> > 16 KSP preconditioned resid norm 3.863008301366e-18 true resid norm >> 1.284864871601e-05 ||r(i)||/||b|| 3.405644702988e-01 >> > 17 KSP preconditioned resid norm 2.061402843494e-18 true resid norm >> 1.054741071688e-05 ||r(i)||/||b|| 2.795681805311e-01 >> > 18 KSP preconditioned resid norm 1.062033155108e-18 true resid norm >> 3.992776343462e-06 ||r(i)||/||b|| 1.058319664960e-01 >> > converged reason 2 >> > total number of relaxations 18 >> > ======================================== >> > >> > The parameters for GPU usage are: >> > mpirun -np $ntasks 
./${executable} \ >> > -A_mat_type aijcusparse \ >> > -P_mat_type aijcusparse \ >> > -vec_type cuda \ >> > -use_gpu_aware_mpi 0 \ >> > -em_ksp_monitor_true_residual \ >> > -em_ksp_type bcgs \ >> > -em_pc_type bjacobi \ >> > -em_sub_pc_type ilu \ >> > -em_sub_pc_factor_levels 3 \ >> > -em_sub_pc_factor_fill 6 \ >> > < ./Parameters.inp >> > >> > Code output: >> > Solving for Hz fields >> > bnorm 3.7727507818834821E-005 >> > xnorm 2.3407405211699372E-016 >> > Residual norms for em_ solve. >> > 0 KSP preconditioned resid norm 1.236220954395e-08 true resid norm >> 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 >> > 1 KSP preconditioned resid norm 0.000000000000e+00 true resid norm >> 3.772750781883e-05 ||r(i)||/||b|| 1.000000000000e+00 >> > converged reason 3 >> > total number of relaxations 1 >> > ======================================== >> > >> > Clearly the code running on GPU is not converging correctly. >> > Has anyone experienced this problem? >> > >> > Sincerely, >> > Ramoni Z. S. Azevedo >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Aug 31 17:40:10 2023 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 31 Aug 2023 18:40:10 -0400 Subject: [petsc-users] performance regression with GAMG In-Reply-To: References: <9716433a-7aa0-9284-141f-a1e2fccb310e@imperial.ac.uk> Message-ID: Hi Stephan, This branch is settling down. adams/gamg-add-old-coarsening I made the old, not minimum degree, ordering the default but kept the new "aggressive" coarsening as the default, so I am hoping that just adding "-pc_gamg_use_aggressive_square_graph true" to your regression tests will get you back to where you were before. Fingers crossed ... let me know if you have any success or not. Thanks, Mark On Tue, Aug 15, 2023 at 1:45?PM Mark Adams wrote: > Hi Stephan, > > I have a branch that you can try: adams/gamg-add-old-coarsening > > > Things to test: > * First, verify that nothing unintended changed by reproducing your bad > results with this branch (the defaults are the same) > * Try not using the minimum degree ordering that I suggested > with: -pc_gamg_use_minimum_degree_ordering false > -- I am eager to see if that is the main problem. > * Go back to what I think is the old method: > -pc_gamg_use_minimum_degree_ordering > false -pc_gamg_use_aggressive_square_graph true > > When we get back to where you were, I would like to try to get modern > stuff working. > I did add a -pc_gamg_aggressive_mis_k <2> > You could to another step of MIS coarsening with -pc_gamg_aggressive_mis_k > 3 > > Anyway, lots to look at but, alas, AMG does have a lot of parameters. > > Thanks, > Mark > > On Mon, Aug 14, 2023 at 4:26?PM Mark Adams wrote: > >> >> >> On Mon, Aug 14, 2023 at 11:03?AM Stephan Kramer >> wrote: >> >>> Many thanks for looking into this, Mark >>> > My 3D tests were not that different and I see you lowered the >>> threshold. >>> > Note, you can set the threshold to zero, but your test is running so >>> much >>> > differently than mine there is something else going on. >>> > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot >>> for >>> > in 3D. >>> > >>> > So it is not clear what the problem is. Some questions: >>> > >>> > * do you have a picture of this mesh to show me? >>> >>> It's just a standard hexahedral cubed sphere mesh with the refinement >>> level giving the number of times each of the six sides have been >>> subdivided: so Level_5 mean 2^5 x 2^5 squares which is extruded to 16 >>> layers. 
So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = >>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going >>> to the next Level >>> >> >> I see, and I assume these are pretty stretched elements. >> >> >>> >>> > * what do you mean by Q1-Q2 elements? >>> >>> Q2-Q1, basically Taylor hood on hexes, so (tri)quadratic for velocity >>> and (tri)linear for pressure >>> >>> I guess you could argue we could/should just do good old geometric >>> multigrid instead. More generally we do use this solver configuration a >>> lot for tetrahedral Taylor Hood (P2-P1) in particular also for our >>> adaptive mesh runs - would it be worth to see if we have the same >>> performance issues with tetrahedral P2-P1? >>> >> >> No, you have a clear reproducer, if not minimal. >> The first coarsening is very different. >> >> I am working on this and I see that I added a heuristic for thin bodies >> where you order the vertices in greedy algorithms with minimum degree first. >> This will tend to pick corners first, edges then faces, etc. >> That may be the problem. I would like to understand it better (see below). >> >> >> >>> > >>> > It would be nice to see if the new and old codes are similar without >>> > aggressive coarsening. >>> > This was the intended change of the major change in this time frame as >>> you >>> > noticed. >>> > If these jobs are easy to run, could you check that the old and new >>> > versions are similar with "-pc_gamg_square_graph 0 ", ( and you only >>> need >>> > one time step). >>> > All you need to do is check that the first coarse grid has about the >>> same >>> > number of equations (large). >>> Unfortunately we're seeing some memory errors when we use this option, >>> and I'm not entirely clear whether we're just running out of memory and >>> need to put it on a special queue. >>> >>> The run with square_graph 0 using new PETSc managed to get through one >>> solve at level 5, and is giving the following mg levels: >>> >>> rows=174, cols=174, bs=6 >>> total: nonzeros=30276, allocated nonzeros=30276 >>> -- >>> rows=2106, cols=2106, bs=6 >>> total: nonzeros=4238532, allocated nonzeros=4238532 >>> -- >>> rows=21828, cols=21828, bs=6 >>> total: nonzeros=62588232, allocated nonzeros=62588232 >>> -- >>> rows=589824, cols=589824, bs=6 >>> total: nonzeros=1082528928, allocated nonzeros=1082528928 >>> -- >>> rows=2433222, cols=2433222, bs=3 >>> total: nonzeros=456526098, allocated nonzeros=456526098 >>> >>> comparing with square_graph 100 with new PETSc >>> >>> rows=96, cols=96, bs=6 >>> total: nonzeros=9216, allocated nonzeros=9216 >>> -- >>> rows=1440, cols=1440, bs=6 >>> total: nonzeros=647856, allocated nonzeros=647856 >>> -- >>> rows=97242, cols=97242, bs=6 >>> total: nonzeros=65656836, allocated nonzeros=65656836 >>> -- >>> rows=2433222, cols=2433222, bs=3 >>> total: nonzeros=456526098, allocated nonzeros=456526098 >>> >>> and old PETSc with square_graph 100 >>> >>> rows=90, cols=90, bs=6 >>> total: nonzeros=8100, allocated nonzeros=8100 >>> -- >>> rows=1872, cols=1872, bs=6 >>> total: nonzeros=1234080, allocated nonzeros=1234080 >>> -- >>> rows=47652, cols=47652, bs=6 >>> total: nonzeros=23343264, allocated nonzeros=23343264 >>> -- >>> rows=2433222, cols=2433222, bs=3 >>> total: nonzeros=456526098, allocated nonzeros=456526098 >>> -- >>> >>> Unfortunately old PETSc with square_graph 0 did not complete a single >>> solve before giving the memory error >>> >> >> OK, thanks for trying. 
>> >> I am working on this and I will give you a branch to test, but if you can >> rebuild PETSc here is a quick test that might fix your problem. >> In src/ksp/pc/impls/gamg/agg.c you will see: >> >> PetscCall(PetscSortIntWithArray(nloc, degree, permute)); >> >> If you can comment this out in the new code and compare with the old, >> that might fix the problem. >> >> Thanks, >> Mark >> >> >>> >>> > >>> > BTW, I am starting to think I should add the old method back as an >>> option. >>> > I did not think this change would cause large differences. >>> >>> Yes, I think that would be much appreciated. Let us know if we can do >>> any testing >>> >>> Best wishes >>> Stephan >>> >>> >>> > >>> > Thanks, >>> > Mark >>> > >>> > >>> > >>> > >>> >> Note that we are providing the rigid body near nullspace, >>> >> hence the bs=3 to bs=6. >>> >> We have tried different values for the gamg_threshold but it doesn't >>> >> really seem to significantly alter the coarsening amount in that first >>> >> step. >>> >> >>> >> Do you have any suggestions for further things we should try/look at? >>> >> Any feedback would be much appreciated >>> >> >>> >> Best wishes >>> >> Stephan Kramer >>> >> >>> >> Full logs including log_view timings available from >>> >> https://github.com/stephankramer/petsc-scaling/ >>> >> >>> >> In particular: >>> >> >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat >>> >> >>> >> >>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat >>> >> >>> >> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL:
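
The GAMG options discussed in the thread above are ordinary PETSc options, so besides passing them on the command line they can also be set from Python before the solve. The snippet below is only a sketch, not code posted in the thread: it assumes petsc4py (already used alongside slepc4py earlier in this digest) and that the GAMG preconditioner lives under the default, empty options prefix; in a Firedrake run such as Stephan's, the same entries would normally be supplied through the solver_parameters dictionary instead. The option names themselves are copied verbatim from Mark's emails.

from petsc4py import PETSc

opts = PETSc.Options()  # global options database; an empty options prefix is assumed

# On the adams/gamg-add-old-coarsening branch the old vertex ordering is the
# default again, so adding the old square-graph form of aggressive coarsening
# is what Mark expects to reproduce the pre-regression coarsening behaviour.
opts["pc_gamg_use_aggressive_square_graph"] = "true"

# Further experiments suggested in the thread (left commented out here):
# opts["pc_gamg_use_minimum_degree_ordering"] = "false"  # force the old ordering explicitly
# opts["pc_gamg_aggressive_mis_k"] = 3                   # one extra MIS coarsening step

On the command line the equivalent is simply adding -pc_gamg_use_aggressive_square_graph true to the run, exactly as Mark suggests for the regression tests.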